Introduction to Machine Learning (I2ML)
This website offers an open and free introductory course on (supervised) machine learning. The course is designed to be as self-contained as possible and enables self-study through lecture videos, PDF slides, cheat sheets, quizzes, exercises (with solutions), and notebooks.
The fairly extensive material can roughly be divided into an introductory undergraduate part (chapters 1-10) and a more advanced second part at M.Sc. level (chapters 11-20). At LMU Munich we teach both parts in an inverted-classroom style (B.Sc. lecture “Introduction to ML” and M.Sc. lecture “Supervised Learning”). While the first part aims at a practical and operational understanding of the concepts, the second part focuses on theoretical foundations and more complex algorithms.
Why another ML course: A key goal of the course is to teach the fundamental building blocks behind ML, instead of introducing “yet another algorithm with yet another name”. We discuss, compare, and contrast risk minimization, statistical parameter estimation, the Bayesian viewpoint, and information theory, and demonstrate that all of these are equally valid entry points to ML. Developing the ability to take on and switch between these perspectives is a major goal of this course, and one that, in our opinion, is not always ideally covered in other courses.
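As one concrete instance of such a switch of perspective (treated in chapters 11.10 and 11.11 below): for a regression model f(x | θ) with i.i.d. Gaussian noise, maximizing the likelihood is equivalent to minimizing the empirical risk under the L2 loss. The notation here is a sketch for illustration, not necessarily the course's own:

$$
\hat{\theta}_{\text{ML}}
= \arg\max_{\theta} \prod_{i=1}^{n} \mathcal{N}\!\big(y^{(i)} \,\big|\, f(x^{(i)} \,|\, \theta), \sigma^2\big)
= \arg\min_{\theta} \sum_{i=1}^{n} \big(y^{(i)} - f(x^{(i)} \,|\, \theta)\big)^2
= \hat{\theta}_{\text{ERM}}.
$$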
We also want this course to be not only open, but also open source.
What is not covered: (1) In-depth coverage of deep learning; we offer this in our course Introduction to Deep Learning. (2) In-depth coverage of optimization; we are working on a separate course on optimization.
- Chapter 01: ML Basics
- Chapter 02: Supervised Regression
- Chapter 03: Supervised Classification
- Chapter 04: Performance Evaluation
- Chapter 04.01: Generalization Error
- Chapter 04.02: Measures Regression
- Chapter 04.03: Training Error
- Chapter 04.04: Test Error
- Chapter 04.05: Overfitting & Underfitting
- Chapter 04.06: Resampling 1
- Chapter 04.07: Resampling 2
- Chapter 04.08: Measures Classification
- Chapter 04.09: ROC Basics
- Chapter 04.10: ROC Curves
- Chapter 04.11: Partial AUC & Multi-Class AUC
- Chapter 04.12: Precision-Recall Curves
- Chapter 04.13: AUC & Mann-Whitney U Test
- Chapter 05: k-Nearest Neighbors (k-NN)
- Chapter 06: Classification and Regression Trees (CART)
- Chapter 07: Random Forests
- Chapter 08: Neural Networks
- Chapter 08.01: Introduction
- Chapter 08.02: Single Neuron
- Chapter 08.03: Single Hidden Layer NN
- Chapter 08.04: Single Hidden Layer Networks for Multi-Class Classification
- Chapter 08.05: MLP: Multi-Layer Feedforward Neural Networks
- Chapter 08.06: Chain Rule and Computational Graphs
- Extra: Brief History
- Extra: Basic Backpropagation 1
- Chapter 09: Tuning
- Chapter 10: Nested Resampling
- Chapter 11: Advanced Risk Minimization
- Chapter 11.01: Risk Minimizers
- Chapter 11.02: Pseudo-Residuals
- Chapter 11.03: L2 Loss
- Chapter 11.04: L1 Loss
- Chapter 11.05: Advanced Regression Losses
- Chapter 11.06: 0-1 Loss
- Chapter 11.07: Bernoulli Loss
- Chapter 11.08: Brier Score
- Chapter 11.09: Advanced Classification Losses
- Chapter 11.10: Maximum Likelihood Estimation vs Empirical Risk Minimization I
- Chapter 11.11: Maximum Likelihood Estimation vs Empirical Risk Minimization II
- Chapter 12: Multiclass Classification
- Chapter 13: Information Theory
- Chapter 13.01: Entropy
- Chapter 13.02: Differential Entropy
- Chapter 13.03: Kullback-Leibler Divergence
- Chapter 13.04: Entropy and Optimal Code Length
- Chapter 13.05: Cross-Entropy, KL and Source Coding
- Chapter 13.06: Information Theory for Machine Learning
- Chapter 13.07: Joint Entropy and Mutual Information
- Chapter 14: Curse of Dimensionality
- Chapter 15: Hypothesis Spaces
- Chapter 16: Regularization
- Chapter 16.01: Introduction to Regularization
- Chapter 16.02: Lasso and Ridge Regression
- Chapter 16.03: Lasso vs Ridge Regression
- Chapter 16.04: Elastic Net and Regularization for GLMs
- Chapter 16.05: Regularization for Underdetermined Problems
- Chapter 16.06: L0 Regularization
- Chapter 16.07: Regularization in Nonlinear Models and Bayesian Priors
- Chapter 16.08: Geometric Analysis of L2 Regularization and Weight Decay
- Chapter 16.09: Geometric Analysis of L1 Regularization
- Chapter 16.10: Early Stopping
- Chapter 17: Linear Support Vector Machines
- Chapter 18: Nonlinear Support Vector Machines
- Chapter 19: Gaussian Processes
- Chapter 20: Boosting
- Chapter 20.01: Introduction to Boosting / AdaBoost
- Chapter 20.02: Boosting Concept
- Chapter 20.03: Boosting Illustration
- Chapter 20.04: Boosting Regularization
- Chapter 20.05: Boosting for Classification
- Chapter 20.06: Gradient Boosting with Trees I
- Chapter 20.07: Gradient Boosting with Trees II
- Chapter 20.08: XGBoost
- Coding ML [Python and sklearn]
- Coding ML [R and mlr3]
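For readers who want to jump straight to code, here is a minimal sketch of the basic supervised workflow covered in the first part of the course: fitting a k-NN classifier (chapter 05) and estimating its generalization error via cross-validation and a held-out test set (chapter 04). It assumes scikit-learn is installed; the dataset and hyperparameters are illustrative and not taken from the course notebooks.

```python
# Minimal supervised learning workflow: k-NN classifier (chapter 05),
# cross-validation and held-out test error (chapter 04). Illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out a test set to estimate the test error of the final model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = KNeighborsClassifier(n_neighbors=5)

# 5-fold cross-validation on the training data (resampling, chapters 04.06-04.07).
cv_accuracy = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_accuracy.mean():.3f} +/- {cv_accuracy.std():.3f}")

# Refit on the full training set and evaluate once on the held-out test set.
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```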