Introduction to Machine Learning (I2ML)
This website offers an open and free introductory course on (supervised) machine learning. The course is designed to be as self-contained as possible and enables self-study through lecture videos, PDF slides, cheat sheets, quizzes, exercises (with solutions), and notebooks.
The rather extensive material can roughly be divided into an introductory undergraduate part (chapters 1-10), a more advanced second part at MSc level (chapters 11-19), and a third part, also at MSc level (chapters 20-23). At LMU Munich we teach all parts in an inverted-classroom style (the B.Sc. lecture “Introduction to ML” and the M.Sc. lectures “Supervised Learning” and “Advanced Machine Learning”). While the first part aims at a practical and operational understanding of concepts, the second and third parts focus on theoretical foundations and more complex algorithms.
Remarks on Deep Dive sections: Certain sections exclusively present mathematical proofs, acting as deep dives into the respective topics. Note that these deep-dive sections do not have accompanying videos.
Why another ML course: A key goal of the course is to teach the fundamental building blocks behind ML, instead of introducing “yet another algorithm with yet another name”. We discuss, compare, and contrast risk minimization, statistical parameter estimation, the Bayesian viewpoint, and information theory, and demonstrate that all of these are equally valid entry points to ML. Developing the ability to take on and switch between these perspectives is a major goal of this course, and one that, in our opinion, is not always ideally presented in other courses. A small illustration of how two of these perspectives coincide is sketched below.
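As a brief sketch (added here for illustration, with the notation $\pi(x \mid \theta)$ for the predicted probability assumed rather than taken from this page): empirical risk minimization with the Bernoulli (log) loss and maximum likelihood estimation of a Bernoulli model lead to the same estimator, which is exactly the kind of correspondence the course develops (see chapters 11.12 and 11.13).

```latex
% Sketch: ERM with the Bernoulli/log loss coincides with maximum likelihood
% estimation, assuming a model \pi(x \mid \theta) = P(y = 1 \mid x, \theta)
% and labels y^{(i)} \in \{0, 1\}.
\begin{align*}
\hat{\theta}_{\mathrm{ML}}
  &= \arg\max_{\theta} \sum_{i=1}^{n}
     \Big[ y^{(i)} \log \pi\big(x^{(i)} \mid \theta\big)
         + \big(1 - y^{(i)}\big) \log\big(1 - \pi\big(x^{(i)} \mid \theta\big)\big) \Big] \\
  &= \arg\min_{\theta} \sum_{i=1}^{n}
     L\Big(y^{(i)}, \pi\big(x^{(i)} \mid \theta\big)\Big)
   = \hat{\theta}_{\mathrm{ERM}},
\end{align*}
% where L(y, \pi) = -y \log \pi - (1 - y) \log(1 - \pi) is the Bernoulli (log)
% loss, so the maximum likelihood estimate and the empirical risk minimizer agree.
```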
We also want this course not only to be open, but open source.
What is not covered: (1) In-depth coverage of deep learning; we offer this in our course Introduction to Deep Learning. (2) In-depth coverage of optimization; we are working on a separate course on optimization.
- All Slides Chapters 1-10 and 11-19
- Chapter 01: ML Basics
- Chapter 02: Supervised Regression
- Chapter 03: Supervised Classification
- Chapter 04: Performance Evaluation
- Chapter 04.00: Evaluation: In a Nutshell
- Chapter 04.01: Generalization Error
- Chapter 04.02: Measures Regression
- Chapter 04.03: Training Error
- Chapter 04.04: Test Error
- Chapter 04.05: Overfitting & Underfitting
- Chapter 04.06: Resampling 1
- Chapter 04.07: Resampling 2
- Chapter 04.08: Measures Classification
- Chapter 04.09: ROC Basics
- Chapter 04.10: ROC Curves
- Chapter 04.11: Partial AUC & Multi-Class AUC
- Chapter 04.12: Precision-Recall Curves
- Chapter 04.13: AUC & Mann-Whitney-U Test
- Chapter 05: k-Nearest Neighbors (k-NN)
- Chapter 06: Classification and Regression Trees (CART)
- Chapter 06.00: CART: In a Nutshell
- Chapter 06.01: Predictions with CART
- Chapter 06.02: Growing a Tree
- Chapter 06.03: Splitting Criteria for Regression
- Chapter 06.04: Splitting Criteria for Classification
- Chapter 06.05: Computational Aspects of Finding Splits
- Chapter 06.06: Stopping Criteria & Pruning
- Chapter 06.07: Discussion
- Chapter 07: Random Forests
- Chapter 08: Neural Networks
- Chapter 08.00: Neural Networks: In a Nutshell
- Chapter 08.01: Introduction
- Chapter 08.02: Single Neuron
- Chapter 08.03: Single Hidden Layer NN
- Chapter 08.04: Single Hidden Layer Networks for Multi-Class Classification
- Chapter 08.05: MLP: Multi-Layer Feedforward Neural Networks
- Extra: Brief History
- Extra: Basic Backpropagation 1
- Chapter 09: Tuning
- Chapter 10: Nested Resampling
- Chapter 11: Advanced Risk Minimization
- Chapter 11.01: Risk Minimizers
- Chapter 11.02: Pseudo-Residuals
- Chapter 11.03: L2 Loss
- Chapter 11.04: L1 Loss
- Chapter 11.05: Advanced Regression Losses
- Chapter 11.06: 0-1 Loss
- Chapter 11.07: Bernoulli Loss
- Chapter 11.08: Logistic Regression: Deep Dive
- Chapter 11.09: Brier Score
- Chapter 11.10: Advanced Classification Losses
- Chapter 11.11: Optimal Constant Model for the Empirical Log Loss Risk
- Chapter 11.12: Maximum Likelihood Estimation vs Empirical Risk Minimization I
- Chapter 11.13: Maximum Likelihood Estimation vs Empirical Risk Minimization II
- Chapter 11.14: Properties of Loss Functions
- Chapter 11.15: Bias Variance Decomposition
- Chapter 11.16: Bias Variance Decomposition: Deep Dive
- Chapter 12: Multiclass Classification
- Chapter 13: Information Theory
- Chapter 13.01: Entropy I
- Chapter 13.02: Entropy II
- Chapter 13.03: Differential Entropy
- Chapter 13.04: Kullback-Leibler Divergence
- Chapter 13.05: Cross-Entropy and KL
- Chapter 13.06: Information Theory for Machine Learning
- Chapter 13.07: Joint Entropy and Mutual Information I
- Chapter 13.08: Joint Entropy and Mutual Information II
- Chapter 13.09: Entropy and Optimal Code Length I
- Chapter 13.10: Entropy and Optimal Code Length II
- Chapter 13.11: MI under Reparametrization: Deep Dive
- Chapter 14: Curse of Dimensionality
- Chapter 15: Regularization
- Chapter 15.01: Introduction to Regularization
- Chapter 15.02: Ridge Regression
- Chapter 15.03: Lasso Regression
- Chapter 15.04: Lasso vs Ridge Regression
- Chapter 15.05: Elastic Net and Regularization for GLMs
- Chapter 15.06: Other Types of Regularization
- Chapter 15.07: Non-Linear Models and Structural Risk Minimization
- Chapter 15.08: Bayesian Priors
- Chapter 15.09: Geometry of L2 Regularization
- Chapter 15.10: Geometry of L1 Regularization
- Chapter 15.11: Early Stopping
- Chapter 15.12: Details on Ridge Regression: Deep Dive
- Chapter 15.13: Soft-Thresholding and L1 Regularization: Deep Dive
- Chapter 16: Linear Support Vector Machines
- Chapter 17: Nonlinear Support Vector Machines
- Chapter 18: Boosting
- Chapter 18.01: Introduction to Boosting / AdaBoost
- Chapter 18.02: Boosting Concept
- Chapter 18.03: Boosting Illustration
- Chapter 18.04: Boosting Regularization
- Chapter 18.05: Boosting for Classification
- Chapter 18.06: Gradient Boosting with Trees I
- Chapter 18.07: Gradient Boosting with Trees II
- Chapter 18.08: XGBoost
- Chapter 18.09: Component-Wise Boosting Basics 1
- Chapter 18.10: Component-Wise Boosting Basics 2
- Chapter 18.11: CWB and GLMs
- Chapter 18.12: Advanced CWB
- Chapter 19: Gaussian Processes
- Chapter 20: Imbalanced Learning
- Chapter 20.01: Introduction
- Chapter 20.02: Performance Measures
- Chapter 20.03: Cost-Sensitive Learning 1
- Chapter 20.04: Cost-Sensitive Learning 2
- Chapter 20.05: Cost-Sensitive Learning 3
- Chapter 20.06: Cost Curves 1
- Chapter 20.07: Cost Curves 2
- Chapter 20.08: Sampling Methods 1
- Chapter 20.09: Sampling Methods 2
- Chapter 21: Multitarget Learning
- Chapter 22: Online Learning
- Chapter 22.01: Introduction
- Chapter 22.02: Simple Online Learning Algorithm
- Chapter 22.03: Follow the Leader on OLO Problems
- Chapter 22.04: Follow the Regularized Leader
- Chapter 22.05: Follow the Leader on OQO Problems
- Chapter 22.06: Online Convex Optimization 1
- Chapter 22.07: Online Convex Optimization 2
- Extra Chapter: Feature Selection
- Coding ML [Python and sklearn]
- Coding ML [R and mlr3]