# Introduction to Machine Learning (I2ML)

This website offers an open and free introductory course on (supervised) machine learning. The course is constructed to be as self-contained as possible, and enables self-study through lecture videos, PDF slides, cheatsheets, quizzes, exercises (with solutions), and notebooks.

The quite extensive material can roughly be divided into an introductory undergraduate part (Chapters 1-10) and a more advanced second part at M.Sc. level (Chapters 11-20). At LMU Munich we teach both parts in an inverted-classroom style (B.Sc. lecture "Introduction to ML" and M.Sc. lecture "Supervised Learning"). While the first part aims at a practical and operational understanding of concepts, the second part focuses on theoretical foundations and more complex algorithms.

**Why another ML course:** A key goal of the course is to teach the fundamental building blocks behind ML, instead of introducing "yet another algorithm with yet another name". We discuss, compare and contrast risk minimization, statistical parameter estimation, the Bayesian viewpoint and information theory, and demonstrate that all of these are equally valid entry points to ML. Developing the ability to take on and switch between these perspectives is a major goal of this course, and one that, in our opinion, is not always ideally supported in other courses.
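One standard instance of this equivalence of perspectives, sketched here for illustration: empirical risk minimization with the negative log-likelihood as loss function is the same optimization problem as maximum-likelihood estimation.

```latex
% ERM with the negative log-likelihood loss coincides with ML estimation:
\hat{\theta}
  = \arg\min_{\theta} \sum_{i=1}^{n}
    \underbrace{-\log p\big(y^{(i)} \mid x^{(i)}, \theta\big)}_{\text{loss } L\big(y^{(i)},\, f(x^{(i)};\theta)\big)}
  = \arg\max_{\theta} \prod_{i=1}^{n} p\big(y^{(i)} \mid x^{(i)}, \theta\big)
```

For example, the Bernoulli loss used in logistic regression is exactly the negative Bernoulli log-likelihood, so the risk-minimization and statistical-estimation viewpoints yield the same model.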

We also want this course to be not only open, but open source.

**What is not covered:** (1) In-depth coverage of deep learning; we offer this in our course Introduction to Deep Learning. (2) In-depth coverage of optimization; we are working on a separate course for optimization.

- All Slides Chapters 1-10
- Chapter 01: ML Basics
- Chapter 02: Supervised Regression
- Chapter 03: Supervised Classification
- Chapter 04: Performance Evaluation
- Chapter 04.01: Generalization Error
- Chapter 04.02: Measures Regression
- Chapter 04.03: Training Error
- Chapter 04.04: Test Error
- Chapter 04.05: Overfitting & Underfitting
- Chapter 04.06: Resampling 1
- Chapter 04.07: Resampling 2
- Chapter 04.08: Measures Classification
- Chapter 04.09: ROC Basics
- Chapter 04.10: ROC Curves
- Chapter 04.11: Partial AUC & Multi-Class AUC
- Chapter 04.12: Precision-Recall Curves
- Chapter 04.13: AUC & Mann-Whitney-U Test
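A small sketch of the connection covered in Chapter 04.13 (an illustrative example with simulated data, not part of the official course notebooks): the empirical AUC equals the Mann-Whitney U statistic of the positive-class scores, normalized by the number of positive-negative pairs.

```python
# Illustration: AUC = U / (n_pos * n_neg) for tie-free scores.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 50
y = np.array([0] * n + [1] * n)
# Simulated scores: positives shifted upward by one standard deviation.
scores = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(1.0, 1.0, n)])

auc = roc_auc_score(y, scores)
# U statistic of positive scores vs. negative scores.
u, _ = mannwhitneyu(scores[y == 1], scores[y == 0])

print(auc, u / (n * n))  # the two values agree
```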

- Chapter 05: k-Nearest Neighbors (k-NN)
- Chapter 06: Classification and Regression Trees (CART)
- Chapter 07: Random Forests
- Chapter 08: Neural Networks
- Chapter 09: Tuning
- Chapter 10: Nested Resampling
- Chapter 11: Advanced Risk Minimization
- Chapter 11.01: Risk Minimizers
- Chapter 11.02: Pseudo-Residuals
- Chapter 11.03: L2 Loss
- Chapter 11.04: L1 Loss
- Chapter 11.05: Advanced Regression Losses
- Chapter 11.06: 0-1 Loss
- Chapter 11.07: Bernoulli Loss
- Chapter 11.08: Brier Score
- Chapter 11.09: Advanced Classification Losses
- Chapter 11.10: Optimal constant model for the empirical log loss risk
- Chapter 11.11: Maximum Likelihood Estimation vs Empirical Risk Minimization I
- Chapter 11.12: Maximum Likelihood Estimation vs Empirical Risk Minimization II
- Chapter 11.13: Properties of Loss Functions
- Chapter 11.14: Bias Variance Decomposition

- Chapter 12: Multiclass Classification
- Chapter 13: Information Theory
- Chapter 13.01: Entropy
- Chapter 13.02: Differential Entropy
- Chapter 13.03: Kullback-Leibler Divergence
- Chapter 13.04: Entropy and Optimal Code Length
- Chapter 13.05: Cross-Entropy, KL and Source Coding
- Chapter 13.06: Information Theory for Machine Learning
- Chapter 13.07: Joint Entropy and Mutual Information

- Chapter 14: Curse of Dimensionality
- Chapter 15: Regularization
- Chapter 15.01: Introduction to Regularization
- Chapter 15.02: Lasso and Ridge Regression
- Chapter 15.03: Lasso vs Ridge Regression
- Chapter 15.04: Elastic Net and Regularization for GLMs
- Chapter 15.05: Regularization for Underdetermined Problems
- Chapter 15.06: L0 Regularization
- Chapter 15.07: Regularization in NonLinear Models and Bayesian Priors
- Chapter 15.08: Geometric Analysis of L2 Regularization and Weight Decay
- Chapter 15.09: Geometric Analysis of L1 Regularization
- Chapter 15.10: Early Stopping

- Chapter 16: Linear Support Vector Machines
- Chapter 17: Nonlinear Support Vector Machines
- Chapter 18: Boosting
- Chapter 18.01: Introduction to Boosting / AdaBoost
- Chapter 18.02: Boosting Concept
- Chapter 18.03: Boosting Illustration
- Chapter 18.04: Boosting Regularization
- Chapter 18.05: Boosting for Classification
- Chapter 18.06: Gradient Boosting with Trees I
- Chapter 18.07: Gradient Boosting with Trees II
- Chapter 18.08: XGBoost
- Chapter 18.09: Component Wise Boosting Basics 1
- Chapter 18.10: Component Wise Boosting Basics 2
- Chapter 18.11: CWB and GLMs
- Chapter 18.12: Advanced CWB

- Chapter 19: Feature Selection
- Chapter 20: Gaussian Processes
- Chapter 21: Imbalanced Learning
- Chapter 21.01: Introduction
- Chapter 21.02: Performance Measures
- Chapter 21.03: Cost-Sensitive Learning 1
- Chapter 21.04: Cost-Sensitive Learning 2
- Chapter 21.05: Cost-Sensitive Learning 3
- Chapter 21.06: Cost Curves 1
- Chapter 21.07: Cost Curves 2
- Chapter 21.08: Sampling Methods 1
- Chapter 21.09: Sampling Methods 2

- Chapter 22: Multitarget Learning
- Chapter 23: Online Learning
- Chapter 23.01: Introduction
- Chapter 23.02: Simple Online Learning Algorithm
- Chapter 23.03: Follow the Leader on OLO problems
- Chapter 23.04: Follow the Regularized Leader
- Chapter 23.05: Follow the Leader on OQO problems
- Chapter 23.06: Online Convex Optimization 1
- Chapter 23.07: Online Convex Optimization 2

- Coding ML [Python and sklearn]
- Coding ML [R and mlr3]
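As a taste of the coding material, here is a minimal sketch of the supervised workflow from the first part of the course, using sklearn (the dataset and learner choices here are illustrative, not taken from the course notebooks): split the data, fit a learner, and estimate the generalization error on held-out test data (Chapter 04), using a CART learner (Chapter 06).

```python
# Minimal supervised-learning workflow: split, fit, evaluate.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out a test set to estimate the generalization error.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# A CART learner with limited depth as a simple regularization.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```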