Chapters
- All Slides: Chapters 1-10 and 11-19
-
Chapter 01: ML Basics
This chapter introduces the basic concepts of machine learning. We focus on supervised learning, explain the difference between regression and classification, show how to evaluate and compare machine learning models and formalize the concept of learning.
-
Chapter 02: Supervised Regression
This chapter treats the supervised regression task in more detail. We will see different loss functions for regression, how a linear regression model can be used from a machine learning perspective, and how to extend it with polynomials for greater flexibility.
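A minimal sketch of polynomial regression in Python with sklearn; the degree, data and settings are illustrative choices, not taken from the chapter:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.2, size=100)

    # Expand the single feature into polynomial terms, then fit by least squares.
    model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
    model.fit(X, y)
    print(model.predict([[0.5]]))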
-
Chapter 03: Supervised Classification
This chapter treats the supervised classification task in more detail. We will see examples of binary and multiclass classification and the differences between discriminative and generative approaches. In particular, we will address logistic regression, discriminant analysis and naive Bayes classifiers.
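A minimal sketch contrasting a discriminative and a generative classifier in sklearn; the dataset and settings are illustrative assumptions:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    # Discriminative: models p(y | x) directly.
    print(LogisticRegression(max_iter=1000).fit(X, y).score(X, y))
    # Generative: models p(x | y) per class and applies Bayes' rule.
    print(GaussianNB().fit(X, y).score(X, y))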
-
Chapter 04: Performance Evaluation
This chapter treats the challenge of evaluating the performance of a model. We will introduce different performance measures for regression and classification tasks, explain the problem of overfitting as well as the difference between training and test error, and, lastly, present a variety of resampling techniques.
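A minimal sketch of holdout evaluation versus cross-validation in sklearn; dataset and model are illustrative:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    print(tree.score(X_tr, y_tr))                    # training accuracy, typically optimistic
    print(tree.score(X_te, y_te))                    # test accuracy on untouched data
    print(cross_val_score(tree, X, y, cv=5).mean())  # 5-fold CV estimate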
-
Chapter 05: k-Nearest Neighbors (k-NN)
This chapter addresses \(k\)-nearest neighbors, a distance-based algorithm suited to both regression and classification. Predictions are based on neighboring observations, under the assumption that similarity in feature space translates to similarity in the target.
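A minimal sketch of \(k\)-NN classification in sklearn; \(k\) and the data are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    # Predict by majority vote among the 5 nearest neighbors.
    knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
    print(knn.predict(X[:3]))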
-
Chapter 06: Classification and Regression Trees (CART)
This chapter introduces Classification and Regression Trees (CART), a well-established machine learning procedure. We explain the main idea and give details on splitting criteria, discuss computational aspects of growing a tree, and illustrate the idea of stopping criteria and pruning.
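A minimal sketch of a CART-style tree in sklearn, where max_depth illustrates a stopping criterion and ccp_alpha cost-complexity pruning; all values are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    # Gini impurity as splitting criterion; the depth limit stops growth early,
    # ccp_alpha prunes the grown tree.
    tree = DecisionTreeClassifier(criterion="gini", max_depth=3, ccp_alpha=0.01)
    tree.fit(X, y)
    print(tree.get_depth(), tree.get_n_leaves())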
-
Chapter 07: Random Forests
This chapter introduces bagging as a method to increase the performance of trees (or other base learners). A modification of bagging leads to random forests. We explain the main idea of random forests, benchmark their performance against the methods seen so far, and show how to quantify the impact of a single feature on the performance of the random forest as well as how to compute proximities between observations.
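A minimal sketch of a random forest with permutation-based feature importance in sklearn; all settings are illustrative:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance

    X, y = load_breast_cancer(return_X_y=True)
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    # Shuffle each feature and measure the performance drop this causes.
    imp = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
    print(imp.importances_mean.argsort()[-3:])  # indices of the 3 most impactful features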
-
Chapter 08: Neural Networks
This chapter introduces the basic concepts of neural networks. We integrate chapters from our Deep Learning course so that (simple) neural networks can be used for supervised ML on tabular data.
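A minimal sketch of a small feedforward network on tabular data in sklearn; architecture and data are illustrative assumptions:

    from sklearn.datasets import load_breast_cancer
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    # Scaling the inputs matters for gradient-based training.
    net = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
    )
    print(net.fit(X, y).score(X, y))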
-
Chapter 09: Tuning
This chapter introduces and formalizes the problem of hyperparameter tuning. We cover basic techniques such as grid search and random search as well as more advanced techniques like evolutionary algorithms, model-based optimization and multi-fidelity optimization.
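A minimal sketch of grid search and random search in sklearn; the search space is an illustrative assumption:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    space = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
    grid = GridSearchCV(SVC(), space, cv=5)  # exhaustive search over the grid
    print(grid.fit(X, y).best_params_)
    rand = RandomizedSearchCV(SVC(), space, n_iter=5, cv=5, random_state=0)
    print(rand.fit(X, y).best_params_)       # random subset of the space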
-
Chapter 10: Nested Resampling
This chapter first defines the untouched-test-set principle and proceeds to explain the concepts of train-validation-test split and nested resampling.
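A minimal sketch of nested resampling in sklearn, with tuning in the inner loop and performance estimation on untouched outer folds; settings are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)  # inner loop: tuning
    print(cross_val_score(inner, X, y, cv=5).mean())        # outer loop: evaluation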
-
Chapter 11: Advanced Risk Minimization
This chapter revisits the theory of risk minimization, providing a more in-depth analysis of established losses and of the connection between empirical risk minimization and maximum likelihood estimation. We also introduce some more advanced loss functions for regression and classification.
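As a standard worked example of this connection (textbook material, not necessarily the chapter's exact notation): assuming Gaussian noise \(y = f(x \mid \theta) + \epsilon\) with \(\epsilon \sim \mathcal{N}(0, \sigma^2)\), the negative log-likelihood is, up to constants, the empirical risk under the squared loss,
\[
-\log \prod_{i=1}^{n} p\big(y^{(i)} \mid x^{(i)}, \theta\big) = \frac{1}{2\sigma^2} \sum_{i=1}^{n} \big(y^{(i)} - f(x^{(i)} \mid \theta)\big)^2 + \text{const},
\]
so maximum likelihood estimation and empirical risk minimization with L2 loss yield the same \(\hat{\theta}\).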
-
Chapter 12: Multiclass Classification
This chapter treats the multiclass case of classification. Tasks with more than two classes preclude the application of some techniques studied in the binary scenario and require an adaptation of loss functions.
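One standard adaptation (a common textbook formulation, not necessarily the chapter's notation) combines the softmax function with the multiclass cross-entropy loss,
\[
\pi_k(x) = \frac{\exp(\theta_k^\top x)}{\sum_{j=1}^{g} \exp(\theta_j^\top x)}, \qquad L(y, \pi(x)) = -\sum_{k=1}^{g} \mathbb{1}[y = k] \log \pi_k(x),
\]
which reduces to the logistic/Bernoulli loss for \(g = 2\).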
-
Chapter 13: Information Theory
This chapter covers basic information-theoretic concepts and discusses their relation to machine learning.
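A minimal sketch computing Shannon entropy and KL divergence for discrete distributions in Python; the probabilities are illustrative:

    import numpy as np

    p = np.array([0.5, 0.25, 0.25])
    q = np.array([1 / 3, 1 / 3, 1 / 3])
    print(-np.sum(p * np.log2(p)))     # entropy H(p), in bits
    print(np.sum(p * np.log2(p / q)))  # KL divergence KL(p || q)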
-
Chapter 14: Curse of Dimensionality
Frequently, our intuition developed in low-dimensional spaces does not generalize to higher dimensions. This chapter introduces the phenomenon of the curse of dimensionality and discusses its effects on the behavior of machine learning models.
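A minimal simulation of one such effect, distance concentration: as the dimension grows, the farthest and nearest neighbor of a point become almost equally distant (data and sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 10, 100, 1000):
        X = rng.uniform(size=(100, d))
        dist = np.linalg.norm(X[0] - X[1:], axis=1)  # distances from one point
        print(d, dist.max() / dist.min())            # ratio approaches 1 as d grows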
-
Chapter 15: Regularization
Regularization is a vital tool in machine learning to prevent overfitting and foster generalization ability. This chapter introduces the concept of regularization and discusses common regularization techniques in more depth.
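A minimal sketch of L2 (ridge) and L1 (lasso) regularization in sklearn; alpha values and data are illustrative:

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Lasso, Ridge

    X, y = load_diabetes(return_X_y=True)
    print(Ridge(alpha=1.0).fit(X, y).coef_)  # L2: shrinks coefficients
    print(Lasso(alpha=1.0).fit(X, y).coef_)  # L1: shrinks and sets some to zero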
-
Chapter 16: Linear Support Vector Machines
This chapter introduces the linear support vector machine (SVM), a linear classifier that finds decision boundaries by maximizing the margin to the closest data points, possibly allowing margin violations to a certain extent (soft margin).
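A minimal sketch of a soft-margin linear SVM in sklearn, where C controls how strongly margin violations are penalized; C and the data are illustrative:

    from sklearn.datasets import load_breast_cancer
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    X, y = load_breast_cancer(return_X_y=True)
    svm = make_pipeline(StandardScaler(), LinearSVC(C=1.0))  # larger C = fewer violations
    print(svm.fit(X, y).score(X, y))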
-
Chapter 17: Nonlinear Support Vector Machines
Many classification problems warrant nonlinear decision boundaries. This chapter introduces nonlinear support vector machines as a crucial extension to the linear variant.
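A minimal sketch of a kernel SVM with an RBF kernel, which induces a nonlinear decision boundary; data and settings are illustrative:

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(noise=0.2, random_state=0)  # not linearly separable
    svm = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
    print(svm.score(X, y))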
-
Chapter 18: Boosting
This chapter introduces boosting as a sequential ensemble method that creates powerful committees from different kinds of base learners.
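A minimal sketch of gradient boosting with shallow trees as base learners in sklearn; settings are illustrative:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = load_breast_cancer(return_X_y=True)
    # Each shallow tree is fit sequentially to the errors of the current ensemble.
    gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=2)
    print(gbm.fit(X, y).score(X, y))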
-
Chapter 19: Gaussian Processes
This chapter introduces Gaussian processes as a model class. Gaussian processes are widely applicable non-parametric approaches that model entire distributions over functions.
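A minimal sketch of Gaussian process regression in sklearn, returning a posterior mean and standard deviation rather than a point estimate; kernel and data are illustrative:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    X = np.linspace(0, 5, 20).reshape(-1, 1)
    y = np.sin(X).ravel()
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
    mean, std = gp.predict([[2.5]], return_std=True)
    print(mean, std)  # predictive distribution at a new point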
-
Chapter 20: Imbalanced Learning
This chapter introduces techniques for learning on imbalanced datasets, i.e., datasets in which some classes occur far more frequently than others.
-
Chapter 21: Multitarget Learning
This chapter introduces multitarget learning, i.e., techniques for predicting several target variables simultaneously.
-
Chapter 22: Online Learning
This chapter introduces online learning, where models are updated sequentially as observations arrive over time.
-
Extra Chapter: Feature Selection
This chapter introduces feature selection, i.e., finding a well-performing, hopefully small set of features for a task.
-
Coding ML [Python and sklearn]
This section introduces basic concepts and implementations using Python and, in particular, sklearn.
-
Coding ML [R and mlr3]
For an introduction to the R package mlr3, we recommend working through selected chapters of the mlr3 book, as summarized in this document. After some basic concepts, the material focuses on resampling, tuning and pipelines.