Chapter 11: Advanced Risk Minimization

This chapter revisits the theory of risk minimization, providing more in-depth analysis on established losses and the connection between empirical risk minimization and maximum likelihood estimation. We also introduce some more advanced loss functions for regression and classification.

Chapter 11.01: Risk Minimization Basics
We introduce important concepts in theoretical risk minimization: risk minimizer, Bayes risk, Bayes regret, consistent learners and the optimal constant model.
Chapter 11.02: Properties of Loss Functions
We introduce key properties of loss functions and explore how these influence model assumptions, sensitivity to outliers, and the tractability of training.
Chapter 11.03: Pseudo-Residuals
We introduce the concept of pseudo-residuals, i.e., loss residuals in function space, and discuss their relation to gradient descent.
Chapter 11.04: Regression Losses: L2 and L1 loss
In this section, we revisit L2 and L1 loss, highlighting that their risk minimizers are the conditional mean and median, respectively, and that their optimal constant models correspond to the empirical mean and median of observed targets.
Chapter 11.05: L1 loss: Deep Dive
In this deep dive, we revisit \(L1\) loss and derive its risk minimizer – the conditional median – and optimal constant model – the empirical median of observed target values. Please note that there are no videos accompanying this section.
Chapter 11.06: Advanced Regression Losses
In this section, we introduce and discuss the following advanced regression losses: Huber, log-cosh, Cauchy, epsilon-insensitive, and quantile loss.
Chapter 11.07: Classification and 0-1-Loss
In this section, we revisit the 0-1-loss and derive its risk minimizer.
Chapter 11.08: Bernoulli Loss
We study the Bernoulli loss and derive its risk minimizer and optimal constant model. We further discuss the connection between Bernoulli loss minimization and tree splitting according to the entropy criterion.
Chapter 11.09: Some details on classification losses: Deep Dive
In this section, we will discuss the equivalence of different classification losses and derive the risk minimizers in various settings. This section is presented as a deep-dive. Please note that there are no videos accompanying this section.
Chapter 11.10: Logistic Regression: Deep Dive
In this segment, we derive the gradient and Hessian of logistic regression and show that logistic regression is a convex problem. This section is presented as a deep-dive. Please note that there are no videos accompanying this section.
Chapter 11.11: Brier Score - L2/L1 Loss on Probabilities
In this section, we introduce the Brier score and derive its risk minimizer and optimal constant model.
Chapter 11.12: Advanced Classification Losses
In this section, we introduce and discuss the following advanced classification losses: (squared) hinge loss, \(L2\) loss on scores, exponential loss, and AUC loss.
Chapter 11.13: Proper Scoring Rules
We dive into how proper scoring rules - like log loss and the Brier score (unlike L1) - guarantee strictly proper, probability-calibrated predictions by satisfying a first‑order optimality condition.
Chapter 11.14: Loss functions and Tree Splitting: Deep Dive
Learn how minimizing Bernoulli (log) loss yields entropy‐based splits and minimizing the Brier score yields Gini‐based splits, unifying impurity and risk views for optimal tree splitting. This section is presented as a deep-dive. Please note that there are no videos accompanying this section.
Chapter 11.15: Maximum Likelihood Estimation vs. Empirical Risk Minimization
We discuss the connection between maximum likelihood estimation and risk minimization, then demonstrate the correspondence between a Gaussian error distribution and \(L2\) loss and that alternative likelihoods give rise to the \(L1\) and Bernoulli loss.
Chapter 11.16: Bias Variance Decomposition I
We discuss how to decompose the generalization error of a learner.
Chapter 11.17: Bias Variance Decomposition II
We discuss how to decompose the excess risk into the estimation, approximation and optimization error.
Chapter 11.18: Bias Variance Decomposition: Deep Dive
In this segment, we discuss details of the decomposition of the generalization error of a learner. This section is presented as a deep-dive. Please note that there are no videos accompanying this section.