Chapter 11: Advanced Risk Minimization
This chapter revisits the theory of risk minimization, providing a more in-depth analysis of established losses and of the connection between empirical risk minimization and maximum likelihood estimation. We also introduce some more advanced loss functions for regression and classification.
-
Chapter 11.01: Risk Minimizers
We introduce important concepts in theoretical risk minimization: risk minimizer, Bayes risk, Bayes regret, consistent learners and the optimal constant model.
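For quick reference (using standard notation, which may differ slightly from the chapter), the risk minimizer, the Bayes risk, and the Bayes regret of a model \(\hat f\) can be written as
\[
f^{\ast} = \arg\min_{f} \mathbb{E}_{xy}\left[L\left(y, f(\mathbf{x})\right)\right], \qquad
\mathcal{R}^{\ast} = \mathcal{R}\left(f^{\ast}\right), \qquad
\mathcal{R}\left(\hat f\right) - \mathcal{R}^{\ast} \geq 0.
\]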
-
Chapter 11.02: Pseudo-Residuals
We introduce the concept of pseudo-residuals, i.e., loss residuals in function space, and discuss their relation to gradient descent.
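As a short illustration (standard definition; notation may differ from the slides), the pseudo-residual of an observation is the negative derivative of the loss with respect to the prediction,
\[
\tilde r = -\frac{\partial L\left(y, f(\mathbf{x})\right)}{\partial f(\mathbf{x})},
\]
which for the \(L2\) loss \(L(y, f) = \frac{1}{2}(y - f)^2\) reduces to the ordinary residual \(y - f(\mathbf{x})\).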
-
Chapter 11.03: L2 Loss
In this section, we revisit the \(L2\) loss and derive its risk minimizer (the conditional mean) as well as its optimal constant model (the empirical mean of the observed target values).
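In symbols (standard results, stated here for orientation, writing \(y^{(i)}\) for the observed targets):
\[
f^{\ast}(\mathbf{x}) = \mathbb{E}\left[y \mid \mathbf{x}\right], \qquad
\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} y^{(i)}.
\]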
-
Chapter 11.04: L1 Loss
In this section, we revisit the \(L1\) loss and derive its risk minimizer (the conditional median) as well as its optimal constant model (the empirical median of the observed target values).
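The analogous results read (again in standard notation):
\[
f^{\ast}(\mathbf{x}) = \operatorname{med}\left[y \mid \mathbf{x}\right], \qquad
\hat{\theta} = \operatorname{med}\left(y^{(1)}, \ldots, y^{(n)}\right).
\]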
-
Chapter 11.05: Advanced Regression Losses
In this section, we introduce and discuss the following advanced regression losses: Huber, log-cosh, Cauchy, log-barrier, epsilon-insensitive, and quantile loss.
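As one example from this list, the Huber loss with threshold \(\delta > 0\) blends the quadratic and absolute losses (a common parametrization; the chapter's version may scale terms differently):
\[
L_{\delta}(y, f) =
\begin{cases}
\frac{1}{2}\left(y - f\right)^2 & \text{if } |y - f| \leq \delta, \\
\delta\, |y - f| - \frac{1}{2}\delta^2 & \text{otherwise,}
\end{cases}
\]
so it is quadratic for small residuals and grows only linearly for large ones.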
-
Chapter 11.06: 0-1 Loss
In this section, we revisit the 0-1 loss and derive its risk minimizer.
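For reference, the risk minimizer of the 0-1 loss is the class with maximal posterior probability:
\[
f^{\ast}(\mathbf{x}) = \arg\max_{k \in \mathcal{Y}} \mathbb{P}\left(y = k \mid \mathbf{x}\right).
\]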
-
Chapter 11.07: Bernoulli Loss
We study the Bernoulli loss and derive its risk minimizer and optimal constant model. We further discuss the connection between Bernoulli loss minimization and tree splitting according to the entropy criterion.
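For labels \(y \in \{0, 1\}\) and predicted probability \(\pi(\mathbf{x})\), the Bernoulli (log) loss is
\[
L\left(y, \pi(\mathbf{x})\right) = -y \log \pi(\mathbf{x}) - (1 - y) \log\left(1 - \pi(\mathbf{x})\right),
\]
with risk minimizer \(\pi^{\ast}(\mathbf{x}) = \mathbb{P}(y = 1 \mid \mathbf{x})\) (a standard result, stated here for orientation).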
-
Chapter 11.08: Logistic Regression: Deep Dive
In this segment, we derive the gradient and Hessian of logistic regression and show that logistic regression is a convex problem. This section is presented as a deep-dive. Please note that there are no videos accompanying this section.
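For orientation (standard derivation for the \(\{0, 1\}\)-encoded log loss with \(\pi^{(i)} = s(\boldsymbol{\theta}^{\top} \mathbf{x}^{(i)})\) and sigmoid \(s\)), the gradient and Hessian of the empirical risk are
\[
\nabla_{\boldsymbol{\theta}} \mathcal{R}_{\mathrm{emp}} = \sum_{i=1}^{n} \left(\pi^{(i)} - y^{(i)}\right) \mathbf{x}^{(i)}, \qquad
\nabla_{\boldsymbol{\theta}}^{2} \mathcal{R}_{\mathrm{emp}} = \sum_{i=1}^{n} \pi^{(i)} \left(1 - \pi^{(i)}\right) \mathbf{x}^{(i)} \mathbf{x}^{(i)\top},
\]
and the Hessian is positive semi-definite, which yields convexity.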
-
Chapter 11.09: Brier Score
In this section, we introduce the Brier score and derive its risk minimizer and optimal constant model. We further discuss the connection between Brier score minimization and tree splitting according to the Gini index.
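For binary labels \(y \in \{0, 1\}\), the Brier score is the squared difference between predicted probability and outcome,
\[
L\left(y, \pi(\mathbf{x})\right) = \left(\pi(\mathbf{x}) - y\right)^2,
\]
and, like the Bernoulli loss, its risk minimizer is \(\pi^{\ast}(\mathbf{x}) = \mathbb{P}(y = 1 \mid \mathbf{x})\).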
-
Chapter 11.10: Advanced Classification Losses
In this section, we introduce and discuss the following advanced classification losses: (squared) hinge loss, \(L2\) loss on scores, exponential loss, and AUC loss.
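As one example from this list, the hinge loss for labels \(y \in \{-1, +1\}\) and score \(f(\mathbf{x})\) is
\[
L\left(y, f(\mathbf{x})\right) = \max\left(0,\, 1 - y f(\mathbf{x})\right),
\]
and its squared variant simply squares this term.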
-
Chapter 11.11: Optimal Constant Model for the Empirical Log Loss Risk
In this segment, we derive the optimal constant model for the empirical log loss risk. This section is presented as a deep-dive. Please note that there are no videos accompanying this section.
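As a brief preview of the result (assuming the constant model directly outputs a probability), the optimal constant under the empirical log loss risk is the fraction of positive labels,
\[
\hat{\pi} = \frac{1}{n} \sum_{i=1}^{n} y^{(i)}.
\]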
-
Chapter 11.12: Maximum Likelihood Estimation vs Empirical Risk Minimization I
We discuss the connection between maximum likelihood estimation and risk minimization, then demonstrate the correspondence between a Gaussian error distribution and \(L2\) loss.
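Sketching the key step (a standard argument; notation may differ from the chapter): assuming additive Gaussian errors \(y = f_{\mathrm{true}}(\mathbf{x}) + \epsilon\) with \(\epsilon \sim \mathcal{N}(0, \sigma^2)\), the negative log-likelihood is, up to additive and multiplicative constants,
\[
-\ell(\boldsymbol{\theta}) \propto \sum_{i=1}^{n} \left(y^{(i)} - f\left(\mathbf{x}^{(i)}\right)\right)^2,
\]
so maximum likelihood estimation corresponds to empirical risk minimization with the \(L2\) loss.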
-
Chapter 11.13: Maximum Likelihood Estimation vs Empirical Risk Minimization II
We discuss the connection between maximum likelihood estimation and risk minimization for further losses (\(L1\) loss, Bernoulli loss).
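Analogously (standard argument), Laplace-distributed errors with density proportional to \(\exp(-|\epsilon| / b)\) lead to
\[
-\ell(\boldsymbol{\theta}) \propto \sum_{i=1}^{n} \left|y^{(i)} - f\left(\mathbf{x}^{(i)}\right)\right|,
\]
i.e., the \(L1\) loss, while a Bernoulli likelihood recovers the Bernoulli (log) loss.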
-
Chapter 11.14: Properties of Loss Functions
We discuss the concept of robustness as well as analytical and functional properties of loss functions, and how these properties may influence the convergence of optimizers.
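The following minimal sketch (illustrative only, not taken from the chapter) shows one aspect of robustness: under \(L2\) loss the gradient contribution of a single outlier grows without bound, whereas the Huber loss caps it at \(\delta\).

```python
import numpy as np

# Gradient of 0.5 * (y - f)^2 with respect to f: magnitude grows linearly in the residual r = y - f.
def l2_grad(r):
    return -r

# Gradient of the Huber loss: quadratic regime for |r| <= delta, bounded slope otherwise.
def huber_grad(r, delta=1.0):
    return np.where(np.abs(r) <= delta, -r, -delta * np.sign(r))

residuals = np.array([0.1, 0.5, 1.0, 10.0])  # last entry mimics an outlier
print(l2_grad(residuals))     # [-0.1 -0.5 -1.  -10.]  -> outlier dominates
print(huber_grad(residuals))  # [-0.1 -0.5 -1.  -1.]   -> outlier influence capped at delta
```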
-
Chapter 11.15: Bias Variance Decomposition
We discuss how the generalization error of a learner can be decomposed into bias, variance, and irreducible noise components.
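For \(L2\) loss with additive noise of variance \(\sigma^2\), the well-known decomposition at a fixed \(\mathbf{x}\) reads (stated here for orientation):
\[
\mathbb{E}\left[\left(y - \hat f(\mathbf{x})\right)^2\right]
= \left(\mathbb{E}\left[\hat f(\mathbf{x})\right] - f_{\mathrm{true}}(\mathbf{x})\right)^2
+ \mathrm{Var}\left(\hat f(\mathbf{x})\right)
+ \sigma^2,
\]
where the expectation is taken over the noise and the sampling of the training data.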
-
Chapter 11.16: Bias Variance Decomposition: Deep Dive
In this segment, we discuss details of the decomposition of the generalization error of a learner. This section is presented as a deep-dive. Please note that there are no videos accompanying this section.