Chapter 15: Regularization

Regularization is a vital tool in machine learning for preventing overfitting and improving generalization. This chapter introduces the concept of regularization and discusses common regularization techniques in depth.

Chapter 15.01: Introduction to Regularization
In this section, we revisit overfitting and introduce regularization as a remedy.
Chapter 15.02: Ridge Regression
In this section, we introduce Ridge regression, which penalizes the \(L2\) norm of the coefficients, as a key approach to regularizing linear models.
Chapter 15.03: Lasso Regression
In this section, we introduce Lasso regression, which penalizes the \(L1\) norm of the coefficients, as another key approach to regularizing linear models.
Chapter 15.04: Lasso vs Ridge Regression
This section provides a detailed comparison between Lasso and Ridge regression.
Chapter 15.05: Elastic Net and Regularization for GLMs
In this section, we introduce the elastic net as a combination of Ridge and Lasso regression and discuss regularization for logistic regression.
Chapter 15.06: Other Types of Regularization
In this section, we introduce regularization approaches beyond the important special cases of \(L1\) and \(L2\) regularization.
Chapter 15.07: Non-Linear Models and Structural Risk Minimization
In this section, we demonstrate regularization in non-linear models such as neural networks and discuss structural risk minimization.
Chapter 15.08: Bayesian Priors
In this section, we motivate regularization from a Bayesian perspective, showing how different penalty terms correspond to different Bayesian priors.
Chapter 15.09: Weight Decay and L2
In this section, we show that \(L2\) regularization with gradient descent is equivalent to weight decay and examine how weight decay changes the optimization trajectory.
Chapter 15.10: Geometry of L2 Regularization
In this section, we provide a geometric understanding of \(L2\) regularization, showing how parameters are shrunk according to the eigenvalues of the Hessian of the empirical risk, and discuss its correspondence to weight decay.
Chapter 15.11: Geometry of L1 Regularization
In this section, we provide a geometric understanding of \(L1\) regularization and show that it encourages sparsity in the parameter vector.
Chapter 15.12: Early Stopping
In this section, we introduce early stopping and show how it can act as a regularizer.
Chapter 15.13: Details on Ridge Regression: Deep Dive
In this section, we interpret Ridge regression as least squares on a row-augmented dataset and as risk minimization under feature noise. We also discuss the bias-variance tradeoff.
Chapter 15.14: Soft-Thresholding and L1 Regularization: Deep Dive
In this section, we prove the previously stated proposition relating soft-thresholding to \(L1\) regularization.