Chapter 15: Regularization
Regularization is a vital tool in machine learning for preventing overfitting and improving generalization. This chapter introduces the concept of regularization and discusses common regularization techniques in depth.
-
Chapter 15.01: Introduction to Regularization
In this section, we revisit overfitting and introduce regularization as a remedy.
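The techniques in this chapter share a common template: empirical risk plus a complexity penalty. As a reference point, a sketch of this template in standard notation (the symbols \(\mathcal{R}_{\text{emp}}\), \(J\), and \(\lambda\) are conventional choices, not taken from this section):

\[
\mathcal{R}_{\text{reg}}(\boldsymbol{\theta}) = \mathcal{R}_{\text{emp}}(\boldsymbol{\theta}) + \lambda \, J(\boldsymbol{\theta}), \qquad \lambda \geq 0,
\]

where \(J(\boldsymbol{\theta})\) penalizes model complexity and \(\lambda\) controls the strength of the penalty.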
-
Chapter 15.02: Ridge Regression
We introduce Ridge regression as a key approach to regularizing linear models.
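As a minimal sketch of the Ridge fit (synthetic data, NumPy only; the closed-form solution \((\boldsymbol{X}^\top \boldsymbol{X} + \lambda \boldsymbol{I})^{-1} \boldsymbol{X}^\top \boldsymbol{y}\) is the standard one, and the value of `lam` is chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
theta_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = X @ theta_true + 0.1 * rng.normal(size=n)

lam = 1.0  # penalty strength (illustrative choice)
# Closed-form Ridge solution: (X'X + lambda * I)^{-1} X'y
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(theta_ridge)  # coefficients are shrunk toward zero relative to OLS
```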
-
Chapter 15.03: Lasso Regression
We introduce Lasso regression as a key approach to regularizing linear models.
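Unlike Ridge, the Lasso objective \(\frac{1}{2n}\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\theta}\|_2^2 + \lambda \|\boldsymbol{\theta}\|_1\) has no closed-form solution and is solved iteratively. A minimal proximal-gradient (ISTA) sketch, assuming the objective just stated (step size and iteration count are illustrative choices):

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1, applied element-wise
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Minimizes 1/(2n) * ||y - X theta||^2 + lam * ||theta||_1
    n, p = X.shape
    theta = np.zeros(p)
    step = n / np.linalg.norm(X, ord=2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)
print(lasso_ista(X, y, lam=0.1))  # irrelevant features typically end up exactly zero
```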
-
Chapter 15.04: Lasso vs Ridge Regression
This section provides a detailed comparison between Lasso and Ridge regression.
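The key practical difference shows up in a quick side-by-side fit: the Lasso produces exactly-zero coefficients, while Ridge only shrinks them. A sketch using scikit-learn (synthetic data; the penalty strengths are arbitrary illustrative values):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)  # only two features matter

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("zero coefficients (Ridge):", np.sum(ridge.coef_ == 0))  # typically 0: shrinkage only
print("zero coefficients (Lasso):", np.sum(lasso.coef_ == 0))  # typically > 0: exact sparsity
```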
-
Chapter 15.05: Elastic Net and Regularization for GLMs
In this section, we introduce the elastic net as a combination of Ridge and Lasso regression and discuss regularization for logistic regression.
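A sketch of both in scikit-learn, whose `ElasticNet` mixes the \(L1\) and \(L2\) penalties via `l1_ratio` and whose `LogisticRegression` supports the same penalty with the `saga` solver (all hyperparameter values below are illustrative only):

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import ElasticNet, LogisticRegression

# Elastic net for a linear model: l1_ratio interpolates between Ridge (0) and Lasso (1)
X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=0)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

# Regularized logistic regression (a GLM) with an elastic-net penalty
Xc, yc = make_classification(n_samples=100, n_features=10, random_state=0)
logreg = LogisticRegression(penalty="elasticnet", solver="saga",
                            l1_ratio=0.5, C=1.0, max_iter=5000).fit(Xc, yc)
```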
-
Chapter 15.06: Other Types of Regularization
In this section, we introduce other regularization approaches beyond the important special cases of \(L1\) and \(L2\) regularization.
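Many of these alternatives can be organized as the \(Lq\) family of penalties (a standard formulation, sketched here for orientation):

\[
J_q(\boldsymbol{\theta}) = \sum_{j=1}^{p} |\theta_j|^q, \qquad q > 0,
\]

with \(q = 2\) giving Ridge, \(q = 1\) the Lasso, and the limit \(q \to 0\) counting the nonzero coefficients (\(L0\) regularization, i.e. best subset selection).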
-
Chapter 15.07: Non-Linear Models and Structural Risk Minimization
In this section, we demonstrate how regularization applies to non-linear models such as neural networks.
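As a minimal sketch of what this looks like for a neural network, using PyTorch's `weight_decay` argument, which adds an \(L2\)-type penalty to every update (the model, data, and hyperparameters below are illustrative assumptions, not taken from this section):

```python
import torch

# Toy network and synthetic data (illustrative only)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 1))
X = torch.randn(100, 10)
y = torch.randn(100, 1)

# weight_decay applies an L2-style penalty to all parameters at each step
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
loss_fn = torch.nn.MSELoss()

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
```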
-
Chapter 15.08: Bayesian Priors
In this section, we motivate regularization from a Bayesian perspective, showing how different penalty terms correspond to different Bayesian priors.
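The correspondence follows from the MAP estimate (a standard derivation, with notation assumed here rather than taken from the section):

\[
\hat{\boldsymbol{\theta}}_{\text{MAP}} = \arg\max_{\boldsymbol{\theta}} \left[ \log p(\mathcal{D} \mid \boldsymbol{\theta}) + \log p(\boldsymbol{\theta}) \right].
\]

A Gaussian prior \(\theta_j \sim \mathcal{N}(0, \tau^2)\) contributes \(\frac{1}{2\tau^2} \|\boldsymbol{\theta}\|_2^2\) (up to constants) to the negative log-posterior, i.e. an \(L2\) penalty, while a Laplace prior contributes a term proportional to \(\|\boldsymbol{\theta}\|_1\), i.e. an \(L1\) penalty.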
-
Chapter 15.09: Weight Decay and L2
In this section, we show that \(L2\) regularization under gradient descent is equivalent to weight decay and examine how weight decay changes the optimization trajectory.
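The equivalence can be read off directly from the gradient-descent update on the \(L2\)-regularized risk \(\mathcal{R}_{\text{reg}}(\boldsymbol{\theta}) = \mathcal{R}_{\text{emp}}(\boldsymbol{\theta}) + \frac{\lambda}{2}\|\boldsymbol{\theta}\|_2^2\) (standard notation, assumed here):

\[
\boldsymbol{\theta}^{[t+1]} = \boldsymbol{\theta}^{[t]} - \alpha \nabla \mathcal{R}_{\text{reg}}(\boldsymbol{\theta}^{[t]}) = (1 - \alpha\lambda)\, \boldsymbol{\theta}^{[t]} - \alpha \nabla \mathcal{R}_{\text{emp}}(\boldsymbol{\theta}^{[t]}),
\]

so each step first decays the weights by the factor \(1 - \alpha\lambda\) before applying the usual unregularized gradient update.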
-
Chapter 15.10: Geometry of L2 Regularization
In this section, we provide a geometric understanding of \(L2\) regularization, showing how parameters are shrunk according to the eigenvalues of the Hessian of empirical risk, and discuss its correspondence to weight decay.
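Concretely, with a quadratic approximation of the empirical risk around its minimizer \(\hat{\boldsymbol{\theta}}\) and Hessian eigendecomposition \(\boldsymbol{H} = \boldsymbol{Q} \operatorname{diag}(\lambda_1, \dots, \lambda_p) \boldsymbol{Q}^\top\), the \(L2\)-regularized solution can be written as (standard result, notation assumed):

\[
\hat{\boldsymbol{\theta}}_{\text{reg}} \approx \boldsymbol{Q} \operatorname{diag}\!\left(\frac{\lambda_1}{\lambda_1 + \lambda}, \dots, \frac{\lambda_p}{\lambda_p + \lambda}\right) \boldsymbol{Q}^\top \hat{\boldsymbol{\theta}},
\]

so directions of high curvature (\(\lambda_i \gg \lambda\)) are barely shrunk, while low-curvature directions are shrunk strongly toward zero.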
-
Chapter 15.11: Geometry of L1 Regularization
In this section, we provide a geometric understanding of \(L1\) regularization and show that it encourages sparsity in the parameter vector.
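The geometric picture is clearest in the equivalent constrained formulation (standard form, notation assumed):

\[
\min_{\boldsymbol{\theta}} \; \mathcal{R}_{\text{emp}}(\boldsymbol{\theta}) \quad \text{s.t.} \quad \|\boldsymbol{\theta}\|_1 \leq t.
\]

Because the \(L1\) ball has corners on the coordinate axes, the constrained optimum frequently lands at a corner where some coordinates are exactly zero, unlike the smooth \(L2\) ball.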
-
Chapter 15.12: Early Stopping
In this section, we introduce early stopping and show how it can act as a regularizer.
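A minimal sketch of a generic early-stopping loop with a patience parameter (`train_step` and `val_loss` are assumed user-supplied callables, not names from this section):

```python
import copy

def train_with_early_stopping(model, train_step, val_loss, patience=5, max_epochs=200):
    best_loss, best_model, epochs_no_improve = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_step(model)          # one pass over the training data
        loss = val_loss(model)     # loss on a held-out validation set
        if loss < best_loss:
            best_loss, best_model = loss, copy.deepcopy(model)
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                break              # validation loss stopped improving
    return best_model
```

Stopping before the training loss is fully minimized limits how far the parameters can move from their initialization, which is what gives early stopping its regularizing effect.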
-
Chapter 15.13: Details on Ridge Regression: Deep Dive
In this section, we consider Ridge regression as row augmentation of the data and as risk minimization under feature noise. We also discuss the bias-variance tradeoff.
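The row-augmentation view can be verified numerically: ordinary least squares on the design matrix augmented with \(\sqrt{\lambda}\, \boldsymbol{I}\) rows (with zero responses) reproduces the Ridge solution. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 4, 2.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# Augment with p pseudo-observations sqrt(lam) * e_j, each with response 0
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])

theta_ols_aug = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(theta_ols_aug, theta_ridge))  # True: OLS on augmented data = Ridge
```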
-
Chapter 15.14: Soft-Thresholding and L1 Regularization: Deep Dive
In this section, we prove the previously stated proposition regarding soft-thresholding and \(L1\) regularization.
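For reference, the proposition is typically stated as follows (standard form; notation assumed here): for the scalar problem

\[
\min_{\theta} \; \tfrac{1}{2}(\theta - z)^2 + \lambda |\theta|,
\]

the minimizer is given by the soft-thresholding operator

\[
\hat{\theta} = \operatorname{sign}(z) \max(|z| - \lambda, 0),
\]

which is exactly zero whenever \(|z| \leq \lambda\), explaining why \(L1\) regularization produces sparse solutions.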