Chapter 13: Information Theory
This chapter covers basic information-theoretic concepts and discusses their relation to machine learning.
-
Chapter 13.01: Entropy I
We introduce entropy, which quantifies the expected information content of a discrete random variable, as a central concept in information theory.
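As a minimal illustration of the definition H(X) = -∑ p(x) log2 p(x), the following sketch computes the entropy of a discrete distribution; the helper name and example probabilities are our own choice and not part of the lecture material.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum_x p(x) * log p(x) of a discrete distribution."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries exactly 1 bit of expected information ...
print(entropy([0.5, 0.5]))   # 1.0
# ... while a biased coin is more predictable and therefore less informative.
print(entropy([0.9, 0.1]))   # ~0.469
```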
-
Chapter 13.02: Entropy II
We continue our discussion of entropy and introduce joint entropy, the uniqueness theorem, and the maximum entropy principle.
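To give a numeric flavor of the maximum entropy principle, the self-contained sketch below (our own toy distributions) compares the uniform distribution on four outcomes with a skewed alternative; the uniform one attains the maximal entropy log2(4) = 2 bits.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Among all distributions on 4 outcomes, the uniform one maximizes entropy:
print(entropy([0.25, 0.25, 0.25, 0.25]))  # log2(4) = 2.0 bits
print(entropy([0.7, 0.1, 0.1, 0.1]))      # ~1.357 bits, strictly smaller
```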
-
Chapter 13.03: Differential Entropy
In this section, we extend the definition of entropy to the continuous case.
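As a pointer to the kind of formula treated there: differential entropy replaces the sum by an integral over the density, and for a Gaussian it has a well-known closed form (stated here for reference; the notation may differ slightly from the slides).

```latex
h(X) = -\int f(x) \log f(x)\, dx,
\qquad
h\big(\mathcal{N}(\mu, \sigma^2)\big) = \tfrac{1}{2} \log\big(2\pi e \sigma^2\big)
```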
-
Chapter 13.04: Kullback-Leibler Divergence
The Kullback-Leibler divergence (KL) is an important quantity for measuring the difference between two probability distributions. We discuss different intuitions for KL and relate it to risk minimization and likelihood ratios.
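For concreteness, here is a small sketch of the discrete KL divergence D_KL(p || q) = ∑ p(x) log(p(x)/q(x)); the example distributions are made up for illustration.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # ~0.737 bits
print(kl_divergence(q, p))  # ~0.531 bits -- KL is not symmetric
```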
-
Chapter 13.05: Cross-Entropy and KL
We introduce cross-entropy as a further information-theoretic concept and discuss the connection between entropy, cross-entropy, and Kullback-Leibler divergence.
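The central identity connecting these quantities, H(p, q) = H(p) + D_KL(p || q), can be checked numerically; the helper names and example distributions below are our own.

```python
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p, q = [0.5, 0.5], [0.9, 0.1]
# Cross-entropy decomposes into entropy plus KL divergence:
print(cross_entropy(p, q))     # ~1.737
print(entropy(p) + kl(p, q))   # ~1.737 -- same value
```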
-
Chapter 13.06: Information Theory for Machine Learning
In this section, we discuss how information-theoretic concepts are used in machine learning, demonstrate the equivalence of KL minimization and maximum likelihood estimation, and show how (cross-)entropy can be used as a loss function.
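As a sketch of the last point: for a one-hot target, the cross-entropy loss reduces to the negative log-probability of the true class, which is exactly the log-likelihood term that maximum likelihood estimation maximizes. The example values below are made up for illustration.

```python
import math

def cross_entropy_loss(target_onehot, predicted_probs):
    """Cross-entropy between a one-hot target and predicted class probabilities."""
    return -sum(t * math.log(q) for t, q in zip(target_onehot, predicted_probs) if t > 0)

target = [0, 1, 0]              # true class is class 1
prediction = [0.2, 0.7, 0.1]    # model's predicted class probabilities
# Equals -log(0.7): minimizing this loss maximizes the likelihood of the observed label.
print(cross_entropy_loss(target, prediction))  # ~0.357
```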
-
Chapter 13.07: Joint Entropy and Mutual Information I
Information theory also provides means of quantifying relations between two random variables that extend the concept of (linear) correlation. We discuss joint entropy, conditional entropy, and mutual information in this context.
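A minimal numeric sketch of mutual information via I(X; Y) = H(X) + H(Y) - H(X, Y), computed from a small joint distribution we made up for illustration:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution p(x, y) as a 2x2 table (rows: x, columns: y).
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]                   # marginal of X
py = [sum(col) for col in zip(*joint)]             # marginal of Y
h_xy = entropy([p for row in joint for p in row])  # joint entropy H(X, Y)
mi = entropy(px) + entropy(py) - h_xy              # mutual information I(X; Y)
print(mi)  # ~0.278 bits of shared information
```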
-
Chapter 13.08: Joint Entropy and Mutual Information II
We continue our discussion of joint entropy, conditional entropy, and mutual information.
-
Chapter 13.09: Entropy and Optimal Code Length I
In this section, we introduce source coding and discuss how entropy can be understood as optimal code length.
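To give a flavor of this connection, the sketch below (our own toy source, not from the lecture) compares the entropy of a source with the expected length of a simple Shannon-style code that assigns each symbol ceil(-log2 p(x)) bits; the entropy lower-bounds the expected code length.

```python
import math

probs = [0.5, 0.25, 0.125, 0.125]   # a toy source with four symbols

entropy = -sum(p * math.log2(p) for p in probs)
# Shannon code: symbol x gets a codeword of length ceil(-log2 p(x)) bits.
expected_len = sum(p * math.ceil(-math.log2(p)) for p in probs)

print(entropy)       # 1.75 bits per symbol
print(expected_len)  # 1.75 bits -- optimal here because all p(x) are powers of 2
```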
-
Chapter 13.10: Entropy and Optimal Code Length II
In this section, we continue our discussion on source coding and its relation to entropy.
-
Chapter 13.11: MI under Reparametrization: Deep Dive
In this deep dive, we discuss the invariance of mutual information (MI) under certain reparametrizations.
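As a small sanity check of this invariance in the discrete case (the joint table and the relabeling are our own toy example): applying a bijective reparametrization to X, here simply permuting its outcomes, leaves the mutual information with Y unchanged.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    h_xy = entropy([p for row in joint for p in row])
    return entropy(px) + entropy(py) - h_xy

joint = [[0.4, 0.1],
         [0.1, 0.4]]
# Bijective reparametrization of X: swap (relabel) its two outcomes.
relabeled = [joint[1], joint[0]]
print(mutual_information(joint))      # ~0.278
print(mutual_information(relabeled))  # ~0.278 -- MI is unchanged
```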