Introduction to Machine Learning (I2ML) | Chapter 20: Imbalanced Learning

Chapter 20.01: Introduction
We define the phenomenon of imbalanced data sets and explain its consequences for accuracy. Furthermore, we introduce some techniques for handling imbalanced data sets.
Chapter 20.02: Performance Measures
We introduce performance measures other than accuracy and explain their advantages over accuracy for imbalanced data. In addition, we introduce extensions of these measures for multiclass settings.
Chapter 20.03: Cost-Sensitive Learning 1
We introduce the concept of a cost matrix, the minimum expected cost principle, and the theoretically optimal threshold.
Chapter 20.04: Cost-Sensitive Learning 2
In this section, we focus on empirical thresholding and the model-agnostic MetaCost approach.
Chapter 20.05: Cost-Sensitive Learning 3
We explain the concepts of instance-specific costs and cost-sensitive one-vs-one (OVO) classification.
Chapter 20.06: Cost Curves 1
We introduce cost curves for the misclassification error and explain the duality between ROC points and cost lines.
Chapter 20.07: Cost Curves 2
We explain cost curves with cost matrices and how to use them to compare classifiers. In addition, we wrap up with a comparison to ROC curves.
Chapter 20.08: Sampling Methods 1
We introduce the idea of sampling methods for dealing with imbalanced data. In addition, we explain certain undersampling techniques.
Chapter 20.09: Sampling Methods 2
We introduce the state-of-the-art oversampling technique SMOTE.
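
As a rough illustration of the oversampling idea named in Chapter 20.09, the following is a minimal sketch of the interpolation step commonly associated with SMOTE: a synthetic minority point is placed at a random position on the line segment between a minority observation and one of its nearest minority neighbors. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for this example and are not taken from the course material.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Hypothetical helper: create synthetic minority points by interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority observation as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbors (Euclidean distance, excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        # Interpolate between the base point and one randomly chosen neighbor.
        neighbor = rng.choice(neighbors)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + u * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote_sketch(minority, n_synthetic=5)
```

Because every synthetic point is a convex combination of two existing minority points, the new points stay inside the region already covered by the minority class. This sketch handles numeric features only.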