Chapter 24: Imbalanced Learning
This chapter introduces techniques for learning on imbalanced datasets.
-
Chapter 24.01: Introduction
We define the phenomenon of imbalanced data sets and explain its consequences for accuracy. Furthermore, we introduce some techniques for handling imbalanced data sets.
-
Chapter 24.02: Performance Measures
We introduce performance measures other than accuracy and explain their advantages over accuracy for imbalanced data. In addition, we introduce extensions of these measures for multiclass settings.
-
Chapter 24.03: Cost-Sensitive Learning 1
We introduce the concept of a cost matrix, the minimum expected cost principle, and the theoretically optimal threshold.
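As a rough illustration of how these three ideas fit together, the sketch below works through the binary case. It assumes a 2x2 cost matrix given as four numbers and a classifier that outputs a probability for the positive class. The function names and the argument order are hypothetical choices for this example, not notation taken from the chapter.

```python
def expected_costs(p_pos, c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Expected cost of each possible prediction, given P(y = positive) = p_pos."""
    cost_if_predict_pos = p_pos * c_tp + (1.0 - p_pos) * c_fp
    cost_if_predict_neg = p_pos * c_fn + (1.0 - p_pos) * c_tn
    return cost_if_predict_pos, cost_if_predict_neg


def min_expected_cost_decision(p_pos, c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Minimum expected cost principle: predict the class with the lower expected cost."""
    cost_pos, cost_neg = expected_costs(p_pos, c_fp, c_fn, c_tp, c_tn)
    return 1 if cost_pos <= cost_neg else 0


def optimal_threshold(c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Probability at which both predictions have equal expected cost.

    Assumes every error costs more than the corresponding correct decision.
    """
    if c_fp <= c_tn or c_fn <= c_tp:
        raise ValueError("misclassification must cost more than correct classification")
    return (c_fp - c_tn) / ((c_fp - c_tn) + (c_fn - c_tp))


# Hypothetical costs: a false negative is nine times as costly as a false positive.
threshold = optimal_threshold(c_fp=1.0, c_fn=9.0)
decision = min_expected_cost_decision(0.2, c_fp=1.0, c_fn=9.0)
```

With these illustrative costs the threshold falls well below 0.5, so an observation with a positive-class probability of only 0.2 is still predicted as positive.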
-
Chapter 24.04: Cost-Sensitive Learning 2
In this section, we focus on empirical thresholding and the model-agnostic MetaCost method.
-
Chapter 24.05: Cost-Sensitive Learning 3
We explain the concepts of instance-specific costs and cost-sensitive one-vs-one (OVO) classification.
-
Chapter 24.06: Cost Curves 1
We introduce cost curves for the misclassification error and explain the duality between ROC points and cost lines.
-
Chapter 24.07: Cost Curves 2
We explain cost curves with cost matrices and how to use them to compare classifiers. In addition, we wrap up with a comparison to ROC curves.
-
Chapter 24.08: Sampling Methods 1
We introduce the idea of sampling methods for dealing with imbalanced data. In addition, we explain certain undersampling techniques.
-
Chapter 24.09: Sampling Methods 2
We introduce the state-of-the-art oversampling technique SMOTE.
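To give a feel for the core idea behind SMOTE, here is a minimal sketch that creates synthetic minority points by interpolating between a minority observation and one of its nearest minority neighbours. It assumes purely numeric features and Euclidean distance. The function name `smote_sketch` and its parameters are hypothetical, and this is a simplified illustration rather than a reference implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class feature vectors."""
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority observations")
    k = min(k, n - 1)  # cannot have more neighbours than other minority points

    synthetic = []
    for _ in range(n_new):
        # Pick a random minority observation as the starting point.
        i = rng.randrange(n)
        base = minority[i]

        # Find its k nearest neighbours among the other minority observations.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]

        # Place the new point at a random position on the connecting line segment.
        lam = rng.random()
        synthetic.append([b + lam * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Toy minority class (made-up values) with four observations in two dimensions.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(points, n_new=10, k=2)
```

Because every synthetic point lies on a segment between two existing minority observations, the new data stays inside the region already covered by the minority class instead of duplicating observations exactly.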