Chapter 20: Imbalanced Learning
This chapter introduces techniques for learning on imbalanced datasets.
-
Chapter 20.01: Introduction
We define the phenomenon of imbalanced data sets and explain its consequences for accuracy. Furthermore, we introduce some techniques for handling imbalanced data sets.
-
Chapter 20.02: Performance Measures
We introduce performance measures other than accuracy and explain their advantages over accuracy for imbalanced data. In addition, we introduce extensions of these measures for multiclass settings.
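As a small illustration of why accuracy can mislead on imbalanced data, the sketch below compares accuracy with recall, F1 and balanced accuracy for a classifier that always predicts the majority class. The function names (`confusion_counts`, `summary`) and the toy labels are our own assumptions for illustration, not part of the chapter material.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def safe_div(a, b):
    """Return a / b, or 0.0 when the denominator is zero (a convention we assume here)."""
    return a / b if b else 0.0


def summary(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)            # true positive rate
    specificity = safe_div(tn, tn + fp)       # true negative rate
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Toy data: 5 positives, 95 negatives; the classifier always predicts "negative".
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
metrics = summary(y_true, y_pred)
```

Here accuracy looks high, while recall, F1 and balanced accuracy reveal that no positive instance is ever detected.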
-
Chapter 20.03: Cost-Sensitive Learning 1
We introduce the concept of a cost matrix, the minimum expected cost principle, and the theoretically optimal threshold.
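For the binary case, the minimum expected cost principle leads to a closed-form threshold on the predicted probability of the positive class. The sketch below assumes a 2x2 cost matrix given by four hypothetical parameters (`c_tp`, `c_fp`, `c_fn`, `c_tn`) and a well-calibrated probability `p`; the notation in the chapter itself may differ.

```python
def optimal_threshold(c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Threshold c* such that predicting positive is cost-optimal when p >= c*.

    Derivation: predict positive if
        p * c_tp + (1 - p) * c_fp <= p * c_fn + (1 - p) * c_tn,
    which rearranges to
        p >= (c_fp - c_tn) / (c_fp - c_tn + c_fn - c_tp).
    Assumes errors cost more than correct decisions (c_fp > c_tn, c_fn > c_tp).
    """
    return (c_fp - c_tn) / (c_fp - c_tn + c_fn - c_tp)


def min_expected_cost_decision(p, c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Return 1 (positive) or 0 (negative), whichever has lower expected cost."""
    cost_if_positive = p * c_tp + (1 - p) * c_fp
    cost_if_negative = p * c_fn + (1 - p) * c_tn
    return 1 if cost_if_positive <= cost_if_negative else 0


# Example: a false negative is 9 times as costly as a false positive.
threshold = optimal_threshold(c_fp=1.0, c_fn=9.0)
decision = min_expected_cost_decision(p=0.2, c_fp=1.0, c_fn=9.0)
```

With these example costs the threshold drops well below 0.5, so an instance with a predicted probability of 0.2 is already classified as positive.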
-
Chapter 20.04: Cost-Sensitive Learning 2
In this section, we focus on empirical thresholding and the model-agnostic MetaCost method.
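Empirical thresholding can be sketched as a search over candidate thresholds that minimizes the total cost on held-out data. The helper names (`total_cost`, `empirical_threshold`), the toy scores and the costs below are assumptions for illustration; in practice the search is typically run on validation data or inside cross-validation.

```python
def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total misclassification cost when predicting positive for score >= threshold."""
    cost = 0.0
    for t, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and t == 0:
            cost += cost_fp
        elif pred == 0 and t == 1:
            cost += cost_fn
    return cost


def empirical_threshold(y_true, scores, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest total cost on the given data.

    Candidates are the observed scores plus +inf (predict everything negative).
    Ties are resolved in favour of the smallest threshold.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda th: total_cost(y_true, scores, th, cost_fp, cost_fn))


# Toy validation data (hypothetical scores from some probabilistic classifier).
y_val = [0, 0, 0, 1, 0, 1]
s_val = [0.1, 0.2, 0.3, 0.35, 0.6, 0.8]
best = empirical_threshold(y_val, s_val, cost_fp=1.0, cost_fn=5.0)
```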
-
Chapter 20.05: Cost-Sensitive Learning 3
We explain the concepts of instance-specific costs and cost-sensitive one-vs-one (OVO) classification.
-
Chapter 20.06: Cost Curves 1
We introduce cost curves for the misclassification error and explain the duality between ROC points and cost lines.
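The duality can be illustrated with a few lines of code: a single ROC point (FPR, TPR) corresponds to a straight line in cost space, giving the expected misclassification error as a function of the proportion of positives. The function name `cost_line` and the example ROC point are our own assumptions.

```python
def cost_line(fpr, tpr):
    """Map a ROC point to its cost line for the misclassification error.

    The returned function gives the expected error rate for a given
    proportion of positives p_pos in [0, 1]:
        error(p_pos) = FNR * p_pos + FPR * (1 - p_pos),  with FNR = 1 - TPR.
    The line runs from (0, FPR) to (1, FNR).
    """
    fnr = 1.0 - tpr

    def expected_error(p_pos):
        return fnr * p_pos + fpr * (1.0 - p_pos)

    return expected_error


# Hypothetical ROC point of some classifier.
line = cost_line(fpr=0.2, tpr=0.7)
errors = [line(p) for p in (0.0, 0.5, 1.0)]
```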
-
Chapter 20.07: Cost Curves 2
We explain cost curves with cost matrices and show how to use them to compare classifiers. We conclude with a wrap-up comparison to ROC curves.
-
Chapter 20.08: Sampling Methods 1
We introduce the idea of sampling methods for dealing with imbalanced data. In addition, we explain certain undersampling techniques.
-
Chapter 20.09: Sampling Methods 2
We introduce the state-of-the-art oversampling technique SMOTE.
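The core idea of SMOTE, creating synthetic minority points by interpolating between a minority instance and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration under our own assumptions (numeric features only, Euclidean distance, base points drawn at random rather than a fixed number of synthetic points per instance); the function name `smote_sketch` is hypothetical and this is not a reference implementation.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbours.

    minority: list of equal-length numeric tuples (minority-class instances only).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours of x (excluding x itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        z = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, z)))
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_synthetic=10, k=2)
```

Each synthetic point lies on the line segment between two existing minority instances, so the new data stay within the region already covered by the minority class.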