Chapter 26: OpenML and Benchmarking
This chapter covers large-scale benchmarking methodology, OpenML for standardized experiments, and statistical hypothesis testing for model comparison.
-
Chapter 26.01: Large-Scale Benchmarking and OpenML
We explain why empirical benchmarking is needed in addition to theoretical analysis, present the steps of proper benchmarking, and introduce OpenML.
-
Chapter 26.02: Model Selection and Hypothesis Testing
We discuss what questions benchmarks can answer, the value of hypothesis tests, and which tests to use in different benchmarking scenarios.
-
Chapter 26.03: Practical Performance Evaluation
We show how lift charts and calibration curves visualize aspects of model performance that standard metrics do not capture.