Chapter 07: Random Forests
This chapter introduces bagging as a method to improve the performance of trees (or other base learners). A modification of bagging leads to random forests. We explain the main idea of random forests, benchmark their performance against the methods seen so far, show how to quantify the impact of a single feature on the forest's performance, and explain how to compute proximities between observations.
-
Chapter 07.00: Random Forests: In a Nutshell
In this nutshell chunk, we give a brief overview of random forests, an ensemble method that combines multiple decision trees to improve prediction accuracy and robustness.
-
Chapter 07.01: Bagging Ensembles
Bagging (bootstrap aggregation) is a method for combining many models into a meta-model which often works much better than its individual components. In this section, we present the basic idea of bagging and explain why and when bagging works.
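A minimal sketch of the bagging idea in Python, assuming scikit-learn is available; the dataset and the choice of `n_bags = 50` are illustrative and not part of the chapter material.

```python
# Bagging sketch: fit base learners on bootstrap samples of the training
# data and aggregate their predictions by majority vote.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_bags = 50  # number of bootstrap replicates (illustrative choice)
trees = []
for _ in range(n_bags):
    # Draw a bootstrap sample: n observations sampled with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Aggregate: majority vote over the individual trees.
votes = np.stack([tree.predict(X_test) for tree in trees])
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
print("bagged accuracy:", (y_hat == y_test).mean())
```

Averaging over many trees reduces the variance of the individual, unstable base learners, which is why bagging tends to help most for high-variance models such as fully grown trees.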
-
Chapter 07.02: Basics
In this section, we investigate random forests, a modification of bagging for trees that additionally decorrelates the individual trees by considering only a random subset of the features at each split.
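A short sketch, assuming scikit-learn; the `max_features="sqrt"` setting illustrates the random feature subset per split, and the number of trees is an arbitrary illustrative choice.

```python
# Random forest sketch: like bagging of trees, but each split considers
# only a random subset of the features (controlled by max_features).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)
print("CV accuracy:", cross_val_score(rf, X, y, cv=5).mean().round(3))
```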
-
Chapter 07.03: Out-of-Bag Error Estimate
We introduce the concepts of in-bag and out-of-bag observations and explain how to compute the out-of-bag error estimate.
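A minimal sketch of the out-of-bag error with scikit-learn; the dataset and number of trees are illustrative. Each tree is fit on a bootstrap sample, and every observation is predicted only by the trees whose bootstrap sample did not contain it, so no separate test set is needed for this estimate.

```python
# OOB error sketch: aggregate, for each observation, only the predictions of
# trees for which that observation was out-of-bag.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB error estimate:", 1 - rf.oob_score_)
```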
-
Chapter 07.04: Feature Importance
In a complex machine learning model, the contributions of individual features to model performance are difficult to evaluate. The concept of feature importance allows us to quantify these effects for random forests.
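One common way to quantify this is permutation feature importance: shuffle one feature at a time and measure how much the model's performance drops. A sketch using scikit-learn's `permutation_importance`; the dataset, hold-out split, and `n_repeats` value are illustrative choices.

```python
# Permutation importance sketch: the importance of a feature is the mean
# drop in accuracy when that feature's values are randomly permuted.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)

result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"{data.feature_names[i]:>25s}: {result.importances_mean[i]:.3f}")
```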
-
Chapter 07.05: Proximities
The term proximity refers to the "closeness" between pairs of observations. Proximities are calculated for each pair of observations and can be derived directly from a fitted random forest.
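A sketch of how such proximities can be computed, assuming Breiman's definition (the proximity of two observations is the fraction of trees in which they fall into the same terminal node) and scikit-learn's `apply` method, which returns the leaf index of each observation in each tree; the dataset is illustrative.

```python
# Proximity sketch: count, for each pair of observations, the fraction of
# trees in which both end up in the same terminal node.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# leaves[i, t] = index of the terminal node reached by observation i in tree t
leaves = rf.apply(X)

n, n_trees = leaves.shape
proximity = np.zeros((n, n))
for t in range(n_trees):
    # Pairwise indicator: did observations i and j land in the same leaf of tree t?
    proximity += leaves[:, t][:, None] == leaves[:, t][None, :]
proximity /= n_trees

print(proximity[:3, :3].round(2))  # proximities among the first three observations
```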