Chapter 07.05: Proximities

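The term proximity refers to the “closeness” between pairs of cases. Proximities are calculated for each pair of observations and can be derived directly from a fitted random forest: the proximity of two observations is the proportion of trees in which both end up in the same terminal node.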
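As a rough illustration, here is a minimal sketch of how such pairwise proximities could be computed once a forest is fitted. It assumes the usual definition (the share of trees in which two observations fall into the same terminal node). The function name `proximity_matrix` and the toy `leaf_ids` table are hypothetical and chosen only for this example; they do not come from the lecture material.

```python
def proximity_matrix(leaf_ids):
    """Compute pairwise proximities from terminal-node assignments.

    leaf_ids[i][t] is the terminal node that observation i reaches in tree t.
    Returns an n x n list of lists with values in [0, 1].
    """
    n = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count the trees in which observations i and j share a terminal node.
            shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
            # Normalize by the number of trees; the matrix is symmetric.
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical toy input: 4 observations, 3 trees (made-up node ids).
leaf_ids = [
    [1, 4, 2],
    [1, 4, 3],
    [2, 5, 3],
    [2, 4, 2],
]

prox = proximity_matrix(leaf_ids)
for row in prox:
    print([round(p, 2) for p in row])
```

In practice the terminal-node table would come from a fitted forest rather than being typed in by hand. In scikit-learn, for example, the `apply` method of a fitted `RandomForestClassifier` returns the leaf index of every observation in every tree, and the R package `randomForest` can return the proximity matrix directly when called with `proximity = TRUE`.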

Lecture video

Lecture slides

Code demo

Random Forests

You can run the code snippets in the demos on your local machine. The corresponding Rmd version of this demo can be found here. If you want to render the Rmd files to PDF, you need the accompanying style files.

Quiz

---
shuffle_questions: false
---

## Which statements are true?

- [x] To compute permutation variable importance for feature $j$, we permute the feature and see how the performance changes (in OOB observations).
- [ ] The random forest is a bad out-of-the-box model and requires tuning of hyperparameters.
- [x] Random forests and trees can be used for high-dimensional data.
- [ ] Proximities are used in replacing missing data, but not in locating outliers.