Chapter 06: Classification and Regression Trees (CART)
This chapter introduces Classification and Regression Trees (CART), a well-established machine learning procedure. We explain the main idea and give details on splitting criteria, discuss computational aspects of growing a tree, and illustrate the idea of stopping criteria and pruning.
-
Chapter 06.00: CART: In a Nutshell
In this nutshell chunk, we unravel the workings of CARTs (Classification and Regression Trees).
-
Chapter 06.01: Predictions with CART
Decision trees are an important type of machine learning model and come in two main types: classification and regression trees. In this section, we explain the general idea of CART and show how they recursively divide up the input space into ever smaller rectangular partitions. Thus, we think of CART for now only as a predictor.
-
Chapter 06.02: Growing a Tree
In this section, we explain how to grow a tree starting with an empty tree, i.e., a root node containing all the data. It will be shown that trees are grown by recursively applying greedy optimization to each node.
-
Chapter 06.03: Splitting Criteria for Regression
CART algorithms require splitting criteria for trees, which are usually defined in terms of impurity reduction. In this section we formalize the idea of splitting criteria and explain the details of splitting. We start with regression and doing so we show how split criteria fit into our framework of empirical risk minimization.
-
Chapter 06.04: Splitting Criteria for Classification
We extend splitting criteria to classification task. Here, we see that there are analogies of ERM and impurity reduction. While these analogies are interested, proving the equivalence of ERM and impurity reduction are beyond the scope of this lecture. The interested reader can refer to chapter 11 of this lecture, where we proof the equivalence.
-
Chapter 06.05: Computational Aspects of Finding Splits
In this section, we explain the computational aspects of the node-splitting procedure, especially for nominal features. In addition, we illustrate how to deal with missing values.
-
Chapter 06.06: Stopping Criteria & Pruning
The recursive partitioning procedure used to grow a CART usually leads to problems such as exponential growth of computations, overfitting, and the horizon effect. To deal with these problems, we can use stopping criteria and pruning. In this section, we explain the basis of these two solutions.
-
Chapter 06.07: Discussion
In this section we discuss the advantages and disadvantages of CART and mention other tree methodologies.