Reference: course notes from Machine Learning Foundations, University of Washington
Principle of Occam’s razor: Simpler trees are better
Among competing hypotheses, the one with the fewest assumptions should be selected
Early stopping for learning decision trees
- Limit tree depth: Stop splitting after a certain depth
- Classification error: Do not consider any split that does not cause a sufficient decrease in classification error
  - Typically, add a magic parameter $\epsilon$: stop if the error does not decrease by more than $\epsilon$
  - This rule has some pitfalls (e.g., on XOR-like data no single split lowers the error, so the rule can stop too early)
  - Still very useful in practice
- Minimum node “size”: Do not split an intermediate node that contains too few data points (see the sketch after this list)
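A minimal sketch of these three early-stopping heuristics using scikit-learn's `DecisionTreeClassifier`; the parameter names are scikit-learn's analogues, not the course's exact rules (in particular, `min_impurity_decrease` thresholds the impurity decrease rather than the classification error, so it plays the role of $\epsilon$ only approximately).

```python
# Early stopping via scikit-learn's built-in hyperparameters (assumed analogues):
#   max_depth             -> limit tree depth
#   min_impurity_decrease -> require a sufficient decrease per split (the epsilon rule)
#   min_samples_split     -> minimum node "size" before a split is considered
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

tree = DecisionTreeClassifier(
    max_depth=5,                 # stop splitting after depth 5
    min_impurity_decrease=0.01,  # skip splits that improve impurity by less than 0.01
    min_samples_split=20,        # do not split nodes with fewer than 20 data points
    random_state=0,
)
tree.fit(X, y)
print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```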
Pruning: Simplify the tree after the learning algorithm terminates
- Simple measure of complexity of a tree: $L(T)$ = number of leaf nodes
- Balance simplicity & predictive power
Desired total quality format: Total cost = measure of fit + measure of complexity, i.e. $C(T) = \text{Error}(T) + \lambda \, L(T)$ for a tuning parameter $\lambda \ge 0$
- $\lambda = 0$: standard Decision Tree learning
- $\lambda = +\infty$: tree collapses to the root node, predicting the majority class
- $0 < \lambda < +\infty$: balance of fit and complexity (illustrated in the sketch below)
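A hedged sketch of the total-cost trade-off using scikit-learn's cost-complexity pruning: `ccp_alpha` plays the role of $\lambda$ (scikit-learn applies it to leaf impurity rather than 0/1 classification error, so this is an analogue of the course's rule, not the same formula), while the printed total cost follows the $\text{Error}(T) + \lambda \, L(T)$ definition above.

```python
# Sweep lambda and observe the fit/complexity trade-off:
#   lambda = 0     -> full (standard) decision tree
#   lambda large   -> tree pruned back toward the root / majority-class predictor
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for lam in [0.0, 0.005, 0.02, 0.1]:
    tree = DecisionTreeClassifier(ccp_alpha=lam, random_state=0).fit(X_tr, y_tr)
    error = 1 - tree.score(X_tr, y_tr)   # measure of fit: training classification error
    leaves = tree.get_n_leaves()         # measure of complexity: L(T)
    total_cost = error + lam * leaves    # Total cost = Error(T) + lambda * L(T)
    print(f"lambda={lam:.3f}  leaves={leaves:3d}  "
          f"train error={error:.3f}  total cost={total_cost:.3f}  "
          f"test acc={tree.score(X_te, y_te):.3f}")
```

As $\lambda$ grows, the number of leaves shrinks and the training error rises, which is exactly the balance the last three bullets describe.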