**Reference from course note of Machine Learning Foundation from University of Washington**

### Principle of Occam’s razor: Simpler trees are better

Among competing hypotheses, the one with fewest assumptions should be selected

### Early stopping for learning decision trees

**Limit tree depth**: Stop splitting after a certain depth**Classification error**: Do not consider any split that does not cause a sufficient decrease in classification error- Typically, add magic parameter $\epsilon$,Stop if error doesn’t decrease by more than $\epsilon$
- Some pitfalls to this rule
- Very useful in practice

**Minimum node “size”**: Do not split an intermediate node which contains too few data points

### Pruning: Simplify the tree after the learning algorithm terminates

- Simple measure of complexity of tree
**L(T) = number of leaf nodes** - Balance simplicity & predictive power
**Desired total quality format****Total cost = measure of fit + measure of complexity**

- $\lambda = 0$: standard Decision Tree learning
- $\lambda = +\inf$: root, majority classification
- $\lambda = 0 - +\inf$: balance of fit and complexity