Machine Learning Yarning(part two)

Bias and Variance

Techniques for reducing avoidable bias

If your learning algorithm suffers from high avoidable bias, you might try the following techniques: - Increase the model size (such as number of neurons/layers): This technique reduces bias, since it should allow you to fit the training set better. If you find that this increases variance, then use regularization, which will usually eliminate the increase in variance.

Modify input features based on insights from error analysis: Say your error analysis inspires you to create additional features that help the algorithm eliminate a particular category of errors. (We discuss this further in the next chapter.) These new features could help with both bias and variance. In theory, adding more features could increase the variance; but if you find this to be the case, then use regularization, which will usually eliminate the increase in variance.
Reduce or eliminate regularization (L2 regularization, L1 regularization, dropout): This will reduce avoidable bias, but increase variance.
Modify model architecture (such as neural network architecture) so that it is more suitable for your problem: This technique can affect both bias and variance. One method that is not helpful:
Add more training data: This technique helps with variance problems, but it usually has no significant effect on bias.
In addition to the techniques described earlier to address high bias, I sometimes also carry out an error analysis on the training data, following a protocol similar to error analysis on the Eyeball dev set. This can be useful if your algorithm has high bias—i.e., if it is not fitting the training set well.

Techniques for reducing variance

If your learning algorithm suffers from high variance, you might try the following techniques: - Add more training data: This is the simplest and most reliable way to address variance, so long as you have access to significantly more data and enough computational power to process the data.

Add regularization (L2 regularization, L1 regularization, dropout): This technique reduces variance but increases bias.
Add early stopping (i.e., stop gradient descent early, based on dev set error): This technique reduces variance but increases bias. Early stopping behaves a lot like regularization methods, and some authors call it a regularization technique.
Feature selection to decrease number/type of input features: This technique might help with variance problems, but it might also increase bias. Reducing the number of features slightly (say going from 1,000 features to 900) is unlikely to have a huge effect on bias. Reducing it significantly (say going from 1,000 features to 100—a 10x reduction) is more likely to have a significant effect, so long as you are not excluding too many useful features. In modern deep learning, when data is plentiful, there has been a shift away from feature selection, and we are now more likely to give all the features we have to the algorithm and let the algorithm sort out which ones to use based on the data. But when your training set is small, feature selection can be very useful.
Decrease the model size (such as number of neurons/layers): Use with caution. This technique could decrease variance, while possibly increasing bias. However, I don’t recommend this technique for addressing variance. Adding regularization usually gives better classification performance. The advantage of reducing the model size is reducing your computational cost and thus speeding up how quickly you can train models. If speeding up model training is useful, then by all means consider decreasing the model size. But if your goal is to reduce variance, and you are not concerned about the computational cost, consider adding regularization instead. Here are two additional tactics, repeated from the previous chapter on addressing bias:
Modify input features based on insights from error analysis: Say your error analysis inspires you to create additional features that help the algorithm to eliminate a particular category of errors. These new features could help with both bias and variance. In Page 53 Machine Learning Yearning-Draft Andrew Ng theory, adding more features could increase the variance; but if you find this to be the case, then use regularization, which will usually eliminate the increase in variance.
Modify model architecture (such as neural network architecture) so that it is more suitable for your problem: This technique can affect both bias and variance.