High bias means the model is too simple to capture the complexity of the underlying data.
Both J_train() and J_cv() are high.
High variance means the model is too complex and ends up fitting noise in the training data.
J_train() is low, but J_cv() is high.
Both Js are low for an ideal model.
In the worst case, it is possible to have both high bias and high variance.
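A minimal sketch of computing the two errors to make this diagnosis (the synthetic data and the two candidate models below are just illustrative assumptions, not from the notes):

# Sketch: compare J_train and J_cv for a too-simple and a more flexible model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.2, 200)   # noisy nonlinear target (made up)

x_train, x_cv, y_train, y_cv = train_test_split(x, y, test_size=0.4, random_state=0)

for degree in (1, 4):   # degree 1: likely high bias; degree 4: flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    j_train = mean_squared_error(y_train, model.predict(x_train)) / 2
    j_cv = mean_squared_error(y_cv, model.predict(x_cv)) / 2
    print(f"degree={degree}: J_train={j_train:.3f}, J_cv={j_cv:.3f}")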
2. Regularization and bias/variance
A large lambda value of the regularization term can lead to high bias, whereas a small lambda value leads to high variance.
Choosing a good lambda value is important for a good model.
Just like we did when choosing a model with the cross-validation set, we try different lambda values and pick the one that gives the lowest J_cv() value.
Then we report the generalization error using the test set.
As seen from the graph above, when lambda is low, J_train() is low and J_cv() is high (high variance, overfitting), because a low lambda barely penalizes the parameters.
When lambda is high, both J_train() and J_cv() are high (high bias, underfitting), because a high lambda penalizes the parameters heavily, shrinking them toward zero and making the model essentially a flat line.
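A rough sketch of that lambda search (here Ridge's alpha plays the role of lambda; the data and the candidate values are made up for illustration):

# Sketch: fit the same model for several lambda values, keep the one with lowest J_cv.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
x = rng.uniform(0, 5, 200).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.2, 200)
x_train, x_cv, y_train, y_cv = train_test_split(x, y, test_size=0.4, random_state=1)

scores = {}
for lam in [0.001, 0.01, 0.1, 1, 10, 100]:
    model = make_pipeline(PolynomialFeatures(4), Ridge(alpha=lam))
    model.fit(x_train, y_train)
    scores[lam] = mean_squared_error(y_cv, model.predict(x_cv)) / 2   # J_cv for this lambda

best_lambda = min(scores, key=scores.get)   # lambda with the lowest J_cv wins
print("chosen lambda:", best_lambda, "J_cv:", scores[best_lambda])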
3. Establishing a baseline level of performance
High training error does not always mean that the algorithm is doing badly.
When assessing a model for speech recognition, for example, it is good to measure human-level performance as well.
By comparing human-level performance with the training error, we may find that the model is actually not that bad.
One reason the model shows high training error could be that some inputs are inaudible in the first place, as suggested by the low human-level performance on them as well.
If the difference between baseline performance and the training error is higher than the difference between training error and CV error, the model has high bias (underfitting).
If it is the other way around, the model has high variance (overfitting).
If the training error is close to the baseline performance, it means the algorithm is doing decently.
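A tiny sketch of that gap comparison (the error numbers below are made up for illustration):

# Sketch: diagnose bias vs. variance by comparing the two gaps.
baseline_error = 0.106   # e.g. human-level error on the same data
train_error    = 0.108   # J_train
cv_error       = 0.148   # J_cv

bias_gap     = train_error - baseline_error   # how far training error is from baseline
variance_gap = cv_error - train_error         # how far CV error is from training error

if bias_gap > variance_gap:
    print("mainly a high-bias (underfitting) problem")
else:
    print("mainly a high-variance (overfitting) problem")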
4. Learning Curves
J_train() is typically lower than J_cv(), since the parameters are fit to the training set.
As more data is collected, both learning curves flatten out.
For a model with high bias, collecting more data does not really help lowering the error down to baseline level performance.
This is because a model with high bias is so simple that it already lacks the capacity to fit the existing data, so adding more does not help.
For high variance models, J_cv() is much higher than J_train() when the training set is small.
This is because of the noise in the data: with little data, a 4th-degree polynomial is flexible enough to fit the noise, so it overfits.
When the training set size is small, J_train() can even be lower than human-level performance, because of the overfitting.
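A sketch of producing such learning curves with scikit-learn's learning_curve helper (the data, model, and subset sizes are illustrative assumptions):

# Sketch: train on increasing subsets of the data and track J_train and J_cv.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(2)
x = rng.uniform(0, 5, 300).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.2, 300)

model = make_pipeline(PolynomialFeatures(4), LinearRegression())
sizes, train_scores, cv_scores = learning_curve(
    model, x, y, train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5, scoring="neg_mean_squared_error")

j_train = -train_scores.mean(axis=1) / 2   # convert back to (half) mean squared error
j_cv = -cv_scores.mean(axis=1) / 2

plt.plot(sizes, j_train, label="J_train")
plt.plot(sizes, j_cv, label="J_cv")
plt.xlabel("training set size"); plt.ylabel("error"); plt.legend(); plt.show()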
5. Debugging a learning algorithm
"Bias vs. variance takes a very short time to learn but a lifetime to master" - a Stanford PhD student, LOL
6. Bias/variance and neural networks
It is usually the case that we need a tradeoff between bias and variance.
However, for neural networks, we don't really need it.
Large neural networks are low bias machines.
If we make neural networks large enough, we can almost always fit our training set well, so long as our training set is not enormous.
If J_train() is quite big, we can make a bigger network.
Bigger networks will solve high bias problems.
Bigger networks are of course more computationally expensive.
If J_train() is instead quite small but J_cv() is big, we can collect more data.
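The recipe above, written as a rough sketch (the 0.01 thresholds are arbitrary placeholders, not from the lectures):

# Sketch: decide the next step from J_train, J_cv, and the baseline error.
def next_step(j_train, j_cv, baseline):
    if j_train - baseline > 0.01:   # high bias: training error far above baseline
        return "use a bigger network (more layers/units)"
    if j_cv - j_train > 0.01:       # high variance: CV error far above training error
        return "collect more data"
    return "done: both errors are close to the baseline"

print(next_step(j_train=0.15, j_cv=0.16, baseline=0.05))   # -> bigger network
print(next_step(j_train=0.06, j_cv=0.15, baseline=0.05))   # -> more data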
So long as the regularization term is chosen appropriately, it will not hurt to have a larger neural network (apart from the extra computation).
In Keras, pass kernel_regularizer=L2(0.01) to a layer to add the regularization term.
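A minimal Keras sketch using that regularizer (the layer sizes and learning rate are illustrative assumptions; only kernel_regularizer=L2(0.01) comes from the note above):

# Sketch: an L2-regularized network for a binary classification task.
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import L2

model = Sequential([
    Dense(25, activation="relu", kernel_regularizer=L2(0.01)),
    Dense(15, activation="relu", kernel_regularizer=L2(0.01)),
    Dense(1, activation="sigmoid", kernel_regularizer=L2(0.01)),
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
# model.fit(X_train, y_train, epochs=100)   # X_train / y_train would be your own data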