trained using labeled examples, such as an input where the desired output is known
For example, a segment of text could have a category label, such as
- Spam vs Legitimate Email
- Positive vs Negative review
Network receives a set of inputs along with the corresponding correct outputs, and the algorithms learns by comparing its outputs
likely to predict future events
Data is often split into 3 sets
- Training Data : train the model parameters
- Validatation Data : determined what model hyperparameters to adjust
- Test Data : get some true final performance metric
You cannot tweak the parameters based on the Test data
Once we have the model's predictions from the X_test data, we compare it to the true y data.
We could organize our predicted values compared to the real values in a confusion matrix
Accuracy
Accuracy in classification problems is the number of correct predictions made by the model divided by the total number of predictions
useful when target classes are well balanced (e.g. equal ratio of dog and cat images)
- not a good choice with unbalanced classes (e.g. 99% of images are dogs)
Recall
ability of a model to find all the relevant cases within a dataset