Prediction Algorithm:
- Calculate the distance from x to every point in the data
- Sort the points by increasing distance from x
- Predict the majority label of the k closest points
Increasing k smooths the decision boundaries, at the cost of mislabeling some data points.
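The three steps above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function name knn_predict, the choice of Euclidean distance, and the toy data are all assumptions made for the example. Ties in the vote are broken arbitrarily (by whichever label was seen first).

```python
import math
from collections import Counter

def knn_predict(points, labels, x, k=3):
    """Predict the label of x by majority vote among its k nearest points."""
    # Step 1: distance from x to every stored point (Euclidean is an assumption).
    distances = [(math.dist(p, x), label) for p, label in zip(points, labels)]
    # Step 2: sort by increasing distance.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: majority label among the k closest points.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy data: the last point is a "b" sitting inside the "a" cluster.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (1.1, 1.3)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.3), k=1))  # b
print(knn_predict(points, labels, (1.1, 1.3), k=3))  # a
```

With k=1 the stray "b" point is predicted as "b" because it is its own nearest neighbor; with k=3 its two "a" neighbors outvote it, which is the smoothing (and the mislabeling) described above.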
Pros and Cons
Pros
- Very simple
- Few parameters (k / distance metric)
- Easy to add new data
- Works with any number of classes
- Training is trivial
Cons
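- High prediction cost (worse for large data sets, since every prediction compares x against all stored points)
- Poor with high-dimensional data (distances become less informative as the number of dimensions grows)
- Categorical features don't work well (categories have no natural numeric distance)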
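The high-dimensional problem can be seen in a small experiment. The sketch below is an assumption-laden illustration (uniform random points in the unit cube, Euclidean distance, a hypothetical helper named nearest_to_farthest_ratio): it compares the distance to the nearest point with the distance to the farthest point. When that ratio gets close to 1, "closest" stops meaning much.

```python
import math
import random

def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of the smallest to the largest distance from a random query point."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return min(dists) / max(dists)

low = nearest_to_farthest_ratio(2)      # 2 dimensions
high = nearest_to_farthest_ratio(1000)  # 1000 dimensions
print(f"2 dims: {low:.3f}   1000 dims: {high:.3f}")
```

In low dimensions the nearest point is much closer than the farthest one; in high dimensions the two distances are nearly the same, so the k "closest" points are barely closer than any others.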