Basic Concept

  • Training Algorithm :
    - Store all the data
  • Prediction Algorithm :
    - Calculate the distance from x to all the points
    - Sort the points in the data by increasing distance from x
    - Predict the majority label among the 'k' closest points

  • Increasing k smooths the decision boundaries, at the cost of mislabeling some data (isolated points get outvoted by their neighbors)
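
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name knn_predict, the toy data, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, x, k=3):
    """Predict the label of x by majority vote among its k nearest stored points."""
    # Step 1: distance from x to every stored point (Euclidean, by assumption).
    distances = [
        (math.dist(p, x), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Step 2: sort the points by increasing distance from x.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: majority label among the k closest points.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# "Training" is just storing the data (hypothetical toy set).
# The point (1.2, 1.2) is a lone "b" sitting inside the "a" cluster.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (1.2, 1.2),
          (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = ["a", "a", "a", "b", "b", "b", "b"]

query = (1.25, 1.3)
print(knn_predict(points, labels, query, k=1))  # follows the lone "b" point
print(knn_predict(points, labels, query, k=3))  # the lone point is outvoted
```

With k=1 the query takes the label of the isolated "b" point next to it. With k=3 the surrounding "a" points win the vote, which smooths the boundary but also means the stored point (1.2, 1.2) itself would be labeled "a".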

Pros and Cons

  • Pros
    - Very simple
    - few parameters (k and the distance metric)
    - easy to add new data
    - works with any number of classes
    - training is trivial
  • Cons
    - High prediction cost (each prediction computes the distance to every stored point, so it gets worse for large data sets)
    - Not good with high-dimensional data (with many dimensions, distances become less informative, so near and far neighbors are hard to tell apart)
    - Categorical features don't work well (there is no natural distance between categories)
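
The high-dimensional problem can be seen with a small experiment. The sketch below draws uniform random points (an assumption chosen only for illustration) and measures how much farther the farthest point is than the nearest one, relative to the nearest. The helper name relative_contrast is hypothetical.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

# As the number of dimensions grows, the contrast tends to shrink:
# the nearest and farthest points end up at similar distances.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

When the contrast is small, the 'k' closest points are barely closer than any others, so their majority vote carries little information.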
