Approximate nearest neighbor methods and vector models

jkl133·2021년 2월 26일

출처: spotify engineering lead slideshare

  1. Start with high dimensional data
  2. Run dimension reduction to 10-1000 dims
  3. Do stuff in a small dimensional space


  1. Buliding an Annoy index

    • start with the point set
    • split it in 2 halves
    • split again (until k items in each leaf, takes n/k memory instead n)
    • binary tree
  2. Search

    • the closest isn't necessarily in the same leaf of the binary tree
    • 2 points that are really close may end up on different sides of split

    → Go both sides of a split if it's close

  • Tricks
  1. query structure
  • use priority queue to search all trees until we've k items
  • take union and remove dupliates
  • compute distance for remaining items
  • return NN items
pip install --user annoy
pip install pynndescent
재밌는게 재밌는거다

0개의 댓글