Input: image → Output: a label assigning the image to one of a fixed set of categories
fine-grained categories
There is no obvious way to hard-code an algorithm for recognizing a cat, or other classes. Instead, take a data-driven approach: learn a classifier from labeled example images.
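def train(images, labels):
    # Machine learning: fit a model to the labeled training images
    model = ...  # placeholder for the learned model
    return model

def predict(model, test_images):
    # Use the model to predict labels for the test images
    test_labels = ...  # placeholder for the predicted labels
    return test_labels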
Common datasets: MNIST, CIFAR10, CIFAR100, ImageNet
ImageNet: the performance metric is top-5 accuracy. The algorithm predicts 5 labels for each image, and the prediction counts as correct if one of them is the true label.
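A minimal sketch of top-k accuracy, assuming the classifier outputs one score per class for every image. The function name `top_k_accuracy` and the `scores`/`labels` data are hypothetical, chosen so that top-5 and top-1 accuracy differ.

```python
from typing import Sequence

def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    # A prediction counts as correct if the true label is among the k highest-scoring classes
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; the first k are the predicted labels
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(labels)

# Hypothetical scores for 3 images over 7 classes
scores = [
    [0.30, 0.25, 0.15, 0.10, 0.08, 0.07, 0.05],  # true label 3 is ranked 4th
    [0.40, 0.20, 0.15, 0.10, 0.08, 0.05, 0.02],  # true label 6 is ranked last
    [0.05, 0.60, 0.12, 0.09, 0.07, 0.04, 0.03],  # true label 1 is ranked 1st
]
labels = [3, 6, 1]

print(top_k_accuracy(scores, labels, k=5))  # → 0.6666666666666666
print(top_k_accuracy(scores, labels, k=1))  # → 0.3333333333333333
```

The first image is wrong under top-1 but correct under top-5, which is why top-5 is the more forgiving metric when categories are fine-grained.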
Omniglot: 1623 categories (characters from 50 different alphabets)
20 images per category
Meant to test few-shot learning
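Nearest neighbor classifier:
def train(images, labels):
    # Machine learning
    model = ...  # placeholder for the learned model
    return model
→ memorize all data and labels
def predict(model, test_images):
    # Use the model to predict labels
    test_labels = ...  # placeholder for the predicted labels
    return test_labels
→ predict the label of the most similar training image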
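A minimal sketch of the nearest neighbor classifier behind the `train`/`predict` interface, assuming images are flattened into lists of pixel values and that L1 distance is used to find the most similar training image (the distance choice is an assumption; L2 would also work). The tiny 4-pixel "images" and the labels `"dark"`/`"bright"` are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = Sequence[float]  # a flattened image: one number per pixel value

def l1_distance(a: Image, b: Image) -> float:
    # Sum of absolute pixel-wise differences
    return sum(abs(x - y) for x, y in zip(a, b))

def train(images: List[Image], labels: List[str]) -> Tuple[List[Image], List[str]]:
    # Memorize all data and labels: the "model" is just the training set
    return list(images), list(labels)

def predict(model: Tuple[List[Image], List[str]], test_images: List[Image]) -> List[str]:
    train_images, train_labels = model
    test_labels = []
    for img in test_images:
        # Compare with every training image and copy the label of the closest one
        distances = [l1_distance(img, t) for t in train_images]
        nearest = min(range(len(distances)), key=distances.__getitem__)
        test_labels.append(train_labels[nearest])
    return test_labels

model = train([[0, 0, 0, 0], [255, 255, 255, 255]], ["dark", "bright"])
print(predict(model, [[10, 20, 0, 5], [200, 250, 240, 255]]))  # → ['dark', 'bright']
```

Training is instant, but every prediction compares against all N training images, so test time grows linearly with the training set. That is the opposite of the usual preference for fast prediction.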
L1 distance (Manhattan)
L2 distance (Euclidean)
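For two images I1 and I2 compared pixel by pixel, the L1 distance is the sum over pixels p of |I1[p] - I2[p]|, and the L2 distance is the square root of the sum over pixels of (I1[p] - I2[p])^2. The sketch below computes both on hypothetical 2-element vectors and shows that the choice of metric can change which point counts as the nearest neighbor.

```python
import math

def l1_distance(a, b):
    # Manhattan: sum of absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    # Euclidean: square root of the sum of squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(l1_distance([0, 0], [3, 4]))  # → 7
print(l2_distance([0, 0], [3, 4]))  # → 5.0

# The two metrics can disagree about which candidate is closer
query, p, q = [0, 0], [3, 3], [5, 0]
print(l1_distance(query, p), l1_distance(query, q))  # L1: q is closer (5 < 6)
print(l2_distance(query, p), l2_distance(query, q))  # L2: p is closer (about 4.24 < 5.0)
```

Because the nearest neighbor can flip, the distance metric is a real modeling choice rather than a detail.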

tf-idf similarity
Robust, and can be applied to various types of data by choosing an appropriate distance or similarity measure
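A minimal sketch of tf-idf similarity for comparing text documents, assuming one common variant: term frequency is the count divided by document length, idf is log(N / df), and documents are compared with cosine similarity. Other tf-idf variants exist (e.g. smoothed idf); the helper names and the three toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    # tf = count / doc length, idf = log(N / df); a term found in every doc gets weight 0
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # number of docs containing each term
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors

def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    # Higher means more similar; 1 - cosine_similarity can act as a distance for nearest neighbor
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are pets".split(),
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # shares "the", "sat", "on", so positive
print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms → 0.0
```

Swapping pixel distances for a measure like this is what lets the same nearest neighbor idea work on text.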
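Hyperparameters: choices about the learning algorithm that we don't learn from the training data; instead, we set them at the start of the learning process (e.g. which distance metric to use).
They are very problem-dependent: in practice, we have to try different values and see what works best.
How to set hyperparameters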
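One common recipe (not spelled out above) is to split the data into train, validation and test sets, pick the hyperparameter value that does best on the validation set, and evaluate on the test set only once at the end. The sketch below assumes the usual k-nearest-neighbor extension (majority vote among the k closest training points), with k as the hyperparameter; `knn_predict`, `choose_k` and the 1-D toy data are hypothetical.

```python
from collections import Counter

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_x, train_y, x, k):
    # Majority vote among the labels of the k closest training points
    order = sorted(range(len(train_x)), key=lambda i: l1_distance(x, train_x[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_x, train_y, xs, ys, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(xs, ys))
    return hits / len(ys)

def choose_k(train_x, train_y, val_x, val_y, candidates):
    # Select k using the validation split only; the test split is never used here
    return max(candidates, key=lambda k: accuracy(train_x, train_y, val_x, val_y, k))

# Toy 1-D data: the "b" point at x=3 is noise inside the "a" region
train_x = [[0], [1], [2], [3], [4], [10], [11], [12]]
train_y = ["a", "a", "a", "b", "a", "b", "b", "b"]
val_x, val_y = [[3.2], [11.4]], ["a", "b"]

print(choose_k(train_x, train_y, val_x, val_y, [1, 3, 5]))  # → 3
```

With k=1 the noisy point decides the label of the first validation example, so validation accuracy is 0.5; k=3 and k=5 both reach 1.0, and `max` returns the first of the tied best values. Odd candidates avoid vote ties in this two-class example.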
Curse of dimensionality: to cover the space uniformly, the number of training points needed grows exponentially with the dimension.
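The exponential growth can be made concrete by assuming, purely for illustration, that uniform coverage means about 4 samples along each axis: a grid then needs 4^d points in d dimensions. The function name below is hypothetical.

```python
def points_for_uniform_coverage(points_per_axis: int, dims: int) -> int:
    # A grid with points_per_axis samples along each of dims axes has points_per_axis ** dims points
    return points_per_axis ** dims

for d in (1, 2, 3, 10):
    print(d, points_for_uniform_coverage(4, d))
# 1 4
# 2 16
# 3 64
# 10 1048576
```

A 32×32×3 CIFAR10 image already lives in 3072 dimensions, so no realistic training set covers pixel space densely.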
Nearest neighbor with ConvNet features works well!
Image captioning with a nearest neighbor approach also works well.