A human can recognize that the query image is a pangolin from the differences between the four images, but this is challenging for a computer because only a few images are available
the traditional purpose of deep learning is recognizing a certain class in an image; in few shot learning, however, the purpose is learning to learn -> learning the similarities and differences between objects
few shot learning is a kind of meta learning
There are two kinds of samples ( positive samples and negative samples )
First, the model extracts features from each sample with a CNN (f). f produces two feature vectors (h1 & h2). Then the difference between the two vectors (z) goes through dense layers and a sigmoid. The output should be 1 if the input pair is a positive sample; otherwise, the output should be 0 (a negative sample). The fully connected layer scales the difference between the two objects to the range 0 to 1.
As you can see, the structure looks like Siamese twins, hence the name. The loss is backpropagated through the dense layers and the CNN
(when the input is a negative sample)
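The pairwise setup above can be sketched as follows. This is a minimal NumPy sketch, not a real implementation: the shared CNN f is stood in for by a fixed random linear map, and the dense layer is a single weight vector, both invented here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholder for the shared feature extractor f (a real model would be a CNN).
W_f = rng.normal(size=(64, 784))

def f(x):
    """Shared feature extractor: maps a flattened image to a feature vector."""
    return W_f @ x

# Dense layer + sigmoid that scales the feature difference into [0, 1].
w_d = rng.normal(size=64)
b_d = 0.0

def similarity(x1, x2):
    h1, h2 = f(x1), f(x2)          # two feature vectors from the SAME network
    z = np.abs(h1 - h2)            # element-wise difference between them
    return sigmoid(w_d @ z + b_d)  # near 1 = positive pair, near 0 = negative

x1 = rng.normal(size=784)
x2 = rng.normal(size=784)
p = similarity(x1, x2)  # a score in [0, 1]
```

In training, p would be compared to the label (1 for a positive pair, 0 for a negative pair) with a binary cross-entropy loss, and the gradient would flow back through both the dense layer and the shared extractor.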
Another method for training a Siamese network
First, choose an anchor object. Then choose one positive object and one negative object from the dataset. Compute feature vectors for these three objects, then calculate the positive distance and the negative distance. The positive distance should be small, and the negative distance should be large enough.
Finally, the loss function should be the one in the image above: the triplet loss, max(0, d_pos - d_neg + margin).
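The standard triplet loss described above can be written out in a few lines. This is a sketch assuming squared Euclidean distance between feature vectors and a margin of 1; the toy vectors below are invented for illustration.

```python
import numpy as np

def triplet_loss(h_anchor, h_pos, h_neg, margin=1.0):
    """Push d_pos below d_neg by at least `margin`; zero loss once satisfied."""
    d_pos = np.sum((h_anchor - h_pos) ** 2)  # should be small
    d_neg = np.sum((h_anchor - h_neg) ** 2)  # should be large
    return max(0.0, d_pos - d_neg + margin)

# Toy feature vectors: the positive is close to the anchor, the negative far.
anchor = np.array([1.0, 0.0])
pos = np.array([1.1, 0.0])
neg = np.array([-1.0, 0.0])
print(triplet_loss(anchor, pos, neg))  # d_pos=0.01, d_neg=4.0 -> loss 0.0
```

Once the negative is farther than the positive by more than the margin, the loss is zero, so training focuses on triplets that still violate the margin.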
After training the Siamese network, use the network for one-shot prediction
The cosine similarity measures how similar two vectors are to each other.
There are three normalized feature vectors; calculate a softmax over the similarities to choose the most similar object.
In fine-tuning, the weights and biases are updated based on the support set (e.g. those of the feature extractor (CNN) or the classifier on top of it).
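A minimal sketch of fine-tuning on the support set: here a linear softmax classifier (weight matrix W, bias b) is trained by gradient descent on fixed support features. The features, labels, and learning rate are assumed values; the same gradient-descent idea applies when the extractor is also updated.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Support set: feature vectors (as if from the frozen CNN) with class labels.
features = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])

# Classifier parameters to fine-tune.
W = np.zeros((2, 2))
b = np.zeros(2)

lr = 0.5  # assumed learning rate
for _ in range(100):
    for x, y in zip(features, labels):
        p = softmax(W @ x + b)
        grad = p.copy()
        grad[y] -= 1.0              # gradient of cross-entropy w.r.t. logits
        W -= lr * np.outer(grad, x) # gradient step on the weights
        b -= lr * grad              # gradient step on the bias

pred = int(np.argmax(softmax(W @ features[0] + b)))
```

After a few passes over the tiny support set, the classifier fits it exactly, which is precisely why the overfitting concern below matters.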
In few shot learning, the dataset is small, which can cause overfitting. To prevent this we can use entropy regularization, and it makes good sense.
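Entropy regularization can be sketched like this: the Shannon entropy of the model's predicted class distribution is added to the loss, encouraging confident (low-entropy) predictions on query images. The probability vectors, regularization weight, and placeholder loss value below are assumptions for illustration.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a predicted class distribution."""
    return float(-np.sum(p * np.log(p + eps)))

# A confident prediction has low entropy; a spread-out one has high entropy.
confident = np.array([0.97, 0.01, 0.02])
uncertain = np.array([0.34, 0.33, 0.33])

# The regularizer adds the prediction entropy to the task loss, so lowering
# the total loss pushes the model toward confident outputs.
reg_weight = 0.1           # assumed hyperparameter
classification_loss = 0.5  # placeholder value for the supervised loss
total_loss = classification_loss + reg_weight * entropy(uncertain)
```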
references