[Paper Review] (ImgCls) Prototypical Networks for Few-shot Learning
0. Background
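- Bregman divergence (reference)
- The difference between the true value of a strictly convex, differentiable function F at p and its first-order Taylor approximation F′ around q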
- $D_F(p, q) = F(p) - F(q) - \langle \nabla F(q), p - q \rangle$
- $F'(p) = F(q) + \langle \nabla F(q), p - q \rangle$, so $D_F(p, q) = F(p) - F'(p)$
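- $\langle \cdot, \cdot \rangle$ is an inner product (a Hermitian inner product in the complex case); for real vectors, as here, it is just the ordinary dot product.
- e.g., squared Euclidean norm -> squared Euclidean distance, negative entropy -> KL divergence
- In short, think of it as a measure of discrepancy between p and q (it is generally not symmetric, so it is not a true metric).
- (Important) If p is a random vector, the q that minimizes $\mathbb{E}[D_F(p, q)]$ is the mean $\mathbb{E}[p]$.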
- k-Nearest Neighbors (KNN)
- Classifies a query by majority vote over the labels of its k nearest points
- Generative model vs. Discriminative model
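A small pure-Python sketch of the Bregman divergence definition above, using the two example generators from the list (squared norm and negative entropy). The helper names (`bregman`, `sq_norm`, `neg_entropy`, `avg_div`, ...) are my own, and the loop at the end only probes a few perturbations around the mean rather than proving that the mean is the minimizer.

```python
import math

def bregman(F, grad_F, p, q):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> for real vectors."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner

# F(x) = ||x||^2 generates the squared Euclidean distance ||p - q||^2.
def sq_norm(x):
    return sum(v * v for v in x)

def grad_sq_norm(x):
    return [2.0 * v for v in x]

# F(x) = sum_i x_i log x_i (negative entropy, x_i > 0) generates the
# generalized KL divergence; for probability vectors it is the usual KL(p || q).
def neg_entropy(x):
    return sum(v * math.log(v) for v in x)

def grad_neg_entropy(x):
    return [math.log(v) + 1.0 for v in x]

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def avg_div(F, grad_F, points, q):
    """Empirical estimate of E[D_F(p, q)] over the sample points p."""
    return sum(bregman(F, grad_F, p, q) for p in points) / len(points)

points = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.1, 0.2, 0.7]]
mu = mean(points)
for name, F, gF in [("sq. Euclidean", sq_norm, grad_sq_norm),
                    ("KL", neg_entropy, grad_neg_entropy)]:
    at_mean = avg_div(F, gF, points, mu)
    # Nudge q away from the mean (keeping every entry positive) and compare.
    for delta in ([0.05, -0.05, 0.0], [-0.02, 0.0, 0.02], [0.01, 0.01, -0.02]):
        q = [m + d for m, d in zip(mu, delta)]
        print(name, round(at_mean, 5), "<", round(avg_div(F, gF, points, q), 5))
```

For every perturbation, the average divergence at the mean stays smaller, which is the property that makes "prototype = mean of the support points" a natural choice under a Bregman divergence.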
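A minimal KNN sketch matching the description above, assuming Euclidean distance and plain (unweighted) majority voting. `knn_predict` and the toy points are hypothetical; `math.dist` requires Python 3.8+.

```python
import math
from collections import Counter

def knn_predict(support, query, k=3):
    """Label `query` by majority vote over its k nearest support points.

    support: list of (vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(support, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

support = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
           ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
print(knn_predict(support, (0.5, 0.5)))  # query near the "a" cluster
print(knn_predict(support, (5.2, 5.1)))  # query near the "b" cluster
```

With an even k, votes can tie; an odd k avoids that in the two-class case.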
1. Introduction
- Matching Networks (Vinyals et al.)
- The embedding is learned (via a neural network)
- Classification uses a weighted nearest-neighbor classifier over the embedded support set
- Meta-learning with an LSTM (Ravi & Larochelle)
- Learns to train a custom model for each episode
- With very little data, models are prone to overfitting
- Because data is so limited, Prototypical Networks use a simple inductive bias: they assume there exists an embedding in which the points of each class cluster around a single representation (a "prototype").
- Each prototype is defined as the mean of its class's embedded support points.
- The embeddings are produced by a neural network.
- Choosing a good distance metric is important.
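To make the contrast between the two ideas concrete, here is a toy version of the weighted nearest-neighbor classifier behind Matching Networks. It is a simplification under my own assumptions: embeddings are already computed, attention is a softmax over cosine similarities, and the full-context-embedding part of Matching Networks is left out. The function names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def weighted_nn(support, query):
    """Class scores sum_i a(query, x_i) * [y_i == k], with attention
    a(query, x_i) = softmax over i of cosine(query, x_i).

    support: list of (embedded vector, label); query: embedded vector.
    """
    sims = [cosine(query, x) for x, _ in support]
    m = max(sims)                                # subtract max for numerical stability
    weights = [math.exp(s - m) for s in sims]
    total = sum(weights)
    scores = {}
    for (_, y), w in zip(support, weights):
        scores[y] = scores.get(y, 0.0) + w / total
    return scores

support = [((1.0, 0.0), "a"), ((0.9, 0.1), "a"), ((0.0, 1.0), "b")]
print(weighted_nn(support, (1.0, 0.1)))  # most of the mass goes to "a"
```

Here every support point contributes to the prediction, whereas a Prototypical Network compares the query with only one mean embedding per class.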
2. Prototypical Networks
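- Prototype $\mathbf{c}_k \in \mathbb{R}^M$
- Embedding function $f_\phi : \mathbb{R}^D \to \mathbb{R}^M$ ($\phi$ are the learnable parameters)
- $\mathbf{c}_k = \frac{1}{|S_k|} \sum_{(\mathbf{x}_i, y_i) \in S_k} f_\phi(\mathbf{x}_i)$: the mean of the embedded support points in $S_k$ (the support set of class k)
- Distance function $d : \mathbb{R}^M \times \mathbb{R}^M \to [0, +\infty)$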
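- Probability that x belongs to class k: $p_\phi(y = k \mid \mathbf{x}) = \frac{\exp(-d(f_\phi(\mathbf{x}), \mathbf{c}_k))}{\sum_{k'} \exp(-d(f_\phi(\mathbf{x}), \mathbf{c}_{k'}))}$, i.e., a softmax over negative distances
- Objective: minimize $J(\phi) = -\log p_\phi(y = k \mid \mathbf{x})$ for the true class k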
- Connections to Mixture Density Estimation
- If d is a regular Bregman divergence, such as squared Euclidean distance, a Prototypical Network is equivalent to performing mixture density estimation on the support set with an exponential-family density. (~~)
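A minimal pure-Python sketch of one episode's forward pass, following the formulas above with squared Euclidean distance as d. The embedding `embed` is a hypothetical stand-in for $f_\phi$ (the identity here); in the paper $f_\phi$ is a trained neural network, and the gradients with respect to $\phi$ would come from an autodiff framework, which this sketch leaves out. Averaging J over the query points is an assumption of this sketch.

```python
import math
from collections import defaultdict

def embed(x):
    # Hypothetical stand-in for f_phi: the identity map.
    return list(x)

def sq_euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def prototypes(support):
    """support: list of (x, label). Returns {label: mean of embedded support points}."""
    groups = defaultdict(list)
    for x, y in support:
        groups[y].append(embed(x))
    return {y: [sum(col) / len(zs) for col in zip(*zs)] for y, zs in groups.items()}

def class_probs(protos, x):
    """p(y=k|x) = softmax_k(-d(f(x), c_k)), using max-subtraction for stability."""
    z = embed(x)
    logits = {k: -sq_euclidean(z, c) for k, c in protos.items()}
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def episode_loss(support, query):
    """J = -log p(y = true class | x), averaged over the query set."""
    protos = prototypes(support)
    return sum(-math.log(class_probs(protos, x)[y]) for x, y in query) / len(query)

support = [((0.0, 0.0), "cat"), ((0.0, 2.0), "cat"),
           ((4.0, 4.0), "dog"), ((6.0, 4.0), "dog")]
query = [((0.5, 1.0), "cat"), ((5.0, 3.0), "dog")]
protos = prototypes(support)
print(protos)  # {'cat': [0.0, 1.0], 'dog': [5.0, 4.0]}
print(class_probs(protos, (0.5, 1.0)))
print(episode_loss(support, query))
```

Swapping `sq_euclidean` for another distance is the only change needed to try a different metric, which is how the metric comparison in the experiments can be framed.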
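A worked special case of this connection (my own illustration, not the paper's general exponential-family argument): take a mixture of Gaussians whose component means are the prototypes, with equal mixing weights $\pi_k$ and a shared isotropic covariance $\sigma^2 I$. The Gaussian normalizing constants are identical across classes and cancel:

$$
p(y = k \mid \mathbf{z})
= \frac{\pi_k \, \mathcal{N}(\mathbf{z};\, \mathbf{c}_k, \sigma^2 I)}{\sum_{k'} \pi_{k'} \, \mathcal{N}(\mathbf{z};\, \mathbf{c}_{k'}, \sigma^2 I)}
= \frac{\exp\left(-\lVert \mathbf{z} - \mathbf{c}_k \rVert^2 / 2\sigma^2\right)}{\sum_{k'} \exp\left(-\lVert \mathbf{z} - \mathbf{c}_{k'} \rVert^2 / 2\sigma^2\right)}
$$

Setting $\sigma^2 = 1/2$ and $\mathbf{z} = f_\phi(\mathbf{x})$ gives exactly $p_\phi(y = k \mid \mathbf{x})$ with $d$ equal to the squared Euclidean distance.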
3. Experiments
- Training Setting
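- Shot: it is better to use the same number of shots in training as at test time
- This may depend on the model; in another paper, training with a different shot count worked better. What causes the difference?
- Way: training with a larger way (more classes per episode) than at test time gives better results
- The larger the way, the more the embedding must learn to separate fine-grained differences between classes. Training on a harder task leads to better generalization.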
- Metric
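- Squared Euclidean distance is much better suited to this model, since prototypes are defined as the mean of the support points, and the mean is the optimal representative under a Bregman divergence (cosine distance is not one). Results are far superior to those with cosine similarity.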
![](https://velog.velcdn.com/images/bbangsil20/post/bac5ca4a-3e60-4907-bf60-259f0b71efa9/image.png)
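A sketch of how N-way K-shot episodes could be sampled, to make the shot/way discussion concrete. `sample_episode` and the toy dataset are hypothetical, and the 20-way training vs 5-way testing values in the usage lines only illustrate "higher way in training, same shot"; they are not the paper's exact configuration.

```python
import random

def sample_episode(dataset, n_way, n_shot, n_query, rng):
    """Sample one episode from dataset = {label: [examples]}.

    Picks n_way classes, then n_shot support and n_query query examples per
    class without overlap. Returns (support, query) as lists of (x, label).
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(dataset[c], n_shot + n_query)
        support += [(x, c) for x in picked[:n_shot]]
        query += [(x, c) for x in picked[n_shot:]]
    return support, query

# Toy dataset: 30 classes with 20 dummy integer examples each.
dataset = {f"class{i:02d}": list(range(100 * i, 100 * i + 20)) for i in range(30)}
rng = random.Random(0)

# Train with a higher way than at test time, but the same shot.
train_support, train_query = sample_episode(dataset, n_way=20, n_shot=5, n_query=5, rng=rng)
test_support, test_query = sample_episode(dataset, n_way=5, n_shot=5, n_query=5, rng=rng)
print(len(train_support), len(train_query), len(test_support), len(test_query))
```

For brevity both episodes draw from the same toy dataset; in the real setting, training and test episodes come from disjoint class splits.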
4. Opinion
- The part that interests me: matching the train and test conditions is itself what makes this "meta".