[Paper Review] (ImgCls) Prototypical Networks for Few-shot Learning

빵 반죽 · August 24, 2023

0. Prerequisites

  • Bregman divergence (reference)
    • The difference between a function $F$'s actual value and its first-order Taylor approximation
    • $D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle$
    • first-order approximation: $F(p) \approx F(q) + \langle \nabla F(q),\, p - q \rangle$
    • $\langle \cdot, \cdot \rangle$ is an inner product (here, the ordinary dot product on $\mathbb{R}^n$)
    • e.g., squared Euclidean norm -> squared Euclidean distance, negative entropy -> KL divergence
    • In short, think of it as a measure of the discrepancy between $p$ and $q$
    • (Important) If $p$ is a random vector, the $q$ minimizing $\mathbb{E}[D_F(p, q)]$ is the mean of the random $p$'s (quick derivation after this list)
  • K-Nearest-Neighbor (KNN)
    • Classify the query by majority vote among its $k$ nearest points (minimal sketch after this list)
  • Generative model vs. Discriminative model
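
A quick check of the mean-minimizer fact for the squared Euclidean case $F(x) = \|x\|^2$ (my own derivation, not from the paper):

$$D_F(p, q) = \|p\|^2 - \|q\|^2 - \langle 2q,\, p - q \rangle = \|p - q\|^2$$

$$\nabla_q\, \mathbb{E}\big[\|p - q\|^2\big] = 2\,(q - \mathbb{E}[p]) = 0 \;\Rightarrow\; q^{*} = \mathbb{E}[p]$$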
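
And a minimal NumPy sketch of the KNN majority-vote rule (the function and its signature are my own illustration, not a library API):

```python
import numpy as np

def knn_predict(query, points, labels, k=3):
    """Majority vote among the k points nearest to the query.

    points: (N, D) array, labels: (N,) integer array, query: (D,) array.
    """
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distance to every point
    nearest = np.argsort(dists)[:k]                 # indices of the k closest points
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]                # most common label wins
```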

1. Introduction

  • Matching Networks (Vinyals et al.)
    • the embedding is learned (via a neural network)
    • weighted nearest-neighbor classifier
  • meta-learning via LSTM (Ravi & Larochelle)
    • learns to train a custom model for each episode
  • very little data -> models are prone to overfitting
    • take advantage of this fact and assume that there exists a single embedding that represents each class (a "prototype")
    • this prototype is defined as the average of its support-set embeddings
    • the embeddings are generated by a neural network
  • the importance of choosing a good metric

2. Prototypical Networks

  • Prototype $c_k \in \mathbb{R}^M$
    • Embedding function $f_\phi : \mathbb{R}^D \to \mathbb{R}^M$ ($\phi$ is the learnable parameter)
    • $c_k$ is the average of the embedded support points in $S_k$ (for class $k$)
    • Distance function $d : \mathbb{R}^M \times \mathbb{R}^M \to [0, +\infty)$
    • Probability that $\mathbf{x}$ is class $k$: $p_\phi(y = k \mid \mathbf{x}) = \mathrm{softmax}(-d(f_\phi(\mathbf{x}), c_k))$
    • Objective: minimize $J(\phi) = -\log p_\phi(y = k \mid \mathbf{x})$ for the true class $k$ (a PyTorch sketch follows this list)
  • Connections to Mixture Density Estimation
    • If $d$ is a regular Bregman divergence (e.g., squared Euclidean distance), a Prototypical Network is equivalent to mixture density estimation on the support set with an exponential family density.
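
A minimal PyTorch sketch of the classification rule above (the function, its signature, and the `embed` network are my own placeholders, not the authors' code):

```python
import torch
import torch.nn.functional as F

def prototypical_log_probs(support_x, support_y, query_x, embed, n_way):
    """log p(y = k | x) for each query point, following the definitions above.

    support_x: (N_s, D) support examples, support_y: (N_s,) labels in [0, n_way)
    query_x:   (N_q, D) query examples,   embed:     the embedding network f_phi
    """
    z_support = embed(support_x)                    # (N_s, M)
    z_query = embed(query_x)                        # (N_q, M)
    # c_k = mean of the embedded support points of class k
    prototypes = torch.stack(
        [z_support[support_y == k].mean(dim=0) for k in range(n_way)]
    )                                               # (n_way, M)
    dists = torch.cdist(z_query, prototypes) ** 2   # squared Euclidean, (N_q, n_way)
    return F.log_softmax(-dists, dim=1)             # softmax over negative distances

# One training step then minimizes J(phi) = -log p(y = k | x) for the true class:
# loss = F.nll_loss(prototypical_log_probs(sx, sy, qx, embed, n_way), query_labels)
```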

3. Experiments

  • Training Setting
    • shot -> better to match the number of shots between training and test episodes
      • This seems to depend on the model; another paper reported better results with a mismatched shot count. What difference causes this?
    • way -> training with a larger way gives better results
      • A larger way forces the embedding to capture finer distinctions between classes; training on the harder task improves generalization (episode-sampling sketch at the end of this section)
  • Metric
    • Squared Euclidean distance suits this model much better than cosine similarity: it is a Bregman divergence, so the class mean is the optimal representative of the support points, while cosine distance is not. The reported results are far superior to those with cosine similarity.
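
For concreteness, a sketch of what one N-way K-shot training episode looks like (the `data_by_class` dict and all names are my assumptions, not the paper's code):

```python
import random

def sample_episode(data_by_class, n_way, k_shot, n_query):
    """Draw one N-way K-shot episode from {class_name: [examples]}."""
    classes = random.sample(sorted(data_by_class), n_way)  # pick N classes
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]  # K shots per class
        query += [(x, label) for x in examples[k_shot:]]    # held-out query points
    return support, query
```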

4. Opinion

  • The part that interests me is that matching the train/test conditions is itself the "meta" in meta-learning
