[ZeroShot Learning Overview] A comprehensive survey of zero-shot image classification: methods, implementation, and fair evaluation

권유진 · February 12, 2023

Abstract

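Deep learning methods lose accuracy when the number of labeled training samples is limited.
- = the few-shot learning setting
In few-shot learning, classification accuracy drops for instances of classes that were never seen before.
- = the zero-shot learning setting
Zero-shot learning methods have advanced rapidly.
- However, it is hard to measure their effectiveness fairly.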

Introduction

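Deep learning has achieved unprecedented success in image classification.
- A powerful deep learning model can be applied to a given domain as long as enough labeled samples are available.
Deep learning models require a very large number of labeled training samples.
- Labeling is expensive.
Scenarios commonly encountered in practice include the following.
- Large target size: humans can distinguish 3,000 basic-level classes.
  - Each class can be expanded into subordinate, finer-grained classes.
    - e.g., the dog class can be expanded into many breed classes.
  - With so many categories, it is hard to have enough labeled samples for each one.
- Rare target classes: samples of rarely seen classes are hard to collect.
- Growing target size: the target set of some tasks changes quickly.
  - The number of classes grows over time.
In these situations, retraining a deep learning model is difficult.
- Fine-tuning once target samples are obtained is the realistic option.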
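Zero-shot learning (zero-data learning) was proposed to overcome these limitations.
- Example
  - A child who already knows the shape of a horse, the notion of stripes, and the colors black and white
  - can recognize a zebra as a zebra the first time they see one.
- Auxiliary information describes each category and its samples.
  - The model learns the correlation between samples and the auxiliary information.
  - It can therefore classify unseen categories based on their correlation with the auxiliary information.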
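
The zebra example can be sketched as nearest-attribute matching. This is a minimal illustration, not the method of any particular paper: the attribute order, the class table, and the detector scores below are all hypothetical values chosen for the example.

```python
# Hypothetical attribute order: [horse_shape, stripes, black_and_white]
class_attributes = {
    "horse": [1, 0, 0],
    "tiger": [0, 1, 0],
    "panda": [0, 0, 1],
    "zebra": [1, 1, 1],  # unseen class: known only through its attributes
}

def predict(predicted_attributes, class_attributes):
    """Return the class whose attribute vector is closest (squared Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(class_attributes,
               key=lambda c: dist(predicted_attributes, class_attributes[c]))

# Assumed output of attribute detectors that were trained on seen classes only
scores = [0.9, 0.8, 0.7]
print(predict(scores, class_attributes))  # zebra
```

No zebra image is needed at training time here; only the attribute description of the zebra class is.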

Overview of zero-shot learning

Zero-Shot Image Classification (methods categorized by their information extractor)
- Embedding methods
  - Feature-vector based methods
    - Space alignment framework
    - Graph based framework
    - Meta learning framework
  - Image-based methods
    - Supervision based
    - Attention based
  - Mechanism improved methods
    - Training process focused
    - Test process focused
    - Entire process focused
- Generative methods
  - VAE based
  - GAN based
  - Multi-architecture based
  - Meta learning based

Auxiliary information

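Classes whose labeled samples are available during training are called seen classes.
- Target classes that do not appear in the training dataset are called unseen classes.
The auxiliary information space must hold enough information to distinguish all classes.
- For an effective correlation between auxiliary information and samples to be learned, the auxiliary information must be unique to each class and sufficiently representative.
Zero-shot learning is inspired by the efficient way humans learn.
- Semantic information has become the most representative kind of auxiliary information.
- Just as image processing has a feature space, there is a semantic space that holds numeric values.
  - Semantic spaces are mainly divided into attributes and textual descriptions.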
Attribute
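- Attributes were the first source used in zero-shot learning and remain the most common one.
- Human-annotated information is accurate but very time-consuming to collect.
  - Attributes can be written as words or phrases that describe a characteristic.
  - Combinations of these characteristics can describe both seen and unseen classes.
    - The combination must be different for every class.
- 0/1 binary vector: each entry indicates whether the class has the attribute.
  - That is, all attribute vectors have the same size, and each dimension denotes the same characteristic in the same order.
  - Mismatches between individual samples and the class attributes can occur.
    - Within the horse class, a black horse and a white horse have different attribute vectors.
      - If the attribute vector is [black, white, stripe], a black horse is [1,0,0] and a white horse is [0,1,0].
- For this reason, continuous values are used instead of binary vectors.
  - Each value expresses the degree or confidence of the characteristic.
  - The average of annotator votes, or the proportion of samples that exhibit the characteristic, is sometimes adopted.
  - Relative attributes, which measure the degree of a characteristic across classes, have also been proposed.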
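
As a rough sketch of moving from binary to continuous attributes, the class-level vector can be taken as the mean of per-sample binary annotations. The sample annotations below are made up for illustration, and averaging is only one of the options mentioned above.

```python
ATTRIBUTES = ["black", "white", "stripe"]

# Hypothetical per-sample binary annotations for the class "horse"
horse_samples = [
    [1, 0, 0],  # black horse
    [0, 1, 0],  # white horse
    [1, 0, 0],
    [0, 1, 0],
]

def class_attribute_vector(samples):
    """Average binary annotations column by column into one continuous vector."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

horse_vector = class_attribute_vector(horse_samples)
print(dict(zip(ATTRIBUTES, horse_vector)))
# {'black': 0.5, 'white': 0.5, 'stripe': 0.0}
```

The single continuous vector no longer contradicts either the black or the white horse, which is the mismatch the binary encoding runs into.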
Text
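- Descriptions such as class names or definitions are used as auxiliary information.
  - Using class names alone, with no other side information, does little to improve performance.
- A pre-trained word embedding model from NLP embeds class names as vectors and forms a meaningful semantic space.
  - Semantic similarity between words can be measured by the distance between their embedding vectors.
    - The similarity knowledge contained in the training corpora serves as auxiliary information.
  - Word2Vec and GloVe are the most commonly used.
- Similarity can also be built from an ontology.
  - e.g., hierarchical embeddings derived from WordNet
- Bag-of-Words representations use binary occurrence indicators or term frequencies.
  - These are taken from a term-document matrix (TDM).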
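
The Bag-of-Words variant can be sketched with the standard library alone. The class descriptions are invented for this example, and the table is stored per document (the transpose of a term-document matrix). The same cosine measure is what one would apply to Word2Vec or GloVe vectors, which are not loaded here.

```python
import math
from collections import Counter

# Hypothetical class descriptions used as textual auxiliary information
descriptions = {
    "horse": "large animal with four legs and a mane",
    "zebra": "animal with four legs a mane and black and white stripes",
    "car": "vehicle with four wheels and an engine",
}

# Shared vocabulary so every class vector has the same dimensions
vocab = sorted({w for text in descriptions.values() for w in text.split()})

def frequency_vector(text):
    """Term frequencies over the shared vocabulary."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

def binary_vector(text):
    """Binary occurrence indicators over the shared vocabulary."""
    return [1 if c > 0 else 0 for c in frequency_vector(text)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

tdm = {name: frequency_vector(text) for name, text in descriptions.items()}
sim_horse_zebra = cosine(tdm["horse"], tdm["zebra"])
sim_horse_car = cosine(tdm["horse"], tdm["car"])
print(sim_horse_zebra > sim_horse_car)  # True
```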
Other auxiliary information
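- These sources carry too little information for classification on their own, so they must be combined with semantic information.
- Examples
  - hierarchical labels in a taxonomy
  - correlation between attributes that capture the gaze points of each sample
    - uses an attention module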

Learning scenarios

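In image classification, a gap between the instance distributions of the training and test datasets makes a model perform worse at test time than it did during training.
- The same phenomenon exists in zero-shot learning.
  - It is even more severe there, because the test set contains both seen and unseen classes.
- This gap is called domain shift.
- A model whose performance degrades in this way is said to be class-level over-fitted.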
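Various methods have been proposed to address this problem.
- From the viewpoint of the training stage, they fall into three scenarios.
  - Inductive zero-shot learning: training uses only the samples and auxiliary information of seen classes.
    - Both the target classes and their instances are unknown, so learning is hard.
    - Class-level over-fitting occurs easily.
  - Semantic transductive zero-shot learning: training uses the labeled seen-class samples and the auxiliary information of all classes.
  - Transductive zero-shot learning: training uses the labeled seen-class samples, the unlabeled unseen-class (test) samples, and the auxiliary information of all classes.
    - The two transductive settings receive information about the unseen classes, so the learning target is well defined.
    - However, their generalization ability is limited.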
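Conventional Zero-Shot Learning: when the zero-shot problem was first posed, work focused only on classifying unseen classes well.
- It was later found that unseen-class accuracy suffers badly once seen classes are also among the candidate labels.
- As a result, the early models could not classify seen and unseen categories well at the same time.
Consequently, the more challenging Generalized Zero-Shot Learning has attracted much attention.
- The task is to classify both seen and unseen classes.
The goal of Zero-Shot Learning is to imitate how humans use learned knowledge and auxiliary information to recognize a concept and classify new samples.
- Concept recognition can therefore be judged accurate only when both unseen and seen classes are classified well.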
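
One way to make "both seen and unseen classes are classified well" measurable is the harmonic mean of the per-class accuracies on the two groups. This metric is an assumption taken from common practice in the generalized zero-shot literature, not something stated in the notes above, and the labels and predictions are toy values.

```python
def per_class_accuracy(y_true, y_pred, classes):
    """Mean of the accuracies computed separately for each class in `classes`."""
    accs = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        if not idx:
            continue  # class has no test samples
        accs.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(accs) / len(accs)

def harmonic_mean(acc_seen, acc_unseen):
    """High only when BOTH accuracies are high."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

seen, unseen = ["horse", "tiger"], ["zebra"]
y_true = ["horse", "horse", "tiger", "zebra", "zebra"]
y_pred = ["horse", "horse", "tiger", "horse", "zebra"]  # one zebra mistaken for a horse

acc_s = per_class_accuracy(y_true, y_pred, seen)    # 1.0
acc_u = per_class_accuracy(y_true, y_pred, unseen)  # 0.5
h = harmonic_mean(acc_s, acc_u)
print(round(h, 3))  # 0.667
```

A model that is perfect on seen classes but fails on every unseen class scores 0 under this measure, whereas a plain average would still report 0.5.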

Problem Definitions

Each image sample is represented as a tensor of pixel values.
There are $N$ samples from $K$ classes.
- $X=X^S \cup X^U$ : the image set, covering both seen and unseen classes
- $F(\cdot)$ : feature extractor
- $Y = Y^S \cup Y^U$ : label set
- $A=A^S\cup A^U$ : auxiliary information
  - contains $K$ vectors, one per class
- $K^S, K^U$ : number of seen and unseen classes
- $X^S\cap X^U = Y^S\cap Y^U = A^S\cap A^U = \emptyset$
Part of the seen set is held out as a test set.
- These samples are not included in the training set.
- $X^S = X^S_{tr} \cup X^S_{te}, Y^S=Y^S_{tr} \cup Y^S_{te}$
- Both the training and test splits must contain all $K^S$ seen classes.
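
A minimal sketch of such a split, assuming samples are simple identifiers and that a fixed (unshuffled) per-class cut is acceptable for illustration. The function name `split_seen` and the 20% ratio are hypothetical choices.

```python
from collections import defaultdict

def split_seen(samples, labels, test_ratio=0.2):
    """Hold out part of every seen class so both splits contain all K^S classes."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    train, test = [], []
    for y, xs in by_class.items():
        if len(xs) < 2:
            raise ValueError(f"class {y!r} needs at least 2 samples to appear in both splits")
        # At least one test sample per class, and at least one left for training
        n_test = min(len(xs) - 1, max(1, int(len(xs) * test_ratio)))
        test += [(x, y) for x in xs[:n_test]]
        train += [(x, y) for x in xs[n_test:]]
    return train, test

samples = ["h1", "h2", "h3", "h4", "h5", "t1", "t2", "t3", "t4", "t5"]
labels = ["horse"] * 5 + ["tiger"] * 5
train, test = split_seen(samples, labels)
print(len(train), len(test))  # 8 2
```

In practice the samples within each class would be shuffled before cutting; that step is left out so the result is deterministic.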
There are three scenarios for the training process.
- $D_{tr} = \{X_{tr},Y_{tr},A_{tr}\}$
  - inductive: $D_{tr}^I = \{X_{tr}^S,Y_{tr}^S,A^S\}$
  - semantic transductive: $D_{tr}^{ST} = \{X_{tr}^S, Y_{tr}^S, A\}$
  - transductive: $D_{tr}^T = \{X_{tr}^S\cup X^U, Y_{tr}^S, A\}$
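
The three training sets differ only in which unseen-class information they admit. The helper below is a hypothetical sketch that mirrors the definitions of $D_{tr}^I$, $D_{tr}^{ST}$, and $D_{tr}^T$; the dictionary layout and argument names are assumptions made for the example.

```python
def build_training_set(scenario, X_s_tr, Y_s_tr, A_s, A_u, X_u):
    """Assemble D_tr for one of the three training scenarios.

    X_s_tr, Y_s_tr: labeled seen-class training samples and their labels
    A_s, A_u: dicts mapping class name -> auxiliary vector (seen / unseen)
    X_u: unseen-class samples, only ever used WITHOUT labels
    """
    if scenario == "inductive":
        # Seen samples and seen auxiliary information only
        return {"X": list(X_s_tr), "Y": list(Y_s_tr), "A": dict(A_s), "X_unlabeled": []}
    A = {**A_s, **A_u}  # auxiliary information of all classes
    if scenario == "semantic_transductive":
        return {"X": list(X_s_tr), "Y": list(Y_s_tr), "A": A, "X_unlabeled": []}
    if scenario == "transductive":
        return {"X": list(X_s_tr), "Y": list(Y_s_tr), "A": A, "X_unlabeled": list(X_u)}
    raise ValueError(f"unknown scenario: {scenario}")

A_s = {"horse": [1, 0], "tiger": [0, 1]}
A_u = {"zebra": [1, 1]}
d_ind = build_training_set("inductive", ["h1", "t1"], ["horse", "tiger"], A_s, A_u, ["z1"])
d_st = build_training_set("semantic_transductive", ["h1", "t1"], ["horse", "tiger"], A_s, A_u, ["z1"])
d_tr = build_training_set("transductive", ["h1", "t1"], ["horse", "tiger"], A_s, A_u, ["z1"])
print(sorted(d_ind["A"]), sorted(d_st["A"]), d_tr["X_unlabeled"])
# ['horse', 'tiger'] ['horse', 'tiger', 'zebra'] ['z1']
```

In every scenario the label list contains seen classes only; unseen labels never enter training.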
There are two forms of the test process.
- $D_{te} = \{X_{te},Y_{te},A_{te}\}$
  - conventional task: $D_{te}^C = \{X^U, Y^U, A^U\}$
  - generalized task: $D_{te}^G = \{X^U\cup X_{te}^S, Y^U \cup Y^S_{te}, A\}$
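
At prediction time, the two tasks differ in the set of candidate labels. The sketch below uses made-up compatibility scores for a single zebra image to show how the same scores can give a correct answer in the conventional task and a wrong one in the generalized task. The function names and the extra class "okapi" are hypothetical.

```python
def candidate_classes(task, seen_classes, unseen_classes):
    """Label space searched at test time for each task."""
    if task == "conventional":
        return list(unseen_classes)
    if task == "generalized":
        return list(seen_classes) + list(unseen_classes)
    raise ValueError(f"unknown task: {task}")

def predict(scores, candidates):
    """Pick the candidate class with the highest compatibility score."""
    return max(candidates, key=lambda c: scores[c])

seen, unseen = ["horse", "tiger"], ["zebra", "okapi"]
# Assumed scores for one zebra image; the seen class "horse" scores highest
scores = {"horse": 0.6, "tiger": 0.1, "zebra": 0.5, "okapi": 0.2}

print(predict(scores, candidate_classes("conventional", seen, unseen)))  # zebra
print(predict(scores, candidate_classes("generalized", seen, unseen)))   # horse
```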
Zero-Shot Learning means training an information extractor $M$, which contains the feature extractor $F(\cdot)$, and a classifier $C$ on the training set $D_{tr}$, and then using them to classify $X_{te}$.
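
To make the roles of $M$ and $C$ concrete, here is a toy inductive pipeline under strong assumptions: the output of $F$ is taken as given (two hand-made feature dimensions), $M$ is a linear projection into attribute space fitted by SGD, and $C$ is a nearest-attribute-vector rule. None of this is prescribed by the survey; it is one simple instance of an embedding-style approach.

```python
# Toy seen-class training data: (feature vector, class). The features stand in
# for outputs of F, e.g. [horse_shape_score, stripe_score] (hypothetical).
train = [
    ([1.0, 0.1], "horse"), ([0.9, 0.0], "horse"),
    ([0.1, 1.0], "tiger"), ([0.0, 0.9], "tiger"),
]
# Auxiliary vectors in the order [horse_shape, stripes]; zebra is unseen.
A = {"horse": [1.0, 0.0], "tiger": [0.0, 1.0], "zebra": [1.0, 1.0]}

def project(W, x):
    """M: map a feature vector into attribute space with a linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def train_projection(data, A, dim_in=2, dim_out=2, lr=0.1, epochs=200):
    """Fit W by SGD on the squared error between W x and the class attribute vector."""
    W = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        for x, y in data:
            pred = project(W, x)
            for i in range(dim_out):
                err = pred[i] - A[y][i]
                for j in range(dim_in):
                    W[i][j] -= lr * err * x[j]
    return W

def classify(W, x, A, candidates):
    """C: pick the candidate whose attribute vector is nearest to M(x)."""
    z = project(W, x)
    return min(candidates, key=lambda c: sum((a - b) ** 2 for a, b in zip(A[c], z)))

W = train_projection(train, A)  # only seen classes appear in `train`
print(classify(W, [0.9, 1.0], A, candidates=list(A)))  # zebra
```

Training touches only the horse and tiger attribute vectors; the zebra vector is used solely at test time as a candidate.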

References

Yang, Guanyu, et al. "A comprehensive survey of zero-shot image classification: methods, implementation, and fair evaluation." Applied Computing and Intelligence 2.1 (2022): 1-31.
