(Work in progress) [ZSL SOTA] Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification

김예지 · November 21, 2022

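Terminology

Generalized ZSL : a test instance may come from either a seen or an unseen class
Cyclic Consistency : a feature generated from a semantic embedding should map back to the embedding that produced it (see 3.3)
Semantic Embedding Decoder : a decoder that takes a real or generated feature and reconstructs its semantic embedding (see 3.2, 3.3)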

Abstract

SOTA : GAN-based methods leverage class-specific semantic embeddings to synthesize unseen-class features.
They enforce semantic consistency on the generated features during training, but drop this constraint during feature synthesis and classification.

| What this paper proposes

Proposes a feedback loop, driven by a semantic embedding decoder, that refines the generated features during both training and feature synthesis.
The synthesized features, together with the corresponding latent embeddings from the decoder, are transformed into discriminative features and used during classification to reduce ambiguities among categories.
Confirms the benefit of semantic consistency and iterative feedback.
Key idea : enforce semantic consistency at every stage of (generalized) ZSL.

Introduction

| Existing ZSL methods

Rely on labelled seen-class instances and class-specific semantic embeddings that encode inter-class relationships.
Use a GAN, optimized to reduce the divergence between real and generated data.
(paper title, 42) A GAN is trained on seen-class feature instances and their corresponding class-specific semantic embeddings.
- Feature instances of unseen classes are produced with the trained GAN.
- The ZSL classifier is then trained in a fully-supervised setting.
(paper title, 8, 13, 25) Use an auxiliary module such as a decoder to enforce a cycle-consistency constraint on the reconstruction of semantic embeddings during training.
- This auxiliary decoder module helps the generator produce semantically consistent features.
- The module is used only during training and is discarded at the feature synthesis and ZSL classification stages.
- Since the auxiliary module helps the generator during training, it could also help obtain discriminative features during feature synthesis and reduce ambiguity during classification.
GANs can suffer from mode collapse, which reduces the diversity of the generated features.
- VAEs are more stable, but the approximate inference distribution is likely to differ from the true posterior.
(paper title, 43) Uses a VAE decoder and a GAN generator together. > f-VAEGAN
To keep the generated features semantically close to the distribution of real features, a cycle-consistency loss between generated and original features is used during training.

We propose applying a similar consistency loss to the semantic embeddings during training, and also using the learned information at the feature synthesis and classification stages.

1. Contributions

Proposes a novel method that makes effective use of a Semantic Embedding Decoder (SED) at every stage of the ZSL framework.
Built on a VAE-GAN architecture.
Uses a feedback module for (G)ZSL so that the SED can be used during both training and feature synthesis (FS).
- The feedback module first transforms the latent embeddings of the SED; the result is then used to modulate the latent representation of the generator.
Introduces a Discriminative Feature Transformation at the classification stage: the SED's latent embedding is used together with the corresponding visual feature to reduce ambiguity among categories.

Early ZSL image classification
| Inductive Approaches
- Use only labelled data from seen classes.
- (paper title, 16, 21) : learn semantic embedding classifiers that relate seen and unseen classes.
- (paper title, 1, 9, 34) : learn a compatibility function between the semantic embedding space and the visual feature space.
| Transductive Approaches
- (paper title, 10, 33, 45) : additionally exploit unlabelled data from unseen classes through label propagation in the transductive ZS setting...
Using GANs
- Generate unseen-class features with a GAN, then run ZSL in a fully-supervised setting.
- A conditional Wasserstein GAN (WGAN) uses a seen-category classifier to train the generator that synthesizes unseen-class features.
  - Uses the WGAN loss and a classification loss.
- (paper title, 8) : replaces the seen-category classifier with a decoder coupled with a cycle-consistency loss.
- (paper title, 35) : uses two VAEs and proposes cross-alignment and distribution-alignment losses that align visual features and their corresponding embeddings in a shared latent space.
- (paper title, 43) : proposes f-VAEGAN, which combines a VAE and a GAN by sharing the VAE decoder and the GAN generator for FS.
  - Introduces a cycle-consistency constraint between generated and original visual features during training.
  - However, no comparable constraint is enforced on the semantic embeddings.
- (paper title, 8, 47, 13, 25) : other GAN-based ZSL classification models use an auxiliary module to enforce cycle-consistency on the embeddings as well.
  - However, this module is also used only during training, not at the FS or ZSL classification stages.
Applications in other domains
- (paper title, 46, 14, 23, 36) : leverage feedback information to incrementally improve performance in areas such as classification, image-to-image translation, and super-resolution (SR).

We study a feedback loop for improving FS in the context of zero-shot recognition -> we design a feedback module for the VAE-GAN framework and use it to iteratively refine the synthesized features for ZSL.

Method

| Notations

x : encoded feature instances of images

y: label

a(k) : category-specific semantic embeddings

3.1 Preliminaries : f-VAEGAN

Compared with f-CLSWGAN, f-VAEGAN shares the VAE decoder and the GAN generator to produce semantically consistent features.
- VAE : E(x, a) -> z (latent code) + GAN : G(z, a) -> x (reconstructed from z)
- L_V : VAE loss, defined in Eq. (1)
- KL : Kullback-Leibler divergence
- p(z|a) : prior distribution, assumed to be N(0, 1)
- log G(z, a) : reconstruction loss
- Eq. (1) : Kullback-Leibler term + constraint (reconstruction) term
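Written out, Eq. (1) would look roughly like the block below. This is a reconstruction from the terms listed above (the KL divergence, the prior p(z|a), and the reconstruction term log G(z, a)), so the notation is an assumption rather than a verbatim copy of the paper. In particular, q(z|x, a) is a symbol introduced here for the distribution produced by the encoder E.

```latex
\mathcal{L}_V = \mathrm{KL}\big(E(x, a)\,\|\,p(z \mid a)\big)
              - \mathbb{E}_{q(z \mid x, a)}\big[\log G(z, a)\big]
```

The first term pulls the encoder output toward the N(0, 1) prior, and the second is the constraint term that ties the generated feature back to the original x.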
Feature Generating network (f-WGAN)
- Consists of a generator G(z,a) and a discriminator D(x,a).
- G(z,a) : generates a feature x_hat from random input noise z. -> D(x,a) takes a feature and judges whether it is real.
- WGAN loss (lambda : penalty coefficient)
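For reference, the WGAN loss with a gradient penalty is usually written as below. This is the standard WGAN-GP form, assumed here to match what the notes call the WGAN loss. The interpolated sample x_tilde (a random convex combination of x and x_hat) is part of that standard form and is not defined in the notes.

```latex
\mathcal{L}_W = \mathbb{E}\big[D(x, a)\big]
              - \mathbb{E}\big[D(\hat{x}, a)\big]
              - \lambda\,\mathbb{E}\Big[\big(\lVert \nabla_{\tilde{x}} D(\tilde{x}, a) \rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = G(z, a)
```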
Limitations
- Constraint term in Eq. (1) : ensures the generated visual features are cyclically consistent with the original visual features at training time.
- There is no comparable cycle-consistency constraint on the semantic embeddings.
  - Other GAN-based ZSL methods do use an auxiliary module to enforce cyclic consistency on the embeddings as well.
    - But they also use it only during training and discard it at the FS and ZSL classification stages.
Our claim
- The generator and the SED hold complementary information about the feature instances.
- The generator maps a semantic embedding to feature instances.
- The SED maps a feature instance to a semantic embedding.

3.2. Overall architecture

Consists of an Encoder E, a Generator G, and a Discriminator D.
Encoder E
- input : real features of seen classes x, semantic embeddings a
- output : parameters of a noise distribution
- these parameters are matched to a zero-mean, unit-variance Gaussian prior through the KL divergence
Generator G
- input : noise z, embeddings a
- output : synthesized feature x_hat
- x_hat and x are compared with a BCE (binary cross-entropy) loss
Discriminator D
- input : x or x_hat, together with the embedding a
- output : whether x or x_hat is real or fake
- the WGAN loss L_W is applied to the output of D so that it learns to distinguish real from fake features
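The "parameters of a noise distribution" that E outputs can be illustrated with a small sketch. It assumes the usual VAE parameterization (a mean and a log-variance per latent dimension), which the notes do not spell out. The function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
import random


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, logvar)
    ]


# Assumed encoder output for one (x, a) pair: mean and log-variance of z.
rng = random.Random(0)
mu = [0.5, -0.2]
logvar = [0.0, -1.0]

z = reparameterize(mu, logvar, rng)      # noise z that is passed on to G(z, a)
kl = kl_to_standard_normal(mu, logvar)   # penalty matching E's output to the prior
```

When E outputs exactly mu = 0 and logvar = 0, the KL term vanishes, which is the sense in which the parameters are "matched" to the zero-mean, unit-variance prior.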
Core of this paper : the SED Dec and the feedback module F
- An additional semantic embedding decoder (SED) Dec is added for the FS and (G)ZSL classification stages.
- A feedback module F, used together with Dec during training and FS, is also added.
- Result : enhanced feature synthesis and reduced ambiguities among categories during classification.
Dec
- takes x or x_hat and reconstructs the embedding a_hat.
- trained with a cycle-consistency loss L_R.
- the learned Dec is subsequently used in the (G)ZSL classifier.
Feedback module F
- transforms the latent embedding of Dec -> feeds it back into the latent representation of G to improve FS.

3.3 Semantic Embedding Decoder

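SED
- takes a generated feature x_hat and reconstructs the semantic embedding a.
- enforces cycle-consistency, ensuring during FS that a generated feature maps back to the embedding that produced it.
- cycle-consistency of the semantic embeddings is achieved with an l1 reconstruction loss.
- Eq. (4) : the training loss of TF-VAEGAN adds the cycle-consistency loss L_R, weighted by the hyperparameter beta, to the f-VAEGAN loss.
- unlike earlier GAN-based ZSL methods that used the SED only during training, here it is used at every stage: training, FS, and classification.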
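The l1 cycle-consistency loss and the total training loss can be written out as follows. This is a reconstruction from the description above, so treat the exact form as an assumption. The weight alpha on the WGAN term belongs to the f-VAEGAN loss and is a symbol not used elsewhere in these notes.

```latex
\begin{aligned}
\mathcal{L}_R &= \mathbb{E}\big[\lVert Dec(x) - a \rVert_1\big]
               + \mathbb{E}\big[\lVert Dec(\hat{x}) - a \rVert_1\big] \\
\mathcal{L}_{vaegan} &= \mathcal{L}_V + \alpha\,\mathcal{L}_W \\
\mathcal{L}_{total} &= \mathcal{L}_{vaegan} + \beta\,\mathcal{L}_R
\end{aligned}
```

L_R penalizes Dec for failing to recover a from either a real feature x or a generated feature x_hat, so both the real and the synthesized path are tied back to the embedding.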
Discriminative Feature Transformation
- Let us look at the importance of the SED in classification and its role in FS.
- We introduce a discriminative feature transformation that makes good use of the SED's auxiliary information at the ZSL classification stage.
- The generator G uses only seen-class features and embeddings to learn a per-class "single semantic embedding to many instances" mapping.
- Like G, the SED uses only seen classes, but it learns the per-class inverse mapping, "many instances to one embedding".
  -> G and the SED can learn complementary information.
- Using the information in the SED's latent embedding at the classification stage should reduce ambiguities between feature instances of different categories.
- 1. Train the feature generator G and the semantic embedding decoder Dec.
- 2. Use Dec to transform the features into the embedding space A.
- 3. Concatenate the latent embedding of Dec with the respective visual features fed into Dec.

-> The final classifier can separate the categories better.
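Steps 2 and 3 above can be sketched as follows. This is a toy illustration only: `ToyDec` is a hypothetical stand-in for Dec with a single hidden layer, the weights are random, and plain lists are used instead of tensors. The real Dec, its layer sizes, and its activation are not specified in these notes.

```python
import random


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def relu(vec):
    return [max(0.0, v) for v in vec]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyDec:
    """Hypothetical Dec: visual feature x -> latent embedding h -> embedding a_hat."""

    def __init__(self, feat_dim, hidden_dim, embed_dim, rng):
        self.w1 = rand_matrix(hidden_dim, feat_dim, rng)
        self.w2 = rand_matrix(embed_dim, hidden_dim, rng)

    def forward(self, x):
        h = relu(linear(self.w1, x))   # latent embedding of Dec
        a_hat = linear(self.w2, h)     # reconstructed semantic embedding
        return h, a_hat


def discriminative_feature(dec, x):
    """Concatenate the visual feature x with Dec's latent embedding h."""
    h, _ = dec.forward(x)
    return x + h   # list concatenation, i.e. [x ; h]


rng = random.Random(0)
dec = ToyDec(feat_dim=4, hidden_dim=3, embed_dim=2, rng=rng)
x = [0.2, -0.1, 0.4, 0.0]
feat = discriminative_feature(dec, x)   # 4 + 3 = 7 dimensions, fed to the classifier
```

The classifier then sees the original visual feature and the SED's view of it side by side, which is where the complementary information from G and the SED comes in.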

3.4 Feedback Module

The baseline f-VAEGAN does not enforce cycle-consistency in the attribute space.
- It generates the visual feature x_hat directly from the class-specific embedding a through the generator.
  -> This leaves a semantic gap between the real and the synthesized visual features.
So we built a feedback module F that refines the feature generation process!
- The feedback loop runs from Dec to G through F.
- Let g^l be the output of the l-th layer of G and x_hat^f the feedback component added to g^l, where x_hat^f = F(h) and h is the latent embedding of Dec. delta controls the feedback modulation.
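In equation form, the feedback described above amounts to the update below. It is reconstructed from the definitions in this section, so the layout is an assumption.

```latex
g^l \leftarrow g^l + \delta\,\hat{x}^f, \qquad \hat{x}^f = F(h)
```

With delta = 0 the generator reduces to the plain f-VAEGAN generator; larger delta lets the SED's latent embedding shift the intermediate representation of G more strongly.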
Feedback Module Input
- The original adversarial feedback takes the latent representation of an unconditional discriminator D as its input.
- In ZSL, however, D is conditional (it also takes the embedding a as input) and is trained with the objective of telling real from fake features of the seen categories.
  -> It cannot provide reliable feedback for unseen-class feature synthesis.
- So we turn to the SED Dec instead.
  - SED Dec : its goal is to reconstruct the class-specific semantic embedding from feature instances
- Because Dec learns a class-specific transformation from visual features to semantic embeddings, it is better suited than D to give feedback to G.
Training Strategy
- Originally, F is trained in a two-stage fashion.
  - As in a standard GAN, G and D are first fully trained.
  - G is then frozen and F is trained using feedback from D (its latent representation, as described under Feedback Module Input).
  - Because F's feedback improves the output of G, D continues to be trained alongside F.
    -> G stays fixed the whole time and cannot improve its FS, so this seems sub-optimal for ZSL.
- So we train G and F alternately. (I do not fully understand this part yet.)
  - In our alternating strategy, the generator training iteration is unchanged.
  - During a training iteration of F, two sub-iterations are performed.
    | 1st sub-iteration
    - Generate x_hat[0]=G(z,a) (z: noise, a: semantic embedding) and pass it to Dec.
    | 2nd sub-iteration
    - Pass the latent embedding h_hat produced by Dec to F to obtain x_hat^f[t]=F(h_hat).
    - Multiply x_hat^f[t] by delta and add it to g_l of G.
    - Feed the same z and a used in the 1st sub-iteration to G to obtain x_hat[t+1]=G(z, a, x_hat^f[t]) -> refined feature
    - Feed x_hat[t+1] to D and Dec and compute the loss of Eq. (4) for training.
    - In practice, the 2nd sub-iteration is performed only once.
  - The feedback module F lets G see the latent embedding of Dec for the currently generated features.
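The two sub-iterations can be sketched as a forward pass. This is a toy illustration, not the paper's implementation: `ToyG`, `dec_hidden`, and `feedback_F` are hypothetical names, the weights are small hand-picked matrices so the numbers are easy to follow, and the loss computation and parameter updates of Eq. (4) are left out.

```python
def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def relu(vec):
    return [max(0.0, v) for v in vec]


class ToyG:
    """Hypothetical two-layer generator; g_l is the output of its first layer."""

    def __init__(self, w1, w2):
        self.w1, self.w2 = w1, w2

    def forward(self, z, a, feedback=None, delta=1.0):
        g_l = relu(linear(self.w1, z + a))   # z + a concatenates noise and embedding
        if feedback is not None:
            # Feedback step: g_l <- g_l + delta * x_hat_f
            g_l = [g + delta * f for g, f in zip(g_l, feedback)]
        return linear(self.w2, g_l)


def synthesize_with_feedback(gen, dec_hidden, feedback_F, z, a, delta):
    x0 = gen.forward(z, a)            # 1st sub-iteration: x_hat[0] = G(z, a)
    h = dec_hidden(x0)                # latent embedding h_hat from Dec
    x_f = feedback_F(h)               # x_hat_f = F(h_hat)
    x1 = gen.forward(z, a, feedback=x_f, delta=delta)   # same z, a -> refined feature
    return x0, x1


# Hand-picked toy weights (assumptions for illustration only).
gen = ToyG(w1=[[1.0, 0.5, 0.0], [0.0, 1.0, 1.0]], w2=[[1.0, 0.0], [0.0, 1.0]])
dec_w1 = [[1.0, 0.0], [0.0, 1.0]]
f_w = [[0.5, 0.0], [0.0, 0.5]]


def dec_hidden(x):
    return relu(linear(dec_w1, x))


def feedback_F(h):
    return linear(f_w, h)


z, a = [1.0], [1.0, 0.0]
x0, x1 = synthesize_with_feedback(gen, dec_hidden, feedback_F, z, a, delta=0.5)
```

Note that the same z and a are reused in the second pass, so the only difference between x0 and x1 is the feedback term. Setting delta to 0 gives back the unrefined feature.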