- Architecture

- re-parameterize (see the sketch below)
    - training time: a multi-branched architecture
    - inference time: a plain CNN-like structure
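
A minimal PyTorch sketch of the fusion step, assuming a training-time block with a 3x3 conv + BN branch and a 1x1 conv + BN branch (the identity branch is omitted for brevity, and helper names such as `fuse_conv_bn` are illustrative, not the paper's code):

```python
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Fold a BatchNorm (using its running statistics) into the preceding conv."""
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std                         # per-output-channel scale
    kernel = conv.weight * scale.reshape(-1, 1, 1, 1)
    bias = bn.bias - bn.running_mean * scale
    return kernel, bias

# Training-time branches (channel count chosen arbitrarily for the sketch).
C = 64
conv3, bn3 = nn.Conv2d(C, C, 3, padding=1, bias=False), nn.BatchNorm2d(C)
conv1, bn1 = nn.Conv2d(C, C, 1, bias=False), nn.BatchNorm2d(C)

# Inference-time plain conv: pad the fused 1x1 kernel to 3x3 and add it to
# the fused 3x3 kernel, so a single conv reproduces the multi-branch output.
k3, b3 = fuse_conv_bn(conv3, bn3)
k1, b1 = fuse_conv_bn(conv1, bn1)
fused = nn.Conv2d(C, C, 3, padding=1, bias=True)
fused.weight.data = (k3 + F.pad(k1, [1, 1, 1, 1])).detach()
fused.bias.data = (b3 + b1).detach()
```

In eval mode, `fused(x)` matches the sum of the two branch outputs, which is why the deployed network can be a plain stack of 3x3 convolutions.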
- Review blog
- Architecture

- Depthwise Separable Conv (see the sketch below)
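
A minimal PyTorch sketch of a depthwise separable convolution, assuming a 3x3 depthwise step followed by a 1x1 pointwise step (the `depthwise_separable` helper and the BN/ReLU placement are illustrative):

```python
import torch.nn as nn

def depthwise_separable(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    return nn.Sequential(
        # depthwise: one 3x3 filter per input channel (groups=in_ch)
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                  padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # pointwise: 1x1 conv that mixes information across channels
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```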

- Architecture

- A distillation token is added to the ViT; it interacts with the class and patch tokens through the self-attention layers.
- The distillation token is used in a similar fashion to the class token, except that its objective is to reproduce the (hard) label predicted by the teacher, rather than the true label, at the output of the network.
- Both the class and distillation tokens fed to the transformer are learned by back-propagation (a minimal sketch of the token layout follows below).
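
A minimal PyTorch sketch of that token layout, assuming an embedding dimension of 768; the class name, initialization, and shapes are illustrative rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class DistilledTokenInput(nn.Module):
    """Prepend learnable class and distillation tokens to the patch tokens."""
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, embed_dim)
        b = patch_tokens.shape[0]
        cls = self.cls_token.expand(b, -1, -1)
        dist = self.dist_token.expand(b, -1, -1)
        # Both extra tokens then interact with the patch tokens through the
        # transformer's self-attention layers (not shown here).
        return torch.cat([cls, dist, patch_tokens], dim=1)
```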
- Notation
| notation | description |
| --- | --- |
| $Z_t$ | the logits of the teacher model |
| $Z_s$ | the logits of the student model |
| $\tau$ | the temperature for the distillation |
| $\lambda$ | the coefficient balancing the KL divergence loss and the cross-entropy $\mathcal{L}_{\text{CE}}$ |
| $\mathcal{L}_{\text{CE}}$ | the cross-entropy loss |
| $y$ | the ground truth labels |
| $\psi$ | the softmax function |
Soft distillation
minimizes the Kullback-Leibler divergence between the softmax of the teacher and the softmax of the student model.
$$\mathcal{L}_{\text{global}} = (1-\lambda)\,\mathcal{L}_{\text{CE}}(\psi(Z_s), y) + \lambda \tau^2\, \mathrm{KL}\big(\psi(Z_s/\tau),\, \psi(Z_t/\tau)\big)$$
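
A minimal PyTorch sketch of this objective; the defaults for `tau` and `lam` are placeholders rather than the paper's settings, and, as in common distillation implementations, the teacher's softened distribution is used as the KL target:

```python
import torch.nn.functional as F

def soft_distillation_loss(z_s, z_t, y, tau: float = 3.0, lam: float = 0.5):
    """(1 - lam) * CE(student, y) + lam * tau^2 * KL over softened logits."""
    ce = F.cross_entropy(z_s, y)
    kl = F.kl_div(F.log_softmax(z_s / tau, dim=-1),   # student log-probs
                  F.softmax(z_t / tau, dim=-1),        # teacher probs (target)
                  reduction="batchmean")
    return (1 - lam) * ce + lam * tau ** 2 * kl
```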
Hard-label distillation
We introduce a variant of distillation where we take the hard decision of the teacher as a true label. Let $y_t = \mathrm{argmax}_c Z_t(c)$ be the hard decision of the teacher; the objective associated with this hard-label distillation is:
$$\mathcal{L}_{\text{global}}^{\text{hardDistill}} = \frac{1}{2}\mathcal{L}_{\text{CE}}(\psi(Z_s), y) + \frac{1}{2}\mathcal{L}_{\text{CE}}(\psi(Z_s), y_t)$$
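
A minimal PyTorch sketch of the hard-label objective, with `z_s`, `z_t`, and `y` as above:

```python
import torch.nn.functional as F

def hard_distillation_loss(z_s, z_t, y):
    """Average the CE on the true labels and the CE on the teacher's argmax."""
    y_t = z_t.argmax(dim=-1)   # hard decision of the teacher
    return 0.5 * F.cross_entropy(z_s, y) + 0.5 * F.cross_entropy(z_s, y_t)
```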