Post-Training Quantization for Vision Transformer

문상준 · October 22, 2025

Paper review

Notes on post-training quantization (PTQ) for Vision Transformers (ViT)

Abstract

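To preserve the function of the attention mechanism,
a ranking loss that keeps the relative order of the quantized attention map is added to the final quantization objective
The nuclear norms of the attention maps in MHA and of the MLP output features are used to choose bit-widths for mixed-precision quantization (MixQ)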

1 Introduction

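PTQ methods before this paper were designed for CNNs or RNNs
⇒ they do not account for the characteristics of transformers

⇒ This paper therefore proposes:

A ranking loss, introduced to keep the relative order of the attention map
The nuclear norm as a measure of the sensitivity of the attention maps and the MLP output features, which drives mixed-precision quantization (MixQ)
Bias correction to compensate for quantization error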
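
The bias-correction idea can be illustrated with a small sketch. The notes do not give the paper's exact formula, so this assumes the common scheme of measuring the mean output error on calibration data and subtracting it from the bias; `quantize`, the matrix sizes, and the 4-bit setting are all hypothetical choices for illustration.

```python
import numpy as np

def quantize(y, delta, bits=8):
    """Uniform symmetric quantize-dequantize (a generic, assumed quantizer)."""
    qmin, qmax = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    return np.clip(np.round(y / delta), qmin, qmax) * delta

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))       # hypothetical calibration activations
W = rng.normal(size=(16, 8)) + 0.3   # hypothetical full-precision weights
b = np.zeros(8)                      # bias of the linear layer

W_hat = quantize(W, delta=0.5, bits=4)   # quantized weights
O = X @ W + b                            # full-precision output
O_hat = X @ W_hat + b                    # output with quantized weights

# Mean output error per channel, estimated on the calibration data.
mean_err = (O_hat - O).mean(axis=0)

# Assumed correction: fold the mean error into the bias.
b_corrected = b - mean_err
O_corr = X @ W_hat + b_corrected

print("mean |error| before:", np.abs(mean_err).mean())
print("mean |error| after: ", np.abs((O_corr - O).mean(axis=0)).mean())
```

On the calibration set the per-channel mean error becomes zero by construction; the correction removes only the systematic shift, not the per-sample noise.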

(Section 2 omitted)

3 Methodology

Similarity-aware quantization for linear layers + ranking-aware quantization
Bias correction to improve accuracy
Mixed-precision quantization (MixQ) guided by the nuclear norm
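
As a rough illustration of the last item, the nuclear norm (the sum of singular values) can be computed per layer and used as a sensitivity score. The layer names, the random feature matrices, and the "top half gets 8 bits, the rest 4 bits" rule below are assumptions made for this sketch; the notes do not describe the paper's actual bit-width search.

```python
import numpy as np

def nuclear_norm(m):
    """Sum of the singular values of a 2-D matrix."""
    return float(np.linalg.svd(m, compute_uv=False).sum())

rng = np.random.default_rng(0)

# Hypothetical per-layer features: attention maps (N x N) and MLP outputs (N x D).
features = {
    "attn_0": rng.normal(size=(16, 16)),
    "mlp_0": rng.normal(size=(16, 32)),
    "attn_1": 0.1 * rng.normal(size=(16, 16)),
    "mlp_1": 0.1 * rng.normal(size=(16, 32)),
}

scores = {name: nuclear_norm(m) for name, m in features.items()}

# Assumed rule for illustration: layers with a larger nuclear norm are treated
# as more sensitive and receive the higher bit-width.
ranked = sorted(scores, key=scores.get, reverse=True)
half = len(ranked) // 2
bits = {name: (8 if i < half else 4) for i, name in enumerate(ranked)}

for name in ranked:
    print(f"{name}: nuclear norm = {scores[name]:.2f}, bits = {bits[name]}")
```

`np.linalg.norm(m, 'nuc')` returns the same value for a 2-D array and could be used instead of the explicit SVD.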

3.1 Preliminaries

In the input matrix $X$, each row holds the information for one patch.

Likewise, in $Q_l = X_l W_l^Q$, each row of the result $Q_l$ holds the information for one patch.
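
A small shape check makes the row-per-patch layout concrete. The sizes `N`, `D`, `d` and the random matrices are hypothetical; only the product $X_l W_l^Q$ comes from the notes.

```python
import numpy as np

N, D, d = 4, 6, 3   # patches, embedding dim, head dim (hypothetical sizes)
rng = np.random.default_rng(0)

X = rng.normal(size=(N, D))     # one row per patch
W_Q = rng.normal(size=(D, d))   # query projection
Q = X @ W_Q                     # shape (N, d): still one row per patch

# Row k of Q depends only on row k of X.
X_changed = X.copy()
X_changed[0] += 1.0             # perturb patch 0 only
Q_changed = X_changed @ W_Q

print(Q.shape)
print(np.allclose(Q[1:], Q_changed[1:]))   # rows of the other patches are unchanged
```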

3.2 Ranking-Aware Post-Training Quantization

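[Equation missing] is the quantization function.

[Equation missing] is the quantized output.

The scale factors ( $\Delta_W^l, \Delta_X^l$ ) clearly have a large effect on the quantization result.
∵ in practice, the only quantities that can vary are $Y$ and $\Delta$.

⇒ Find the optimal scale factors for the weights and activations, using the calibration dataset.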
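
The scale-factor search can be sketched as a grid search over candidate values of $\Delta_W$ and $\Delta_X$ on calibration data. Everything here is an assumption for illustration: the uniform symmetric quantizer, the Pearson correlation as the similarity score, the candidate grid, and the names `quantize`, `pearson`, and `search_scales`.

```python
import numpy as np

def quantize(y, delta, bits=8):
    """Uniform symmetric quantize-dequantize (a generic, assumed quantizer)."""
    qmin, qmax = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    return np.clip(np.round(y / delta), qmin, qmax) * delta

def pearson(a, b):
    """Pearson correlation between two arrays, flattened."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def search_scales(X, W, bits=4, n_grid=10):
    """Grid-search (delta_W, delta_X) maximizing output similarity."""
    O = X @ W                       # full-precision reference output
    qmax = 2 ** (bits - 1) - 1
    best = (-np.inf, None, None)
    for a_w in np.linspace(0.3, 1.0, n_grid):
        d_w = a_w * np.abs(W).max() / qmax
        W_hat = quantize(W, d_w, bits)
        for a_x in np.linspace(0.3, 1.0, n_grid):
            d_x = a_x * np.abs(X).max() / qmax
            O_hat = quantize(X, d_x, bits) @ W_hat
            score = pearson(O, O_hat)
            if score > best[0]:
                best = (score, d_w, d_x)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))   # hypothetical calibration activations
W = rng.normal(size=(16, 8))     # hypothetical layer weights

score, delta_w, delta_x = search_scales(X, W)
print(f"similarity = {score:.4f}, delta_W = {delta_w:.4f}, delta_X = {delta_x:.4f}")
```

A grid search is the simplest option; an alternating or gradient-free search over the two scale factors would fit the same objective.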

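The self-attention layer computes global relevance, and it is the component that sets transformers apart from CNNs.

After quantization, the relative order of the attention map is empirically observed to change
⇒ this causes severe performance degradation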

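What is the relative order?
Within one row of the attention map, it is the greater-than/less-than relationship between any two values $A_{kp}, A_{kq}$

Saying that the relative order of the attention map changes after quantization means the following.
Before quantization, a row contains
..., 3.61, 5.234, ..., 4.24, ...
and after quantization the same row becomes
..., 4, 5, ..., 4, ...
so the relationship $3.61 < 4.24$ is lost (both values become 4)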
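
The example above can be checked numerically, together with a pairwise hinge-style penalty of the kind a ranking loss might use. The penalty form, the `margin` value, and the function names are assumptions for this sketch; the notes do not give the paper's exact loss.

```python
import numpy as np

def order_preserved(row, row_q):
    """True if every pairwise order relation in `row` is kept in `row_q`."""
    d = np.sign(row[:, None] - row[None, :])
    d_q = np.sign(row_q[:, None] - row_q[None, :])
    return bool(np.all(d == d_q))

def ranking_hinge(row, row_q, margin=0.5):
    """Assumed pairwise hinge penalty: each pair (i, j) contributes
    max(0, margin - (row_q[i] - row_q[j]) * sign(row[i] - row[j]))."""
    d = np.sign(row[:, None] - row[None, :])
    diff_q = row_q[:, None] - row_q[None, :]
    i, j = np.triu_indices(len(row), k=1)   # each unordered pair once
    return float(np.maximum(0.0, margin - diff_q[i, j] * d[i, j]).sum())

row = np.array([3.61, 5.234, 4.24])   # values from the example above
row_q = np.round(row)                 # rounding to integers: [4., 5., 4.]

print(order_preserved(row, row_q))    # False: 3.61 < 4.24 became 4 == 4
print(ranking_hinge(row, row_q))      # 0.5: only the tied pair is penalized
print(ranking_hinge(row, row))        # 0.0: the original row keeps every order
```

With `margin=0.0` the tie would go unpenalized, which is why a positive margin is used in this sketch.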