[Short concept] knowledge-distillation / self-distillation

kiteday · February 7, 2024


Distillation keeps coming up in papers, and since I don't know exactly what the terms mean I keep getting confused, so here is a brief summary.

In one line: it is a way of training a student model by having it learn from a teacher model.

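The viewpoint is that the knowledge held by the teacher should be transferred well to the student.
In traditional knowledge distillation, the teacher has overwhelmingly more parameters than the student.
- (This makes sense when you think about it. Holding more knowledge generally takes more parameters.)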
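
To make "the student learns from the teacher" concrete, here is a minimal sketch of one common way to write a distillation loss. The post does not specify a loss, so the soft-target formulation (temperature-softened teacher probabilities plus the usual hard-label loss) is an assumption, and the names `temperature` and `alpha` are hypothetical parameters chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature gives a softer (flatter) probability distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student distribution q is from the teacher distribution p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    # Soft part (assumed formulation): match the teacher's softened outputs.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    # Hard part: ordinary cross-entropy against the ground-truth label.
    hard_loss = -math.log(softmax(student_logits)[label])
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Hypothetical logits for a 3-class problem; the true class is index 0.
teacher_logits = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 3.0, 1.0]
loss_close = distillation_loss(close_student, teacher_logits, label=0)
loss_far = distillation_loss(far_student, teacher_logits, label=0)
```

A student whose outputs resemble the teacher's (`close_student`) gets a lower loss than one that disagrees with it (`far_student`), which is the sense in which the teacher's knowledge is being transferred.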

Problem
Building a good teacher model is really hard!
The teacher has to be a model that already solves the target task well, i.e. one that is rich in knowledge, and getting such a model is hard!

That is where self-distillation comes in.

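It sets aside the traditional notion of a much larger teacher model and learns at the student's own level: the teacher has the same architecture as the student.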
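
As a rough illustration, the sketch below shows one simple variant of self-distillation: a model is first trained on the labels, and then a second model of the same architecture is trained on the first model's soft predictions. The post does not describe a concrete procedure, so this setup, the toy 1D logistic-regression model, and all names (`train`, `predict`, `soft_targets`) are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, targets, epochs=200, lr=0.5):
    # 1D logistic regression trained with gradient descent on cross-entropy.
    # Targets may be hard labels (0/1) or soft probabilities in [0, 1].
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - t for x, t in zip(xs, targets)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(model, xs):
    w, b = model
    return [sigmoid(w * x + b) for x in xs]

# Toy data (hypothetical): negative inputs are class 0, positive are class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]

# Step 1: the "teacher" is just the same small model trained on the labels.
teacher = train(xs, labels)

# Step 2: the "student" has the same architecture and is trained on the
# teacher's soft predictions instead of the original hard labels.
soft_targets = predict(teacher, xs)
student = train(xs, soft_targets)

student_predictions = [int(p > 0.5) for p in predict(student, xs)]
```

The point of the sketch is only that teacher and student share one architecture, so no separate large teacher has to be built first.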

I will cover the details in a later post.

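[1] Das, R., & Sanghavi, S. (2023). Understanding Self-Distillation in the Presence of Label Noise. arXiv preprint arXiv:2301.13304.
→ This is the paper that first made me think I should sort out the distillation concepts.
[2] 노형종, 텍스트 생성 성능을 높이기 위한 Self-distillation 기술 (Self-distillation techniques for improving text generation performance), nc research blog.
→ This post explains it well, alongside other models.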
