Knowledge Distillation
https://velog.io/@dldydldy75/Leveraging-Pre-trained-Information#knowledge-distillation
Kullback-Leibler Divergence
https://3months.tistory.com/436
Multi-Head Self-Attention
https://jalammar.github.io/illustrated-transformer/
Inductive Bias
https://www.dacon.io/forum/405840
https://moon-walker.medium.com/transformer%EB%8A%94-inductive-bias%EC%9D%B4-%EB%B6%80%EC%A1%B1%ED%95%98%EB%8B%A4%EB%9D%BC%EB%8A%94-%EC%9D%98%EB%AF%B8%EB%8A%94-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C-4f6005d32558
Repeated Augmentation
https://visionhong.tistory.com/29
Stochastic Depth
https://paperswithcode.com/method/stochastic-depth
DeiT
https://junha1125.github.io/blog/artificial-intelligence/2021-04-14-DeiT/
https://www.youtube.com/watch?v=DjEvzeiWBTo