Series

[Paper Review] Fundamentals in Deep Learning (Norm, TTA, Gumbel Trick etc.)

1. [2015 ICML] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Authors: Sergey Ioffe, Christian Szegedy. Conference: ICML 2015 (International Conference on Machine Learning).

September 17, 2023

2. BatchNorm vs. LayerNorm vs. RMSNorm

Zhu, Jiachen, et al. "Transformers without normalization." Proceedings of the Computer Vision and Pattern Recognition Conference. 2025. While reading the paper above, no…

November 28, 2025
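As a quick orientation for the comparison named in this post's title, here is a minimal pure-Python sketch of the three normalizations on a small batch. It is an illustration under stated assumptions, not code from the post: the function names, the `EPS` value, and the toy input `x` are all hypothetical, and the learnable scale and shift parameters that real layers carry are omitted.

```python
import math

EPS = 1e-5  # small constant for numerical stability (assumed value)

def batch_norm(x):
    """Normalize each feature (column) using statistics across the batch (rows)."""
    n = len(x)
    cols = list(zip(*x))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [
        [(v - m) / math.sqrt(s + EPS) for v, m, s in zip(row, means, variances)]
        for row in x
    ]

def layer_norm(x):
    """Normalize each sample (row) using statistics across its own features."""
    out = []
    for row in x:
        m = sum(row) / len(row)
        s = sum((v - m) ** 2 for v in row) / len(row)
        out.append([(v - m) / math.sqrt(s + EPS) for v in row])
    return out

def rms_norm(x):
    """Rescale each sample by its root mean square; the mean is not subtracted."""
    out = []
    for row in x:
        mean_square = sum(v * v for v in row) / len(row)
        out.append([v / math.sqrt(mean_square + EPS) for v in row])
    return out

# Toy input: 2 samples (rows) with 3 features each (hypothetical values).
x = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
```

The difference lies in the axis and in the statistic used: `batch_norm` depends on the other samples in the batch, while `layer_norm` and `rms_norm` treat each sample independently, and `rms_norm` skips the centering step entirely.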

3. [2025 CVPR] Transformers without Normalization

(Background) In modern neural networks, normalization layers are ubiquitous and have long been considered essential. (Core of this paper) In this work, using a very simple technique, Transformer…

November 26, 2025

4. [2021 ICML] [Simple review] High-Performance Large-Scale Image Recognition Without Normalization

https://arxiv.org/abs/2102.06171 (Background) BN is a key component of image classification models, but, with respect to the batch size and interactions between examples, its depen…

December 30, 2025

5. [2021 ICLR] Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs

https://arxiv.org/abs/2003.00152 DL techniques rely on training affine transformations of features. The most prominent of these is feature normaliz…

January 13, 2026

6. (Forward) Gumbel-Max, (Backward) Gumbel-Softmax

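Problem setting: a policy network has to look at the policy and choose one of high resolution, medium resolution, low resolution, or skip. Because choosing a single option this way (argmax) is mathematically non-differentiable (the gradient is zero or discontinuous), the error … policy network…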

May 23, 2026
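To make the forward/backward split in this post's title concrete, here is a minimal pure-Python sketch of the two quantities involved. It assumes a hypothetical policy network that outputs four logits, one per option; the names `OPTIONS`, `logits`, and `tau`, and the numeric values, are illustrative assumptions rather than anything taken from the post. Pure Python has no automatic differentiation, so the sketch only computes the hard choice used in the forward pass and the relaxed probabilities whose gradient a framework would use in the backward pass.

```python
import math
import random

def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via inverse transform: -log(-log(U))."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_max(logits, noise):
    """Hard choice: index of the largest noise-perturbed logit."""
    perturbed = [l + g for l, g in zip(logits, noise)]
    return max(range(len(logits)), key=lambda i: perturbed[i])

def gumbel_softmax(logits, noise, tau=1.0):
    """Differentiable relaxation of the same choice; tau is the temperature."""
    return softmax([(l + g) / tau for l, g in zip(logits, noise)])

OPTIONS = ["high", "medium", "low", "skip"]  # the four choices from the post
rng = random.Random(0)                        # fixed seed for repeatability
logits = [1.0, 0.5, 0.2, -1.0]                # hypothetical policy-network outputs
noise = [sample_gumbel(rng) for _ in logits]  # shared by both functions below

choice = gumbel_max(logits, noise)                # discrete decision (forward)
relaxed = gumbel_softmax(logits, noise, tau=0.5)  # soft surrogate (backward)
```

Because both functions share the same noise, the largest entry of `relaxed` sits at index `choice`; lowering `tau` pushes `relaxed` closer to a one-hot vector at that index, at the cost of noisier gradients.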