[Paper Review] Fundamentals in Deep Learning (Norm, TTA, Gumbel Trick etc.)

1. [2015 ICML] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

2. BatchNorm vs. LayerNorm vs. RMSNorm

3. [2025 CVPR] Transformers without Normalization

4. [2021 ICML] [Simple Review] High-Performance Large-Scale Image Recognition Without Normalization

5. [2021 ICLR] Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs

6. (Forward) Gumbel-Max, (Backward) Gumbel-Softmax