How to Train 10,000-Layer Vanilla Convolutional Neural Networks

Hanna · January 30, 2022

Paper Review


QUESTION

  1. Can we train architectures that are currently untrainable?
  2. Can we eliminate the need to search over hyperparameters?
  3. Can we disentangle trainability, expressivity, and generalization?

MOTIVATION: train the untrainable, eliminate hyperparameters, disentangle contributions to success

  • Signal propagation in deep networks: predicts trainability by examining whether correlations between inputs survive with depth
  • Mean field analysis: vanishing / exploding gradients
  • Dynamical isometry: ensures a well-conditioned input-output Jacobian
  • Delta-orthogonal initialization: guarantees survival of Fourier modes and enables training of 10,000-layer vanilla CNNs
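The signal-propagation idea above can be checked numerically. Below is a hedged sketch (not the paper's code): two nearly identical inputs are pushed through a deep random tanh network while their cosine similarity is recorded. A fully-connected net stands in for the CNN mean-field analysis, and the weight scale `sigma_w = 2.5` is an assumed value placing the network in the chaotic regime, where nearby inputs decorrelate with depth.

```python
import numpy as np

def correlation_vs_depth(sigma_w, depth=40, width=1000, seed=0):
    """Propagate two correlated inputs through a random tanh MLP and
    record their cosine similarity at every layer (signal-propagation view)."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(width)
    x2 = x1 + 0.1 * rng.standard_normal(width)  # start highly correlated
    cosines = []
    for _ in range(depth):
        # i.i.d. Gaussian weights with variance sigma_w^2 / width
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        x1, x2 = np.tanh(W @ x1), np.tanh(W @ x2)
        cosines.append(float(x1 @ x2 /
                             (np.linalg.norm(x1) * np.linalg.norm(x2))))
    return cosines

cos = correlation_vs_depth(sigma_w=2.5)
# in the chaotic phase the initial correlation is destroyed with depth
```

Conversely, at small `sigma_w` (the ordered phase) distinct inputs collapse toward each other; trainability at extreme depth requires sitting near the boundary between the two phases.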

CONCLUSION

  • Developed a mean field theory to understand signal propagation in deep CNNs
  • Developed a connection between Fourier modes and generalization
  • Two new initialization methods:
    - Random orthogonal kernels
    - Delta-orthogonal kernels
  • Trained a 10,000-layer tanh network w/o batch norm or residual connections, and w/o a reduction in test accuracy
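A minimal sketch of the delta-orthogonal kernel, for the square case (`c_in == c_out`; the paper also handles non-square channel counts). The kernel is zero at every spatial position except the centre, which holds a Haar-random orthogonal matrix, so at initialization the convolution acts as the same orthogonal map at every pixel. The function name and `gain` parameter are assumptions for illustration.

```python
import numpy as np

def delta_orthogonal_kernel(kernel_size, channels, gain=1.0, seed=None):
    """Sketch: kernel_size x kernel_size x channels x channels kernel,
    zero everywhere except an orthogonal block at the spatial centre."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((channels, channels))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))   # sign fix so Q is Haar-distributed
    w = np.zeros((kernel_size, kernel_size, channels, channels))
    c = kernel_size // 2
    w[c, c] = gain * q         # orthogonal map at the centre tap only
    return w
```

For tanh networks the paper tunes the scale of the orthogonal block to the critical point of the mean-field equations; here that scale is left as a plain `gain` argument.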