How to Train 10,000-Layer Vanilla Convolutional Neural Networks

Hanna · January 30, 2022

Paper Review


QUESTION

  1. Can we train architectures that are currently untrainable?
  2. Can we eliminate the need to search over hyperparameters?
  3. Can we disentangle trainability, expressivity, and generalization?

MOTIVATION: train the untrainable, eliminate hyperparameters, disentangle contributions to success

  • Signal propagation in deep networks: predicts trainability by examining whether correlations between inputs survive with depth
  • Mean field analysis: vanishing / exploding gradients
  • Dynamical isometry: ensures a well-conditioned input-output Jacobian
  • Delta-orthogonal initialization: guarantees survival of Fourier modes and enables training of 10,000-layer vanilla CNNs
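The signal-propagation idea above can be checked numerically. Below is a hedged sketch (not the paper's code): two nearly identical inputs are pushed through a deep random tanh network while their cosine similarity is recorded. A fully-connected net stands in for the CNN mean-field analysis, and the weight scale `sigma_w = 2.5` is an assumed value placing the network in the chaotic regime, where nearby inputs decorrelate with depth.

```python
import numpy as np

def correlation_vs_depth(sigma_w, depth=40, width=1000, seed=0):
    """Propagate two correlated inputs through a random tanh MLP and
    record their cosine similarity at every layer (signal-propagation view)."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(width)
    x2 = x1 + 0.1 * rng.standard_normal(width)  # start highly correlated
    cosines = []
    for _ in range(depth):
        # i.i.d. Gaussian weights with variance sigma_w^2 / width
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        x1, x2 = np.tanh(W @ x1), np.tanh(W @ x2)
        cosines.append(float(x1 @ x2 /
                             (np.linalg.norm(x1) * np.linalg.norm(x2))))
    return cosines

cos = correlation_vs_depth(sigma_w=2.5)
# in the chaotic phase the initial correlation is destroyed with depth
```

Conversely, at small `sigma_w` (the ordered phase) distinct inputs collapse toward each other; trainability at extreme depth requires sitting near the boundary between the two phases.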

CONCLUSION

  • Developed a mean field theory to understand signal propagation in deep CNNs
  • Developed a connection between Fourier modes and generalization
  • Two new initialization methods:
    - Random orthogonal kernels
    - Delta-orthogonal kernels
  • Trained a 10,000-layer tanh network w/o batch norm or residual connections, and w/o a reduction in test accuracy
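A minimal sketch of the delta-orthogonal kernel, for the square case (`c_in == c_out`; the paper also handles non-square channel counts). The kernel is zero at every spatial position except the centre, which holds a Haar-random orthogonal matrix, so at initialization the convolution acts as the same orthogonal map at every pixel. The function name and `gain` parameter are assumptions for illustration.

```python
import numpy as np

def delta_orthogonal_kernel(kernel_size, channels, gain=1.0, seed=None):
    """Sketch: kernel_size x kernel_size x channels x channels kernel,
    zero everywhere except an orthogonal block at the spatial centre."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((channels, channels))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))   # sign fix so Q is Haar-distributed
    w = np.zeros((kernel_size, kernel_size, channels, channels))
    c = kernel_size // 2
    w[c, c] = gain * q         # orthogonal map at the centre tap only
    return w
```

For tanh networks the paper tunes the scale of the orthogonal block to the critical point of the mean-field equations; here that scale is left as a plain `gain` argument.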