How to Train 10,000-Layer Vanilla Convolutional Neural Networks

Hanna · January 30, 2022




  1. Can we train architectures that are currently untrainable?
  2. Can we eliminate the need to search over hyperparameters?
  3. Can we disentangle trainability, expressivity, and generalization?

MOTIVATION : train the untrainable, eliminate hyperparameters, disentangle contributions to success

  • Signal propagation in deep networks : predicts trainability by examining whether correlations between inputs survive with depth
  • Mean field analysis : identifies when gradients vanish or explode
  • Dynamical isometry : ensures a well-conditioned Jacobian
  • Delta-orthogonal initialization : guarantees survival of Fourier modes and enables training of 10,000-layer vanilla CNNs
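
To make the mean field bullet concrete, here is a minimal sketch of the usual variance recursion for a tanh network, assuming Gaussian weights with variance sigma_w2 / fan_in and biases with variance sigma_b2. These parameter names, the function names, and the trapezoid-rule integration are illustrative choices, not taken from the talk. The quantity chi_1 evaluated at the fixed point q* indicates the regime: below 1 gradients tend to vanish with depth, above 1 they tend to explode, and chi_1 = 1 is the critical line.

```python
import math


def gauss_expect(f, half_width=10.0, n=2001):
    """Approximate E[f(z)] for z ~ N(0, 1) with the trapezoid rule."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        z = -half_width + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def variance_map(q, sigma_w2, sigma_b2):
    """One layer of the recursion: q -> sigma_w2 * E[tanh(sqrt(q) z)^2] + sigma_b2."""
    s = math.sqrt(q)
    return sigma_w2 * gauss_expect(lambda z: math.tanh(s * z) ** 2) + sigma_b2


def fixed_point(sigma_w2, sigma_b2, q0=1.0, steps=100):
    """Iterate the variance map from q0 to approximate the fixed point q*."""
    q = q0
    for _ in range(steps):
        q = variance_map(q, sigma_w2, sigma_b2)
    return q


def chi_1(q_star, sigma_w2):
    """chi_1 = sigma_w2 * E[tanh'(sqrt(q*) z)^2], with tanh' = 1 - tanh^2."""
    s = math.sqrt(q_star)
    return sigma_w2 * gauss_expect(lambda z: (1.0 - math.tanh(s * z) ** 2) ** 2)


# Hypothetical settings: one in the ordered phase, one in the chaotic phase.
for sigma_w2 in (0.5, 4.0):
    q_star = fixed_point(sigma_w2, sigma_b2=0.0)
    print(sigma_w2, q_star, chi_1(q_star, sigma_w2))
```

With zero bias variance, the first setting should land in the ordered phase (chi_1 below 1) and the second in the chaotic phase (chi_1 above 1).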


  • Developed a mean field theory to understand signal propagation in deep CNNs
  • Developed connection between Fourier modes and generalization
  • Two new initialization methods:
    - Random orthogonal kernels
    - Delta-orthogonal kernels
  • Trained a 10,000-layer tanh network without batch norm or residual connections, and without a reduction in test accuracy
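
The delta-orthogonal kernel from the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the kernel is laid out as k x k x c_in x c_out nested lists, c_out is at least c_in, and the orthogonal matrix comes from Gram-Schmidt on Gaussian vectors. The function names and the layout are hypothetical, not the authors' reference code. The idea is that every spatial tap is zero except the centre one, which holds a matrix with orthonormal rows, so at initialization the convolution acts as a norm-preserving map on the channels at each pixel.

```python
import math
import random


def random_orthogonal(n, rng):
    """Return an n x n orthogonal matrix (as a list of rows) via Gram-Schmidt."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components along the rows accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # resample in the unlikely degenerate case
            rows.append([a / norm for a in v])
    return rows


def delta_orthogonal(k, c_in, c_out, seed=0):
    """Build a k x k x c_in x c_out kernel that is zero except at the centre tap."""
    if c_out < c_in:
        raise ValueError("this sketch assumes c_out >= c_in")
    rng = random.Random(seed)
    # First c_in rows of an orthogonal matrix: a c_in x c_out block with orthonormal rows.
    h = random_orthogonal(c_out, rng)[:c_in]
    kernel = [[[[0.0] * c_out for _ in range(c_in)] for _ in range(k)]
              for _ in range(k)]
    centre = k // 2
    kernel[centre][centre] = h
    return kernel


kernel = delta_orthogonal(k=3, c_in=4, c_out=8)
centre_tap = kernel[1][1]
print(sum(x * x for x in centre_tap[0]))  # squared norm of one row, close to 1
```

Dropping the "delta" part and filling every spatial tap would give a different scheme; the sketch above covers only the centre-tap construction.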