Kaiming Initialization, also known as He Initialization, is a technique used to initialize the weights of deep neural networks, particularly those with ReLU activation functions. It is named after Kaiming He, who proposed this method to address the issues of vanishing and exploding gradients in deep networks. Proper weight initialization is crucial as it significantly impacts the convergence and performance of the model.
In neural networks, especially deep architectures, the choice of weight initialization significantly affects training dynamics. Poor initialization can lead to problems such as:
- Vanishing gradients, where signals shrink layer by layer until the early layers effectively stop learning.
- Exploding gradients, where signals grow without bound and destabilize training.
- Slow or stalled convergence, since the optimizer must first recover from a poorly scaled starting point.
Kaiming Initialization addresses these issues by scaling the initial weights so that the variance of a layer's outputs matches the variance of its inputs. This balance helps maintain stable signal and gradient flow across layers, which is critical in deep networks.
For a layer with $n_{\text{in}}$ incoming connections (the fan-in), Kaiming Initialization draws each weight as

$$W \sim \mathcal{N}\!\left(0, \frac{2}{n_{\text{in}}}\right),$$

where $\mathcal{N}(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu = 0$ and variance $\sigma^2 = 2 / n_{\text{in}}$. This choice is derived from the behavior of the ReLU function: because ReLU outputs zero for any negative input and acts as the identity for positive inputs, it roughly halves the variance of a layer's outputs compared to the variance of its inputs, and the factor of 2 in the weight variance compensates for this reduction.
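As a quick illustration, the following is a minimal NumPy sketch (the layer width, depth, and batch size are arbitrary choices for the example) that samples weights from $\mathcal{N}(0, 2/n_{\text{in}})$ and propagates a random batch through a stack of linear + ReLU layers. With this scaling, the pre-activation variance stays roughly constant with depth instead of shrinking toward zero or blowing up:

```python
import numpy as np

rng = np.random.default_rng(0)

def kaiming_normal(fan_in, fan_out):
    """Sample a (fan_in, fan_out) weight matrix from N(0, 2 / fan_in)."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# Arbitrary example sizes for the demonstration.
width, depth, batch = 512, 20, 1024
x = rng.normal(0.0, 1.0, size=(batch, width))  # random input batch

for layer in range(1, depth + 1):
    z = x @ kaiming_normal(width, width)  # pre-activations
    x = np.maximum(z, 0.0)                # ReLU
    print(f"layer {layer:2d}: pre-activation variance ~ {z.var():.3f}")
```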
Kaiming Initialization is widely used in networks with ReLU activations and its variants. It has proven effective in:
- Training very deep convolutional networks for image classification, as demonstrated in the original paper on ImageNet.
- Keeping activations and gradients well scaled in the early stages of training, which speeds up convergence.
- Reducing the need for careful manual tuning of initial weight scales.
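In practice, deep learning frameworks ship this initializer directly. Below is a brief PyTorch sketch (the model architecture and layer sizes are hypothetical examples, not taken from the original paper) that applies Kaiming initialization to every convolutional and linear layer of a network:

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Apply Kaiming (He) initialization to conv and linear layers."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, mode="fan_in", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Example model; sizes assume 32x32 RGB inputs purely for illustration.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 32 * 32, 10),
)
model.apply(init_weights)  # recursively applies init_weights to every submodule
```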
Adjustments to Kaiming Initialization can be made to suit other activation functions. In general, the weight variance is scaled by a gain factor that reflects how much the activation is expected to reduce the variance. For example, Leaky ReLU with negative slope $a$ lets part of the negative inputs through, so a smaller gain of $\sqrt{2 / (1 + a^2)}$ is used in place of $\sqrt{2}$.
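As a minimal sketch of this adjustment (assuming PyTorch, with an arbitrary negative slope of 0.2 chosen for illustration), the same initializer can be parameterized by the activation it feeds into:

```python
import torch.nn as nn

layer = nn.Linear(256, 256)

# Standard He initialization for ReLU: std = sqrt(2 / fan_in).
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")

# Adjusted for Leaky ReLU with negative slope a = 0.2:
# std = sqrt(2 / ((1 + a**2) * fan_in)).
nn.init.kaiming_normal_(layer.weight, a=0.2, nonlinearity="leaky_relu")

# The scaling factors ("gains") used for each activation:
print(nn.init.calculate_gain("relu"))             # sqrt(2)              ~ 1.414
print(nn.init.calculate_gain("leaky_relu", 0.2))  # sqrt(2 / (1 + 0.04)) ~ 1.387
```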
- He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.