Algorithm 2 of the Batch Normalization paper: train a batch-normalized network, then freeze it for inference.

Input: network $N$ with trainable parameters $\Theta$; subset of activations $\{x^{(k)}\}_{k=1}^{K}$
Output: batch-normalized network for inference, $N_{BN}^{inf}$

1: $N_{BN}^{tr} \gets N$ // Training BN network
2: for $k = 1 \ldots K$ do
3: Add the transformation $y^{(k)} = BN_{\gamma^{(k)},\beta^{(k)}}(x^{(k)})$ to $N_{BN}^{tr}$ (Alg. 1)
4: Modify each layer in $N_{BN}^{tr}$ with input $x^{(k)}$ to take $y^{(k)}$ instead
5: end for
6: Train $N_{BN}^{tr}$ to optimize the parameters $\Theta \cup \{\gamma^{(k)}, \beta^{(k)}\}_{k=1}^{K}$
7: $N_{BN}^{inf} \gets N_{BN}^{tr}$ // Inference BN network with frozen parameters
8: for $k = 1 \ldots K$ do
9: // For clarity, $x \equiv x^{(k)}$, $\gamma \equiv \gamma^{(k)}$, $\mu_{\mathcal{B}} \equiv \mu_{\mathcal{B}}^{(k)}$, etc.
10: Process multiple training mini-batches $\mathcal{B}$, each of size $m$, and average over them:
$$E[x] \gets E_{\mathcal{B}}[\mu_{\mathcal{B}}], \qquad \mathrm{Var}[x] \gets \frac{m}{m-1} E_{\mathcal{B}}[\sigma_{\mathcal{B}}^{2}]$$
11: In $N_{BN}^{inf}$, replace the transform $y = BN_{\gamma,\beta}(x)$ with
$$y = \frac{\gamma}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot x + \left(\beta - \frac{\gamma\, E[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}}\right)$$
12: end for
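The replacement in step 11 is the training-time transform with the mini-batch statistics swapped for the population estimates: expanding $y = \gamma \hat{x} + \beta$ with $\hat{x} = (x - E[x])/\sqrt{\mathrm{Var}[x] + \epsilon}$ gives the same affine form,

$$y = \gamma \frac{x - E[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} + \beta = \frac{\gamma}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot x + \left(\beta - \frac{\gamma\, E[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}}\right)$$

Since $E[x]$, $\mathrm{Var}[x]$, $\gamma$, and $\beta$ are all fixed after training, inference applies a single deterministic linear transform per activation, independent of the mini-batch.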
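As a concrete illustration of steps 10–11, here is a minimal NumPy sketch assuming 2-D activations of shape $(m, d)$. The names (`bn_train`, `bn_freeze`, `EPS`, `batches`) are illustrative, not from the paper.

```python
import numpy as np

EPS = 1e-5  # the epsilon in the BN transform

def bn_train(x, gamma, beta):
    """Training-time BN (Alg. 1): normalize with the mini-batch's own statistics."""
    mu = x.mean(axis=0)                    # mini-batch mean  mu_B
    var = x.var(axis=0)                    # mini-batch variance sigma_B^2 (biased, ddof=0)
    x_hat = (x - mu) / np.sqrt(var + EPS)  # normalize
    return gamma * x_hat + beta            # scale and shift

def bn_freeze(batches, gamma, beta):
    """Steps 10-11: average mini-batch statistics into population estimates,
    then fold BN into a fixed affine transform y = scale * x + shift."""
    m = batches[0].shape[0]
    e_x = np.mean([b.mean(axis=0) for b in batches], axis=0)  # E[x] <- E_B[mu_B]
    var_x = m / (m - 1) * np.mean([b.var(axis=0) for b in batches], axis=0)  # Var[x] <- m/(m-1) E_B[sigma_B^2]
    scale = gamma / np.sqrt(var_x + EPS)
    shift = beta - scale * e_x
    return lambda x: scale * x + shift     # frozen inference-time transform

# Usage sketch
rng = np.random.default_rng(0)
gamma, beta = np.ones(4), np.zeros(4)
batches = [rng.normal(3.0, 2.0, size=(32, 4)) for _ in range(100)]
bn_inf = bn_freeze(batches, gamma, beta)
y = bn_inf(batches[0])  # no batch statistics needed at inference time
```

Unlike `bn_train`, the frozen transform does not depend on which other examples appear in the batch, so the output for a given input is deterministic even with a batch of one.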