https://openreview.net/pdf?id=B1Yy1BxCZ
The most popular form of learning rate annealing is a step decay
where the learning rate is reduced by some percentage after a set number of training epochs.
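The step-decay schedule described above can be sketched in a few lines; the base rate, decay factor, and step size below are illustrative defaults, not values taken from the linked article.

```python
def step_decay(epoch, base_lr=0.1, gamma=0.1, step_size=30):
    """Step decay: cut the learning rate by a factor `gamma`
    every `step_size` epochs (values here are illustrative)."""
    return base_lr * gamma ** (epoch // step_size)

# The rate stays flat within each step and drops at the boundaries:
for epoch in (0, 29, 30, 59, 60):
    print(epoch, step_decay(epoch))
```

With these defaults the rate is 0.1 for epochs 0-29, 0.01 for 30-59, and so on; libraries such as PyTorch expose the same idea as a ready-made scheduler (`StepLR`).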
https://www.jeremyjordan.me/nn-learning-rate/
https://www.baeldung.com/cs/learning-rate-batch-size
https://inhovation97.tistory.com/32
https://arxiv.org/abs/1812.01187
“A model tweak is a minor adjustment to the network architecture, such as changing the stride of a particular convolution layer. Such a tweak often barely changes the computational complexity but might have a non-negligible effect on the model accuracy.”
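One way to make the complexity claim concrete is to count multiply-adds before and after a stride tweak. The sketch below applies a rough per-layer FLOPs helper to the ResNet-B downsampling tweak from the paper (moving the stride-2 from the 1x1 conv to the 3x3 conv); the 56x56x256 input and bottleneck width of 64 are illustrative assumptions.

```python
def conv_flops(h, w, c_in, c_out, k, stride):
    """Rough multiply-add count for one conv layer, 'same' padding assumed."""
    out_h = -(-h // stride)  # ceil(h / stride)
    out_w = -(-w // stride)
    return out_h * out_w * c_out * c_in * k * k

# Downsampling bottleneck on a 56x56x256 input (illustrative sizes).
# Original ResNet: the 1x1 conv carries stride 2, discarding 3/4 of the input.
original = (conv_flops(56, 56, 256, 64, 1, 2)    # 1x1 conv, stride 2
            + conv_flops(28, 28, 64, 64, 3, 1)   # 3x3 conv, stride 1
            + conv_flops(28, 28, 64, 256, 1, 1)) # 1x1 conv, stride 1

# ResNet-B tweak: stride 2 moves to the 3x3 conv, so the 1x1 sees the full map.
resnet_b = (conv_flops(56, 56, 256, 64, 1, 1)    # 1x1 conv, stride 1
            + conv_flops(56, 56, 64, 64, 3, 2)   # 3x3 conv, stride 2
            + conv_flops(28, 28, 64, 256, 1, 1)) # 1x1 conv, stride 1

print(original, resnet_b)
```

This one block gets noticeably more expensive, but since only a handful of downsampling blocks exist in the network, the change to total model FLOPs stays modest, which is the sense in which the tweak "barely changes the computational complexity" while still affecting accuracy.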
https://norman3.github.io/papers/docs/bag_of_tricks_for_image_classification.html