learning_rate: 2e-05, batch size: 16 learning_rate: 0.001, batch size: 256
https://openreview.net/pdf?id=B1Yy1BxCZBatch Sizesmall: converges quickly at the cost of noise in the training processlarge: converges slowly wit
The process of finding the right combination of hyperparameters to maximize the model performanceRandom SearchGrid SearchEach iteration tries a combin
Before RNN LSTM : slow to train Can we parallelize sequential data? Transformers Input sequence can be transmitted **parallel ** No concept of t
A python file("hw1.py") runs well on my laptop.But what if it does not on my friend's?➡ ModuleNotFoundError: No Module named 'xgboost'➡ The module isn