
Perceptron
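A minimal sketch of the classic perceptron learning rule in pure Python: a step activation over a weighted sum, with the weights nudged only when a sample is misclassified. The function names `predict` and `train_perceptron`, the learning rate, the epoch count, and the AND-gate data are illustrative assumptions, not taken from any particular library.

```python
def predict(weights, bias, x):
    # Step activation: output 1 if the weighted sum plus bias is positive.
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train_perceptron(samples, labels, lr=1.0, epochs=10):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            # Perceptron rule: w <- w + lr * error * x (no change when correct).
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Usage: learn logical AND, which is linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 0, 0, 1]
w, b = train_perceptron(X, Y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

Because AND is linearly separable the rule converges; on a non-separable target such as XOR it would keep cycling, which is the classic motivation for multi-layer networks.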

Sigmoid, Tanh, ReLU
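The three activations and their derivatives, sketched for scalar inputs with Python's `math` module. The helper names are hypothetical; the formulas are the standard ones, and treating ReLU's derivative at exactly 0 as 0 is a common convention rather than a fixed rule.

```python
import math

def sigmoid(x):
    # Range (0, 1). Branching avoids overflow in exp(-x) for large negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # at most 0.25, reached at x = 0

def tanh(x):
    return math.tanh(x)               # zero-centred, range (-1, 1)

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2    # at most 1.0, reached at x = 0

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Undefined at exactly 0; 0 is used here by convention.
    return 1.0 if x > 0 else 0.0

print(sigmoid(0.0), sigmoid_grad(0.0), tanh_grad(0.0), relu(-3.0))  # 0.5 0.25 1.0 0.0
```

The small maximum gradients of sigmoid and tanh, and their near-zero gradients when saturated, are why deep stacks of them tend to shrink gradients, whereas ReLU passes a gradient of 1 for every positive input.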

Negative Log Likelihood Loss, Cross Entropy Loss, Mean Squared Error Loss, Gradient Descent
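A pure-Python sketch of how these losses relate and how gradient descent minimizes one of them: cross entropy on raw logits is log-softmax followed by negative log likelihood, and plain full-batch gradient descent fits a line by following the gradient of the mean squared error. All function names, the learning rate, the step count, and the toy data are illustrative assumptions.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs, target):
    # Negative log likelihood of the correct class, given log-probabilities.
    return -log_probs[target]

def cross_entropy_loss(logits, target):
    # Cross entropy on raw logits = log-softmax followed by NLL.
    return nll_loss(log_softmax(logits), target)

def mse_loss(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def fit_line_gd(xs, ys, lr=0.05, steps=2000):
    # Full-batch gradient descent on the MSE of y_hat = w * x + b.
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(steps):
        errs = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

print(round(cross_entropy_loss([0.0, 0.0], 0), 4))  # 0.6931, i.e. ln 2
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
w, b = fit_line_gd(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```

Computing log-softmax directly, rather than taking the log of a softmax, is the usual way to keep cross entropy stable for large logits.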

GD, SGD, Momentum, Adagrad, RMSProp, Adam, AdamW, LambdaLR, MultiplicativeLR, StepLR, ReduceLROnPlateau
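A scalar sketch of the optimizer update rules and learning-rate schedules listed above. Momentum with `mu=0` is plain gradient descent (SGD differs only in using minibatch gradients), and `weight_decay` with `decoupled=True` shows the one change that turns Adam into AdamW. The schedule helpers mimic the formulas behind PyTorch's `LambdaLR`, `MultiplicativeLR`, `StepLR` and `ReduceLROnPlateau`, but they are simplified stand-ins (for example, the plateau logic has no threshold or cooldown), not PyTorch's implementations; every function name and default hyperparameter here is an assumption.

```python
import math

# Each update takes a scalar parameter p, its gradient g and a per-parameter
# `state` dict of running statistics, and returns the new p.

def sgd_momentum(p, g, state, lr=0.01, mu=0.9):
    # mu = 0 gives plain gradient descent: p <- p - lr * g.
    v = mu * state.get("v", 0.0) + g
    state["v"] = v
    return p - lr * v

def adagrad(p, g, state, lr=0.1, eps=1e-10):
    # Sum of all squared gradients, so the effective step can only shrink.
    s = state.get("s", 0.0) + g * g
    state["s"] = s
    return p - lr * g / (math.sqrt(s) + eps)

def rmsprop(p, g, state, lr=0.01, alpha=0.99, eps=1e-8):
    # Exponential moving average of squared gradients instead of a sum.
    s = alpha * state.get("s", 0.0) + (1 - alpha) * g * g
    state["s"] = s
    return p - lr * g / (math.sqrt(s) + eps)

def adam(p, g, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8,
         weight_decay=0.0, decoupled=False):
    if weight_decay and not decoupled:
        g = g + weight_decay * p          # Adam + L2: penalty folded into g
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", 0.0) + (1 - b1) * g       # first moment
    v = b2 * state.get("v", 0.0) + (1 - b2) * g * g   # second moment
    state.update(t=t, m=m, v=v)
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)  # bias correction
    if weight_decay and decoupled:
        p = p * (1 - lr * weight_decay)   # AdamW: decay applied to p directly
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)

def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    # Multiply by gamma once every step_size epochs.
    return base_lr * gamma ** (epoch // step_size)

def lambda_lr(base_lr, epoch, lr_lambda):
    # lr = base_lr * lr_lambda(epoch)
    return base_lr * lr_lambda(epoch)

def multiplicative_lr(prev_lr, epoch, lr_lambda):
    # lr_t = lr_{t-1} * lr_lambda(t)
    return prev_lr * lr_lambda(epoch)

class PlateauReducer:
    # Cut the lr by `factor` once the monitored loss has failed to improve
    # for more than `patience` consecutive epochs.
    def __init__(self, lr, factor=0.1, patience=2):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

def minimise(update, steps=2000, **hp):
    # Toy objective f(p) = (p - 3)^2 with gradient 2 * (p - 3).
    p, state = 0.0, {}
    for _ in range(steps):
        p = update(p, 2.0 * (p - 3.0), state, **hp)
    return p

for name, fn in [("momentum", sgd_momentum), ("adagrad", adagrad),
                 ("rmsprop", rmsprop), ("adam", adam)]:
    print(name, round(minimise(fn), 3))
```

Plain L2 regularization inside Adam is rescaled by the adaptive denominator, whereas decoupled decay shrinks every weight by the same factor; that difference is the whole point of AdamW.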

Overfitting, Underfitting, Weight Decay, L1 Regularization, L2 Regularization
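A sketch of the L1 and L2 penalties, and of why L2 regularization and weight decay give the same update under plain SGD: subtracting `lr * lam * w` from the gradient step is the same as first scaling `w` by `1 - lr * lam`. The function names and numbers are hypothetical. The equivalence holds only for plain SGD; with adaptive optimizers such as Adam the two differ.

```python
def l1_penalty(w, lam):
    return lam * sum(abs(x) for x in w)

def l2_penalty(w, lam):
    # The 1/2 factor makes the gradient exactly lam * w.
    return 0.5 * lam * sum(x * x for x in w)

def l1_grad(w, lam):
    # Subgradient of lam * |x|, choosing 0 at x == 0. Its constant-size pull
    # toward 0 (unlike L2's proportional one) is why L1 tends to give sparse weights.
    return [lam * ((x > 0) - (x < 0)) for x in w]

def l2_grad(w, lam):
    return [lam * x for x in w]

def sgd_step_l2(w, data_grad, lr, lam):
    # SGD on data_loss + l2_penalty: the penalty enters through the gradient.
    return [x - lr * (g + r) for x, g, r in zip(w, data_grad, l2_grad(w, lam))]

def sgd_step_weight_decay(w, data_grad, lr, lam):
    # Weight decay: shrink the weights multiplicatively, then take the data step.
    return [x * (1 - lr * lam) - lr * g for x, g in zip(w, data_grad)]

w = [1.0, -2.0, 0.5]
grad = [0.3, -0.1, 0.2]
a = sgd_step_l2(w, grad, lr=0.1, lam=0.01)
b = sgd_step_weight_decay(w, grad, lr=0.1, lam=0.01)
print(a, b)  # equal up to floating-point rounding
```

The penalty strength `lam` is the usual lever between the two failure modes: too little leaves the gap between training and validation loss (overfitting) open, too much pushes both losses up (underfitting).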

Data Normalization, Standardization, Batch Normalization, Weight Normalization, Layer Normalization, Instance Normalization, Group Normalization
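A pure-Python sketch contrasting these schemes. It assumes "data normalization" means min-max scaling to [0, 1] and "standardization" means z-scoring. Batch, layer, instance and group normalization on an (N, C, L) tensor differ only in which elements share one mean and variance, so a single hypothetical helper `_normalize` takes a grouping key. Only training-mode statistics are shown: no learned scale and shift, and no running averages for BatchNorm at inference. Weight normalization instead reparameterizes a weight vector as w = g * v / ||v||.

```python
import math
from itertools import product

def min_max_normalize(xs):
    # Rescale to [0, 1]; assumes max(xs) > min(xs).
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    # Zero mean, unit (population) standard deviation.
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return [(x - mu) / sd for x in xs]

def weight_norm(v, g):
    # Direction comes from v, length from the scalar g.
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

def _normalize(x, key, eps=1e-5):
    # x is a nested list of shape (N, C, L). All elements x[n][c][l] with the
    # same key(n, c) share one mean and one (biased) variance.
    N, C, L = len(x), len(x[0]), len(x[0][0])
    pools = {}
    for n, c, l in product(range(N), range(C), range(L)):
        pools.setdefault(key(n, c), []).append(x[n][c][l])
    stats = {}
    for k, vals in pools.items():
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        stats[k] = (mu, math.sqrt(var + eps))
    return [[[(x[n][c][l] - stats[key(n, c)][0]) / stats[key(n, c)][1]
              for l in range(L)] for c in range(C)] for n in range(N)]

def batch_norm(x):      # per channel, pooled over batch and length
    return _normalize(x, lambda n, c: c)

def layer_norm(x):      # per sample, pooled over channels and length
    return _normalize(x, lambda n, c: n)

def instance_norm(x):   # per sample and channel, pooled over length
    return _normalize(x, lambda n, c: (n, c))

def group_norm(x, groups):  # per sample and channel group; needs C % groups == 0
    size = len(x[0]) // groups
    return _normalize(x, lambda n, c: (n, c // size))

x = [[[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]],
     [[0.0, 1.0, 5.0], [2.0, 2.0, 9.0]]]   # N=2, C=2, L=3
print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

Group normalization interpolates between the other two per-sample schemes: one group is layer normalization, and one channel per group is instance normalization. Because none of the per-sample variants pool over the batch, they behave the same at any batch size.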

DP (DataParallel), All-reduce, DDP (DistributedDataParallel), Gradient Bucketing
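A single-process simulation of the gradient synchronization DDP performs after backward: parameter gradients are packed into buckets, each bucket is summed across workers with a ring all-reduce (reduce-scatter, then all-gather), and the sum is divided by the number of workers. The worker count, the bucket cap measured in elements, the greedy bucketing rule, and all function names are illustrative assumptions. Real DDP (whose cap is set by `bucket_cap_mb`) overlaps these all-reduces with the backward pass and runs them over a communication backend such as NCCL or Gloo. DP, by contrast, is a single process that replicates the model across devices and reduces gradients onto one primary device, and PyTorch's documentation recommends DDP over it.

```python
def ring_all_reduce(bufs):
    # bufs[i] is worker i's flat buffer; all buffers have the same length.
    # Returns one buffer per worker, each holding the elementwise sum.
    p, n = len(bufs), len(bufs[0])
    bounds = [(k * n // p, (k + 1) * n // p) for k in range(p)]
    chunks = [[list(buf[lo:hi]) for lo, hi in bounds] for buf in bufs]
    # Reduce-scatter: after p-1 steps worker i owns the full sum of chunk (i+1) % p.
    for s in range(p - 1):
        msgs = [(i, (i - s) % p, list(chunks[i][(i - s) % p])) for i in range(p)]
        for i, k, data in msgs:
            dst = chunks[(i + 1) % p]
            dst[k] = [u + v for u, v in zip(dst[k], data)]
    # All-gather: pass the reduced chunks around the ring until everyone has all.
    for s in range(p - 1):
        msgs = [(i, (i + 1 - s) % p, list(chunks[i][(i + 1 - s) % p])) for i in range(p)]
        for i, k, data in msgs:
            chunks[(i + 1) % p][k] = data
    return [[v for chunk in worker for v in chunk] for worker in chunks]

def make_buckets(param_sizes, cap):
    # Greedily pack parameter indices into buckets of at most `cap` elements,
    # walking in reverse order (roughly when gradients become ready in backward).
    # A single parameter larger than `cap` gets a bucket of its own.
    buckets, current, size = [], [], 0
    for idx in reversed(range(len(param_sizes))):
        if current and size + param_sizes[idx] > cap:
            buckets.append(current)
            current, size = [], 0
        current.append(idx)
        size += param_sizes[idx]
    if current:
        buckets.append(current)
    return buckets

def ddp_sync(worker_grads, param_sizes, cap):
    # worker_grads[w][j] is worker w's gradient for parameter j (a flat list).
    p = len(worker_grads)
    synced = [[None] * len(param_sizes) for _ in range(p)]
    for bucket in make_buckets(param_sizes, cap):
        # One all-reduce per bucket instead of one per parameter.
        flat = [[v for j in bucket for v in worker_grads[w][j]] for w in range(p)]
        reduced = ring_all_reduce(flat)
        for w in range(p):
            offset = 0
            for j in bucket:
                seg = reduced[w][offset:offset + param_sizes[j]]
                synced[w][j] = [v / p for v in seg]   # average, as DDP does
                offset += param_sizes[j]
    return synced

grads = [  # worker -> parameter -> flat gradient
    [[1.0, 2.0], [3.0, 4.0, 5.0]],
    [[3.0, 4.0], [5.0, 6.0, 7.0]],
]
print(ddp_sync(grads, param_sizes=[2, 3], cap=3))
# [[[2.0, 3.0], [4.0, 5.0, 6.0]], [[2.0, 3.0], [4.0, 5.0, 6.0]]]
```

Bucketing trades latency for fewer, larger messages: each all-reduce has a fixed startup cost, so sending a few big flat buffers is usually cheaper than one small message per parameter tensor.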