Artificial Neural Networks
Deep Neural Network (DNN)
Perceptron
Linear layer (= Fully-connected layer)
Softmax layer
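The softmax layer maps a vector of raw scores (logits) to a probability distribution. A minimal NumPy sketch of the standard formulation, with the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(z):
    # subtract the row max so exp() never overflows; this does not change the result
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    # normalize so each row sums to 1, giving a valid probability distribution
    return e / e.sum(axis=-1, keepdims=True)
```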
Training NN via Gradient Descent
Sigmoid Activation
Tanh Activation
ReLU Activation
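The three activations above can be sketched in a few lines of NumPy; the comments note the property each heading refers to:

```python
import numpy as np

def sigmoid(x):
    # squashes inputs to (0, 1); saturates (near-zero gradient) for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # zero-centered alternative, range (-1, 1); also saturates at the extremes
    return np.tanh(x)

def relu(x):
    # passes positive inputs through unchanged, zeroes out negatives;
    # does not saturate for positive inputs, which helps gradient flow
    return np.maximum(0.0, x)
```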
Batch Normalization
CNN (Convolutional Neural Network), also called ConvNet
Process
Various CNN Architectures
RNN (Recurrent Neural Networks)
Various Problem settings of RNN-based Sequence Modeling
Autoregressive model
: at test time, sample one character at a time and feed it back as the input to the model at the next time step
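The test-time sampling loop above can be sketched as follows; `step` is a hypothetical one-step RNN (an assumption, not a specific library API) that takes a hidden state and the current character index and returns the next hidden state plus a probability distribution over the vocabulary:

```python
import numpy as np

def sample_chars(step, h0, start_idx, vocab_size, length,
                 rng=np.random.default_rng(0)):
    # step(h, idx) -> (h_next, probs) is assumed to be a single RNN time step
    h, idx, out = h0, start_idx, []
    for _ in range(length):
        h, probs = step(h, idx)
        # sample the next character from the model's distribution...
        idx = rng.choice(vocab_size, p=probs)
        # ...and record it; it is fed back as the input at the next time step
        out.append(int(idx))
    return out
```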
Advanced RNN models such as LSTM or GRU are often used in practice
Long Short-Term Memory (LSTM)
Seq2seq model
Seq2seq with Attention
Advanced Attention Techniques
Scaled Dot-product Attention
Multi-head attention
Layer Normalization
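Scaled dot-product attention, the core of the techniques listed above, can be sketched directly from its definition softmax(QKᵀ/√d_k)V; this is a minimal NumPy version without masking or batching:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (m, d_k) queries, K: (n, d_k) keys, V: (n, d_v) values
    d_k = K.shape[-1]
    # dividing by sqrt(d_k) keeps the dot products from growing with dimension,
    # which would push the softmax into a low-gradient regime
    scores = Q @ K.T / np.sqrt(d_k)
    # row-wise softmax over the keys (max-subtracted for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output row is a convex combination of the value rows
    return weights @ V
```

Multi-head attention runs several of these in parallel on learned projections of Q, K, and V, then concatenates the results.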
Self-supervised learning
: given unlabeled data, hide part of it and train the model to predict the hidden part from the remaining data
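The hide-and-predict setup above is what BERT-style masked language modeling does at the data level. A minimal sketch of the masking step, assuming integer token IDs; the `mask_id`, 15% masking rate, and the -100 ignore-index convention for unmasked targets are assumptions for illustration, not fixed by the outline:

```python
import numpy as np

def mask_tokens(tokens, mask_id, mask_prob=0.15,
                rng=np.random.default_rng(0)):
    tokens = np.asarray(tokens)
    # choose a random subset of positions to hide
    mask = rng.random(tokens.shape) < mask_prob
    # model input: hidden positions are replaced by the mask token
    inputs = np.where(mask, mask_id, tokens)
    # training target: predict the original token at hidden positions only;
    # -100 marks positions to ignore in the loss (a common convention)
    targets = np.where(mask, tokens, -100)
    return inputs, targets, mask
```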
BERT
GPT (Generative Pre-trained Transformer)