Residual learning
Neural Machine Translation of Rare Words with Subword Units (BPE)
BERT
GPT-1
Attention Is All You Need
LoRA
Longformer / BigBird
GAN
Autoencoder
VAE