Attention Is All You Need, NIPS 2017
Improving Language Understanding by Generative Pre-Training, OpenAI 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, arXiv 2018
RoBERTa: A Robustly Optimized BERT Pretraining Approach, arXiv 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, arXiv 2019
Multitask Prompted Training Enables Zero-Shot Task Generalization, ICLR 2022
OPT: Open Pre-trained Transformer Language Models, arXiv 2022
LLaMA: Open and Efficient Foundation Language Models, arXiv 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4, arXiv 2023