BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al.), arXiv 2018
Masked Language Model (MLM): randomly select 15% of input tokens and train the model to predict the original tokens from bidirectional context; of the selected positions, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged.
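A minimal sketch of the 80/10/10 masking scheme described in the paper, written in PyTorch. The function name `mask_tokens` and the parameters `mask_token_id`, `vocab_size`, and `mlm_prob` are illustrative choices, not names from the paper; the split percentages follow BERT.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """BERT-style MLM masking (sketch; hypothetical helper, not from the paper).

    Selects ~15% of positions as prediction targets; of those,
    80% -> [MASK], 10% -> random token, 10% -> unchanged.
    Note: a real implementation would also exclude special tokens
    such as [CLS] and [SEP] from masking.
    """
    labels = input_ids.clone()

    # Choose which positions the model must predict.
    selected = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~selected] = -100  # ignore non-selected positions in the loss

    # 80% of selected positions become [MASK].
    replaced = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & selected
    input_ids[replaced] = mask_token_id

    # Half of the remainder (10% overall) become a random token;
    # the rest (10% overall) are left unchanged.
    randomized = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                  & selected & ~replaced)
    random_ids = torch.randint(vocab_size, input_ids.shape)
    input_ids[randomized] = random_ids[randomized]

    return input_ids, labels
```

Keeping 10% of selected tokens unchanged forces the model to maintain a useful representation for every input token, since it cannot tell which positions were corrupted.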
Pre-train and Fine-tune: pre-train on large unlabeled corpora with MLM and next-sentence prediction, then fine-tune all parameters on each downstream task by adding only a small task-specific output layer.
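A minimal sketch of the fine-tuning setup for sequence classification: a new linear head on the [CLS] representation, with the pre-trained encoder weights updated jointly. The `encoder` module and its call signature are assumptions for illustration, not the paper's code.

```python
import torch.nn as nn

class BertForClassification(nn.Module):
    """Fine-tuning head (sketch). `encoder` stands in for any pre-trained
    BERT-style module that returns hidden states of shape
    (batch, seq_len, hidden_size); this interface is assumed."""

    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder  # pre-trained weights, also updated during fine-tuning
        self.classifier = nn.Linear(hidden_size, num_labels)  # new, randomly initialized

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask)  # (batch, seq_len, hidden)
        cls = hidden[:, 0]            # representation of the [CLS] token
        return self.classifier(cls)   # task-specific logits
```

Because only the classifier layer is new, fine-tuning adds very few parameters per task; all tasks reuse the same pre-trained encoder.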