Advanced Self-supervised Pre-trained Language Models
GPT-2
GPT-2: Language Models are Unsupervised Multitask Learners
- Downstream tasks in a zero-shot setting
GPT-2: Motivation (decaNLP)
GPT-2: Datasets
Trained on documents collected from Reddit links that received at least 3 karma (upvotes).
- Preprocessing
- Byte pair encoding (BPE)
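As a reference point, here is a minimal sketch of the classic character-level BPE merge loop (GPT-2 itself uses a byte-level variant; the toy corpus and helper names below are illustrative, not from the paper):

```python
import re
from collections import Counter

def get_stats(vocab):
    """Count how often each adjacent symbol pair occurs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the chosen pair with a single merged symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy corpus: words pre-split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for _ in range(10):                       # number of merges is a hyperparameter
    pairs = get_stats(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)      # most frequent adjacent symbol pair
    vocab = merge_pair(best, vocab)
    print(best)                           # the learned merge rules, in order
```

Frequent subwords ("est", "low", ...) end up as single tokens while rare words fall back to smaller pieces, so the vocabulary stays fixed-size with no out-of-vocabulary tokens.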
GPT-2: Question Answering
- Uses the Conversational Question Answering dataset (CoQA)
GPT-2: Translation
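Both tasks are induced purely through prompting, with no fine-tuning. Below is a minimal sketch of the prompt formats, using the Hugging Face transformers library as an assumed interface (the small public gpt2 checkpoint only illustrates the mechanics, not the paper's reported results):

```python
from transformers import pipeline

# Any GPT-2 checkpoint works here; "gpt2" is the small public one.
generator = pipeline("text-generation", model="gpt2")

# Zero-shot QA (CoQA-style): passage and question, then "A:" invites an answer.
qa_prompt = (
    "The 2008 Summer Olympics were held in Beijing, China.\n"
    "Q: Where were the 2008 Summer Olympics held?\n"
    "A:"
)

# Zero-shot translation: example pairs written as "english sentence = french
# sentence", then the source phrase followed by "=" to elicit a translation.
mt_prompt = (
    "sea otter = loutre de mer\n"
    "plush giraffe = girafe en peluche\n"
    "cheese ="
)

for prompt in (qa_prompt, mt_prompt):
    out = generator(prompt, max_new_tokens=20, do_sample=False)
    print(out[0]["generated_text"])
```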
GPT-3
An improved version of GPT-2
Uses far more parameters and a larger batch size than GPT-2.
- Zero-shot
- One-shot
- Few-shot
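The three settings differ only in how many solved demonstrations appear in the prompt; no gradient updates are performed. A minimal sketch of how such prompts could be assembled (the task description, "=>" separator, and `build_prompt` helper are illustrative assumptions):

```python
def build_prompt(task_description, demos, query):
    """Assemble an in-context learning prompt: a task description, k solved
    demonstrations (k = 0, 1, or a few), and the unanswered query."""
    lines = [task_description]
    lines += [f"{src} => {tgt}" for src, tgt in demos]   # k demonstrations
    lines.append(f"{query} =>")                          # model must complete this
    return "\n".join(lines)

demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
task = "Translate English to French:"

zero_shot = build_prompt(task, demos[:0], "plush giraffe")   # no demonstrations
one_shot  = build_prompt(task, demos[:1], "plush giraffe")   # one demonstration
few_shot  = build_prompt(task, demos,     "plush giraffe")   # several demonstrations

print(few_shot)
```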
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- Factorized Embedding Parameterization
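The idea is to replace the single V × H embedding matrix with a V × E lookup followed by an E × H projection, where E ≪ H. A minimal sketch of the parameter saving in PyTorch (dimensions follow the ALBERT-base configuration; the module itself is an illustrative assumption, not ALBERT's released code):

```python
import torch.nn as nn

V, H, E = 30000, 768, 128   # vocab size, hidden size, small embedding size (E << H)

# BERT-style embedding: one V x H matrix.
bert_embed = nn.Embedding(V, H)

# ALBERT-style factorization: V x E lookup, then an E x H projection.
albert_embed = nn.Sequential(nn.Embedding(V, E), nn.Linear(E, H, bias=False))

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(bert_embed))     # 30000 * 768             = 23,040,000
print(count(albert_embed))   # 30000 * 128 + 128 * 768 =  3,938,304
```

With H fixed, the embedding parameters grow as O(V·E) instead of O(V·H), so the hidden size can be increased without blowing up the embedding table.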