GPT-2
1. Just a really big transformer LM
2. Trained on 40GB of text
2-1) Quite a bit of effort went into making sure the dataset is of good quality
2-2) Take web pages linked from Reddit posts with high karma
3. The language model can perform downstream tasks in a zero-shot setting, without any parameter or architecture modification (see the zero-shot sketch below)
ex) The Natural Language Decathlon: Multitask Learning as Question Answering (Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher)
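In the GPT-2 setup, the task is specified entirely in the prompt, e.g. appending "TL;DR:" to an article cues summarization. A minimal sketch, assuming the Hugging Face transformers library and the public gpt2 checkpoint; the article text is made up:

```python
from transformers import pipeline  # assumed: Hugging Face transformers is installed

# Zero-shot task induction: "TL;DR:" in the prompt cues summarization,
# while the LM's parameters and architecture stay untouched.
article = (
    "A new language model was trained on a large collection of web pages "
    "filtered for quality. Without any fine-tuning, it was then evaluated "
    "on summarization, translation, and question answering benchmarks."
)

generator = pipeline("text-generation", model="gpt2")
out = generator(article + "\nTL;DR:", max_new_tokens=40, do_sample=False)
print(out[0]["generated_text"])
```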
1. A promising source of diverse and nearly unlimited text is web scrapes such as Common Crawl
2. Preprocess
2-1) Byte pair encoding (BPE); see the merge-loop sketch after this list
2-2) Minimal fragmentation of words across multiple vocab tokens
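A minimal sketch of the classic BPE merge loop on a made-up character-level toy corpus; GPT-2 itself uses a byte-level variant, and the number of merges here is arbitrary:

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count how often each adjacent symbol pair occurs across the corpus."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    new_vocab = {}
    for word, freq in vocab.items():
        symbols = word.split()
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        new_vocab[" ".join(merged)] = freq
    return new_vocab

# Toy corpus: each word is a space-separated sequence of characters, with counts.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}

for _ in range(5):  # number of merges is a hyperparameter
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    print(best, "->", vocab)
```

Frequent words end up as single tokens while rare words fall back to smaller pieces, which is what keeps fragmentation minimal (item 2-2).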
Use the Conversational Question Answering dataset (CoQA)
• Achieved a 55 F1 score, exceeding the performance of 3 out of 4 baselines, without any labeled training data (token-overlap F1 sketched below)
• Fine-tuned BERT achieved 89 F1
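The F1 here is the token-overlap F1 used to score QA answers. A simplified sketch; the official CoQA evaluation also lowercases, strips punctuation and articles, and averages over multiple gold references:

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string (simplified)."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("in the garden", "the garden"))  # 0.8
```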
GPT-3
1. Scaling up language models greatly improves task-agnostic, few-shot performance
2. An autoregressive language model with 175 billion parameters, evaluated in the few-shot setting
3. 96 attention layers, batch size of 3.2M
4. Zero-shot performance improves steadily with model size
5. Few-shot performance increases more rapidly (see the prompt sketch after this list)
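A sketch of how a few-shot (in-context learning) prompt is assembled: a task description, K demonstration pairs, then the unanswered query. The translation pairs are made up for illustration; with GPT-3 this string would simply be fed to the model and the continuation read off as the answer, with no gradient updates:

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a few-shot prompt: task description, K demos, then the query."""
    lines = [task_description]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to continue with the answer
    return "\n".join(lines)

# Made-up demonstration pairs for illustration.
demos = [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")]
print(build_few_shot_prompt("Translate English to French:", demos, "cheese"))
```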
Not done
Not done
Not done
I spent some time with the teammates I worked with for the past two weeks, talking about our plans during and after boostcamp. They are all great people, so it is a real shame we only got to work together for two weeks, though many of them will be with me in the next round as well. Over the Chuseok holiday I plan to review the two weeks of U-stage material and get back into git and VS Code, which I haven't used in a while.