- Our results strongly suggest that larger models will continue to perform better, and will also be much more sample efficient than has been previously appreciated.
- Big models may be more important than big data.
Larger models will continue to perform better
Empirical scaling laws for language model performance are formulated in terms of three factors: the number of model parameters N, the size of the dataset D, and the amount of compute C. Performance here refers to the cross-entropy loss; a minimal sketch of the fitted power laws is given below.
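As a quick illustration, here is a minimal Python sketch of the power-law forms the paper fits. The constants are the fitted values reported in the paper (α_N ≈ 0.076, N_c ≈ 8.8×10^13 non-embedding parameters; α_D ≈ 0.095, D_c ≈ 5.4×10^13 tokens); the function names are my own, and this is a sketch of the fitted forms, not the paper's code.

```python
# A minimal sketch of the paper's fitted power laws. Constants are the
# fitted values reported in Kaplan et al.; variable names are illustrative.

ALPHA_N, N_C = 0.076, 8.8e13  # exponent / scale for model size N (non-embedding params)
ALPHA_D, D_C = 0.095, 5.4e13  # exponent / scale for dataset size D (tokens)

def loss_vs_params(n: float) -> float:
    """L(N) = (N_c / N)^alpha_N: loss when data and compute are not bottlenecks."""
    return (N_C / n) ** ALPHA_N

def loss_vs_data(d: float) -> float:
    """L(D) = (D_c / D)^alpha_D: loss for large models trained with early stopping."""
    return (D_C / d) ** ALPHA_D

def loss_joint(n: float, d: float) -> float:
    """Combined fit: L(N, D) = [(N_c / N)^(alpha_N / alpha_D) + D_c / D]^alpha_D."""
    return ((N_C / n) ** (ALPHA_N / ALPHA_D) + D_C / d) ** ALPHA_D

if __name__ == "__main__":
    # Loss (nats/token) falls smoothly as a power law in model size alone ...
    for n in (1e8, 1e9, 1e10):
        print(f"N = {n:.0e}: L(N) = {loss_vs_params(n):.3f}")
    # ... and the joint fit shows when a fixed dataset becomes the bottleneck.
    print(f"L(N=1e9, D=1e10 tokens) = {loss_joint(1e9, 1e10):.3f}")
```

In the joint fit, whichever bracket term dominates sets the bottleneck: once D_c/D outweighs the model term, adding parameters barely moves the loss, which is one way to read the note's "big models vs. big data" point directly off the equation.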
📌 [Summary]
- These results show that language modeling performance improves smoothly and predictably as we appropriately scale up model size, data, and compute.
- We expect that larger language models will perform better and be more sample efficient than current models.