language-model

1. [Transformer] Attention Is All You Need

2. [GPT] Improving Language Understanding by Generative Pre-Training

3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

4. RoBERTa: A Robustly Optimized BERT Pretraining Approach

5. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

6. [T0] Multitask Prompted Training Enables Zero-Shot Task Generalization

7. LLaMA: Open and Efficient Foundation Language Models

8. Sparks of Artificial General Intelligence: Early Experiments with GPT-4

9. OPT: Open Pre-trained Transformer Language Models

10. Self-Attention with Relative Position Representations

11. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

12. KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation

13. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

14. LLM Pre-training Lecture by Upstage - 0. Introduction

15. LLM Pre-training Lecture by Upstage - 1. Why Is Pre-training Necessary?

16. LLM Pre-training Lecture by Upstage - 2. Data Preparation
