e.g.) SVM training data: example1.tar.gz -- Reuters news articles
Dimensionality reduction using a deep neural network
-Input vector -> hidden layer -> output vector
Deep neural network --> Deep learning (growing computing power, GPUs, and memory made multi-layer architectures feasible)
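A minimal sketch of this input -> hidden -> output scheme (my own illustration in PyTorch, not code from the lecture): an autoencoder whose hidden layer is narrower than its input, so the hidden activations serve as the reduced representation. All layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input_dim, hidden_dim = 1000, 50         # assumed sizes for illustration

encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Tanh())
decoder = nn.Linear(hidden_dim, input_dim)
autoencoder = nn.Sequential(encoder, decoder)

x = torch.rand(32, input_dim)            # a batch of 32 input vectors
loss = F.mse_loss(autoencoder(x), x)     # train to reconstruct the input
loss.backward()                          # gradients for one optimization step

reduced = encoder(x)                     # shape (32, 50): the compressed vectors
```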

"You shall know a word by the company it keeps", J.R. Firth (1957)
"Efficient Estimation of Word Representations in Vector Space"
Word-vector dimensionality: every word in the vocabulary (one dimension per word)
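As a toy illustration (my own, with a made-up 5-word vocabulary): under one-hot encoding each word's vector has one dimension per vocabulary word, so realistic vocabularies produce extremely high-dimensional, sparse vectors.

```python
import numpy as np

vocab = ["king", "queen", "man", "woman", "apple"]
# One dimension per word: vector length equals the vocabulary size |V|.
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

print(one_hot["queen"])   # [0. 1. 0. 0. 0.] -- dimension = |V| = 5
```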
'Word2vec' (CBOW, skip-gram) -- very similar to the 'PageRank algorithm'

-CBOW: fast to train; suited to large training corpora; represents high-frequency word vectors well
-Skip-gram: an n-gram with a gap (the current word); slower to train (see the gensim sketch below)
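A minimal gensim sketch contrasting the two architectures (my own toy corpus; assumes gensim >= 4.0, where the dimensionality argument is `vector_size`):

```python
from gensim.models import Word2Vec

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "lay", "on", "the", "rug"]]

# sg=0 selects CBOW (faster); sg=1 selects skip-gram (slower).
cbow     = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["cat"].shape)             # (50,) -- dense, not |V|-dimensional
print(skipgram.wv.most_similar("cat"))  # nearest neighbours by cosine similarity
```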
Python libraries: gensim, GloVe
Visualization: scikit-learn t-SNE
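A sketch of the visualization step with scikit-learn's t-SNE, assuming `skipgram` is a trained gensim model like the one above (the variable names are mine):

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

words = list(skipgram.wv.index_to_key)   # vocabulary, most frequent first
vectors = skipgram.wv[words]             # matrix of shape (|V|, 50)

# Project to 2-D; perplexity must be smaller than the number of samples.
coords = TSNE(n_components=2, perplexity=5, init="pca",
              random_state=0).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1], s=10)
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y), fontsize=8)
plt.show()
```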
Word2vec
GloVe
FastText
ELMo
BERT: ALBERT, RoBERTa
XLNet
ELECTRA
GPT-2
More refined vectors can be built by reflecting the context in which a word occurs (both left and right).
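To see what left/right context buys, here is a sketch using the Hugging Face `transformers` library (my choice of tooling, not named here): a contextual model such as BERT assigns the same surface word different vectors depending on the sentence, unlike static Word2vec/GloVe vectors.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]   # (seq_len, 768)
    # Position of the word's first occurrence in the tokenized input.
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

v1 = word_vector("I deposited cash at the bank", "bank")
v2 = word_vector("We sat on the bank of the river", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))  # < 1: context changes the vector
```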
Evaluation (MRC): SQuAD, RACE, GLUE, etc.
Feature-based approach: ELMo (Peters et al., 2018a)
-uses pre-trained representations as additional features
Fine-tuning approach: OpenAI GPT (Radford et al., 2018) - unidirectional
-introduces minimal task-specific parameters
-simply fine-tunes all pretrained parameters
Both approaches use unidirectional language models to learn general language representations.
BERT: focused on bidirectional pre-training
masked language model (MLM)
-randomly masks some of the tokens from the input and predicts the original vocabulary id of each masked token
next sentence prediction
-jointly pretrains text-pair representations
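A quick way to poke at the MLM objective (a sketch using Hugging Face's `fill-mask` pipeline around a pre-trained BERT; the tooling choice is mine, not the paper's):

```python
from transformers import pipeline

# The MLM head predicts the original vocabulary token behind [MASK].
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))  # e.g. 'paris' ranked first
```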




Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.
Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global Vectors for Word Representation. In Proceedings of EMNLP, 2014.
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of NeurIPS, 2019.
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of EMNLP, 2016.
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In Proceedings of ICLR, 2019.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL, 2019.
Word Embedding Overview - Prof. Seung-Shik Kang (강승식)