U-stage day 25

사공진 · September 15, 2021

AI Tech (2nd cohort)


1. Lecture Content

[NLP] Intro to NLP, Bag-of-Words

1. Intro to Natural Language Processing (NLP)

Goal of This Course

Natural language processing (NLP), which aims to properly understand and generate human language, has emerged as a crucial application of artificial intelligence along with the advances in deep neural networks.
This course covers various deep learning approaches as well as their applications, such as language modeling, machine translation, question answering, document classification, and dialog systems.

Academic Disciplines related to NLP

Natural language processing (major conferences: ACL, EMNLP, NAACL)
  • Includes state-of-the-art deep learning-based models and tasks
  • Low-level parsing: tokenization, stemming
  • Word and phrase level: named entity recognition (NER), part-of-speech (POS) tagging, noun-phrase chunking, dependency parsing, coreference resolution
  • Sentence level: sentiment analysis, machine translation
  • Multi-sentence and paragraph level: entailment prediction, question answering, dialog systems, summarization
Text mining (major conferences: KDD, The WebConf (formerly WWW), WSDM, CIKM, ICWSM)
  • Extracts useful information and insights from text and document data
  • Document clustering (e.g., topic modeling)
  • Highly related to computational social science
Information retrieval (major conferences: SIGIR, WSDM, CIKM, RecSys)
  • This area is less actively studied now
  • It has evolved into recommendation systems, which remain an active area of research

Trends of NLP

Text data can basically be viewed as a sequence of words, and each word can be represented as a vector through techniques such as Word2Vec or GloVe.
RNN-family models (LSTMs and GRUs), which take the sequence of these word vectors as input, have been the main architecture for NLP tasks.
The overall performance of NLP tasks has improved since attention modules and Transformer models, which replace RNNs with self-attention, were introduced a few years ago.
As is the case for the Transformer, most advanced NLP models were originally developed to improve machine translation.
In the early days, customized models for different NLP tasks were developed separately.
Since the Transformer was introduced, huge models have been built by stacking its basic module, self-attention, and trained on large datasets through language modeling, a self-supervised training setup that does not require additional labels for a particular task.
Afterwards, these models were applied to other tasks through transfer learning, where they outperformed all the customized models on each task.
These models have now become an essential part of numerous NLP tasks, so NLP research has become difficult with limited GPU resources, since the models are too large to train.

2. Bag-of-Words

Bag-of-Words Representation

Step 1. Constructing the vocabulary containing unique words
Example sentences: "John really really loves this movie", "Jane really likes this song"
Vocabulary: {"John", "really", "loves", "this", "movie", "Jane", "likes", "song"}
Step 2. Encoding unique words into one-hot vectors
Vocabulary: {"John", "really", "loves", "this", "movie", "Jane", "likes", "song"}
John: [1 0 0 0 0 0 0 0]
really: [0 1 0 0 0 0 0 0]
loves: [0 0 1 0 0 0 0 0]
this: [0 0 0 1 0 0 0 0]
movie: [0 0 0 0 1 0 0 0]
Jane: [0 0 0 0 0 1 0 0]
likes: [0 0 0 0 0 0 1 0]
song: [0 0 0 0 0 0 0 1]
For any pair of distinct words, the Euclidean distance is √2
For any pair of distinct words, the cosine similarity is 0
A sentence/document can be represented as the sum of one-hot vectors
Sentence 1: "John really really loves this movie"
John + really + really + loves + this + movie: [1 2 1 1 1 0 0 0]
Sentence 2: "Jane really likes this song"
Jane + really + likes + this + song: [0 1 0 1 0 1 1 1]
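
As a quick check of the steps above, here is a minimal sketch in Python (variable and function names are my own, not from the lecture): it builds the vocabulary, one-hot encodes each word, and sums the one-hot vectors into bag-of-words vectors for the two example sentences.

```python
import numpy as np

sentences = [
    "John really really loves this movie",
    "Jane really likes this song",
]

# Step 1: vocabulary of unique words (insertion order preserved)
vocab = list(dict.fromkeys(w for s in sentences for w in s.split()))
word2idx = {w: i for i, w in enumerate(vocab)}

# Step 2: one-hot vector for each word
def one_hot(word):
    v = np.zeros(len(vocab))
    v[word2idx[word]] = 1.0
    return v

# A sentence/document is the sum of the one-hot vectors of its words
def bag_of_words(sentence):
    return sum(one_hot(w) for w in sentence.split())

print(bag_of_words(sentences[0]))  # [1. 2. 1. 1. 1. 0. 0. 0.]
print(bag_of_words(sentences[1]))  # [0. 1. 0. 1. 0. 1. 1. 1.]

# Any two distinct one-hot vectors: Euclidean distance sqrt(2), cosine similarity 0
a, b = one_hot("John"), one_hot("movie")
print(np.linalg.norm(a - b))  # 1.4142...
print(np.dot(a, b))           # 0.0 (unit vectors, so this equals the cosine similarity)
```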

Naive Bayes Classifier for Document Classification

Bayes’ Rule Applied to Documents and Classes

  • For a document d, which consists of a sequence of words w, and a class c
  • The probability of a document given a class can be represented by multiplying the probabilities of each word appearing (assuming the words are conditionally independent given the class)
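
This is the standard Naive Bayes derivation; a minimal sketch of the formulas implied by the two bullets above (notation mine):

$$
c_{MAP} = \underset{c \in C}{\arg\max}\; P(c \mid d) = \underset{c \in C}{\arg\max}\; \frac{P(d \mid c)\,P(c)}{P(d)} = \underset{c \in C}{\arg\max}\; P(d \mid c)\,P(c)
$$

$$
P(d \mid c)\,P(c) = P(w_1, \ldots, w_n \mid c)\,P(c) \approx P(c)\prod_{i=1}^{n} P(w_i \mid c)
$$

The denominator P(d) can be dropped because it does not depend on the class, and the last step uses the conditional independence assumption.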

2. Assignment Progress / Results

In progress

3. Peer Session

Sharing what we learned

1. Assignment code review

Q) In the Naive Bayes classifier, we multiply together the probabilities for each class; if the data is imbalanced, wouldn't it fail to learn well?

A) When the training data is insufficient, a word in a document may not appear in the training data at all; in that case, the Laplace smoothing technique is used. (Because of the independence assumption, the probability is computed per word, so if even one word is missing from the training data its numerator is 0 and the whole product of probabilities becomes 0; hence 1 is added to each numerator.)

Laplace smoothing is a way to get rid of the zeros that break the product!
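
A minimal sketch of the add-1 (Laplace) smoothed word likelihood discussed above, with a made-up toy corpus; the point is that a word unseen in a class gets a small nonzero probability instead of zeroing out the product:

```python
from collections import Counter

# Toy (document, class) pairs - hypothetical example, not the assignment data
train = [
    ("great movie really great", "pos"),
    ("boring movie", "neg"),
]
vocab = {w for doc, _ in train for w in doc.split()}

def word_probs(cls, alpha=1.0):
    """P(word | cls) with Laplace (add-alpha) smoothing."""
    counts = Counter(w for doc, c in train if c == cls for w in doc.split())
    total = sum(counts.values())
    # alpha is added to every numerator (and alpha * |V| to the denominator),
    # so words unseen in this class still get a nonzero probability.
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}

print(word_probs("pos")["boring"])  # > 0 even though "boring" never occurs in a pos document
```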

Q) In the GloVe model, what kind of function is f?

A) The original loss function could not penalize words according to their frequency, so a weighting function is introduced: very frequent words do not receive an excessively large weight, while infrequent words still contribute a penalty. The final loss function seems to take this weighted form.

GloVe
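
For reference, the weighted least-squares objective and the weighting function f from the GloVe paper (Pennington et al., 2014) look like this; X_ij is the word co-occurrence count, and the paper uses x_max = 100 and α = 3/4:

$$
J = \sum_{i,j=1}^{V} f(X_{ij})\left(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2,
\qquad
f(x) = \begin{cases} (x/x_{max})^{\alpha} & \text{if } x < x_{max} \\ 1 & \text{otherwise} \end{cases}
$$

So f caps the contribution of very frequent co-occurrences while still down-weighting rare ones, which matches the intuition in the answer above.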

2. Discussion of lecture content and advanced topics

  • Not held

4. Learning Retrospective

I have now started learning in earnest about NLP, the domain I chose.
Also, with new teams just assigned, I want to use this short period as a time to grow together with my teammates. I will study hard this week and do my best to achieve good results.
