CS224N Lecture 1

WordNet (a dictionary curated by humans)

A common NLP solution (one of the approaches used in the past)

It organizes words into synonym sets and hypernyms (superordinate, "is-a" terms).
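
For example, this WordNet data can be browsed through NLTK. A small sketch, assuming the WordNet corpus has already been fetched with `nltk.download("wordnet")`:

```python
from nltk.corpus import wordnet as wn

# Synonym sets ("synsets") that contain the word "good"
for syn in wn.synsets("good")[:3]:
    print(syn.name(), "->", syn.lemma_names())

# Hypernyms (superordinate terms) of one noun sense of "panda"
panda = wn.synset("giant_panda.n.01")
print(panda.hypernyms())
```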

cons :

  1. Requires a lot of human labor to build and maintain
  2. Misses nuance
  3. Misses new meanings of words (keeping it up to date is impossible)
  4. Can't compute accurate word similarity

Representing words as discrete symbols (one-hot vectors)

cons :

  1. Vector dimension = number of words in the vocabulary (e.g., 500,000 dimensions)
  2. Can't represent similarity between words (all one-hot vectors are orthogonal)
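
A quick sketch of why one-hot vectors cannot encode similarity (the toy vocabulary below is made up for illustration):

```python
import numpy as np

vocab = ["motel", "hotel", "banana"]                              # toy vocabulary
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# Distinct one-hot vectors are orthogonal, so the dot product reports
# zero "similarity" even for related words like motel/hotel.
print(one_hot["motel"] @ one_hot["hotel"])   # 0.0
print(one_hot["motel"] @ one_hot["motel"])   # 1.0 (a word only matches itself)
```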

Solution

learn to encode similarity in the vectors themselves

Representing words by their context

Use distributional semantics.

Distributional semantics: a word's meaning is given by the words that frequently appear close by.

If a word $w$ appears in a text, its context is the set of words that appear nearby (within a fixed-size window).
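
A small sketch of what a fixed-size context window looks like; the sentence and window size here are just an example:

```python
def context(words, t, m=2):
    """Words within a window of size m around position t, excluding the center word."""
    return words[max(0, t - m):t] + words[t + 1:t + 1 + m]

words = "problems turning into banking crises as usual".split()
print(context(words, t=3, m=2))   # context of "banking": ['turning', 'into', 'crises', 'as']
```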

Word Vector

Representing words as dense n-dimensional vectors (not one-hot).

Word vectors are also called word embeddings.

Word2vec

Word2vec: a framework for learning word vectors

Idea

  1. A corpus (paragraphs, sentences, etc.) consists of a large amount of text
  2. Every word in a fixed vocabulary is represented by a vector
  3. Go through each position t in the text, which has a center word c and context (“outside”) words o
  4. Use the similarity of the word vectors for c and o to calculate the probability of o given c (or vice versa)
    • For a given center word c, the probability that an outside word o appears is computed from the similarity of the word vectors for c and o (and c can likewise be predicted from o); see the sketch after this list.
  5. Keep adjusting the word vectors to maximize this probability
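
A minimal sketch of the iteration in steps 3-4: walk through each position t and pair the center word with its outside words (the toy sentence and window size are made up):

```python
def skipgram_pairs(words, m=2):
    """Yield (center, outside) pairs for every position t with a window of size m."""
    for t, center in enumerate(words):
        for j in range(-m, m + 1):
            if j != 0 and 0 <= t + j < len(words):
                yield center, words[t + j]

pairs = list(skipgram_pairs("we learn word vectors from text".split(), m=2))
print(pairs[:4])   # [('we', 'learn'), ('we', 'word'), ('learn', 'we'), ('learn', 'word')]
```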

Objective function

For each position $t = 1, \dots, T$, predict the context words within a window of fixed size $m$, given the center word $w_t$.

Likelihood: $L(\theta) = \prod_{t=1}^{T} \prod_{\substack{j=-m \\ j \neq 0}}^{m} P(w_{t+j} \mid w_t; \theta)$

The objective function $J(\theta)$ is the average negative log-likelihood of $L(\theta)$, so:

Minimizing the objective function $\iff$ maximizing predictive accuracy
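
Written out (the standard average negative log-likelihood form), the objective is:

$$J(\theta) = -\frac{1}{T}\log L(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\sum_{\substack{j=-m \\ j \neq 0}}^{m} \log P(w_{t+j} \mid w_t; \theta)$$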

Q: How do we calculate $P(w_{t+j} \mid w_t; \theta)$?

A: Use two vectors per word $w$:

  • $v_w$: used when $w$ is a center word
  • $u_w$: used when $w$ is a context ("outside") word

→ For a center word $c$ and a context word $o$:

$$P(o \mid c) = \frac{\exp(u_o^T v_c)}{\sum_{w \in V} \exp(u_w^T v_c)}$$

  • Why $u$ is transposed: $u$ and $v$ are column vectors of the same dimension, so to express their similarity we transpose $u$ and take the dot product $u^T v$.
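
A minimal NumPy sketch of this naive-softmax probability; the vocabulary size and embedding dimension below are hypothetical:

```python
import numpy as np

V, d = 10, 4                         # hypothetical vocabulary size and embedding dimension
rng = np.random.default_rng(0)
U  = rng.normal(size=(V, d))         # u_w: context ("outside") vectors, one row per word
Vc = rng.normal(size=(V, d))         # v_w: center vectors, one row per word

def p_o_given_c(o: int, c: int) -> float:
    """Naive-softmax P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)."""
    scores = U @ Vc[c]               # dot product of v_c with every u_w
    scores -= scores.max()           # subtract the max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

print(p_o_given_c(o=3, c=7))         # a probability; over all o for a fixed c these sum to 1
```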

To train the model: optimize the values of the parameters to minimize the loss.

$\theta$ has $2dV$ parameters: each of the $V$ words in the vocabulary has two $d$-dimensional vectors ($v_w$ and $u_w$).
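
A minimal sketch (not the course's assignment code) of one SGD update on the naive-softmax loss $J = -\log P(o \mid c)$ for a single (center, outside) pair, with hypothetical sizes, showing that $\theta$ holds $2dV$ numbers:

```python
import numpy as np

V, d = 10, 4                              # hypothetical vocabulary size and embedding dimension
rng = np.random.default_rng(1)
U  = rng.normal(scale=0.1, size=(V, d))   # context vectors u_w  (d*V parameters)
Vc = rng.normal(scale=0.1, size=(V, d))   # center  vectors v_w  (d*V parameters)

def sgd_step(U, Vc, c, o, lr=0.05):
    """One SGD update on J = -log P(o | c) for a single (center, outside) pair."""
    scores = U @ Vc[c]
    y_hat = np.exp(scores - scores.max())
    y_hat /= y_hat.sum()                  # softmax over the vocabulary
    y_hat[o] -= 1.0                       # y_hat - y, with y one-hot at the true outside word o
    grad_vc = U.T @ y_hat                 # dJ/dv_c
    grad_U  = np.outer(y_hat, Vc[c])      # dJ/du_w for every w, stacked as rows
    Vc[c] -= lr * grad_vc                 # in-place parameter updates
    U    -= lr * grad_U

sgd_step(U, Vc, c=7, o=3)
print(U.size + Vc.size)                   # 2 * d * V = 80 parameters in theta
```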
