CS224N Lecture 1

WordNet (a dictionary curated by humans)

A common NLP solution (one of the approaches used in the past)

It organizes words into synonym sets and hypernyms (superordinate, "is-a" terms).
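
For example, this WordNet data can be browsed through NLTK. A small sketch, assuming the WordNet corpus has already been fetched with `nltk.download("wordnet")`:

```python
from nltk.corpus import wordnet as wn

# Synonym sets ("synsets") that contain the word "good"
for syn in wn.synsets("good")[:3]:
    print(syn.name(), "->", syn.lemma_names())

# Hypernyms (superordinate terms) of one noun sense of "panda"
panda = wn.synset("giant_panda.n.01")
print(panda.hypernyms())
```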

cons :

  1. Requires a lot of human labor to build and maintain
  2. Misses nuance
  3. Misses new meanings of words (keeping it up to date is impossible)
  4. Can't compute accurate word similarity

Representing words as discrete symbols (one-hot vectors)

cons :

  1. Vector dimension = number of words in the vocabulary (e.g., 500,000 dimensions)
  2. Can't represent similarity between words (all one-hot vectors are orthogonal)
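
A quick sketch of why one-hot vectors cannot encode similarity (the toy vocabulary below is made up for illustration):

```python
import numpy as np

vocab = ["motel", "hotel", "banana"]                              # toy vocabulary
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# Distinct one-hot vectors are orthogonal, so the dot product reports
# zero "similarity" even for related words like motel/hotel.
print(one_hot["motel"] @ one_hot["hotel"])   # 0.0
print(one_hot["motel"] @ one_hot["motel"])   # 1.0 (a word only matches itself)
```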

Solution

learn to encode similarity in the vectors themselves

Representing words by their context

Use distributional semantics.

Distributional semantics: a word's meaning is given by the words that frequently appear close by.

If a word $w$ appears in a text, its context is the set of words that appear nearby (within a fixed-size window).
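
A small sketch of what a fixed-size context window looks like; the sentence and window size here are just an example:

```python
def context(words, t, m=2):
    """Words within a window of size m around position t, excluding the center word."""
    return words[max(0, t - m):t] + words[t + 1:t + 1 + m]

words = "problems turning into banking crises as usual".split()
print(context(words, t=3, m=2))   # context of "banking": ['turning', 'into', 'crises', 'as']
```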

Word Vector

Representing words as dense n-dimensional vectors (not one-hot).

Word vectors are also called word embeddings.

Word2vec

Word2vec: a framework for learning word vectors

Idea

  1. A corpus (paragraphs, sentences, etc.) consists of a large amount of text
  2. Every word in a fixed vocabulary is represented by a vector
  3. Go through each position t in the text, which has a center word c and context (“outside”) words o
  4. Use the similarity of the word vectors for c and o to calculate the probability of o given c (or vice versa)
    • For a given center word c, the probability that an outside word o appears is computed from the similarity of the word vectors for c and o (and c can likewise be predicted from o); see the sketch after this list.
  5. Keep adjusting the word vectors to maximize this probability
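
A minimal sketch of the iteration in steps 3-4: walk through each position t and pair the center word with its outside words (the toy sentence and window size are made up):

```python
def skipgram_pairs(words, m=2):
    """Yield (center, outside) pairs for every position t with a window of size m."""
    for t, center in enumerate(words):
        for j in range(-m, m + 1):
            if j != 0 and 0 <= t + j < len(words):
                yield center, words[t + j]

pairs = list(skipgram_pairs("we learn word vectors from text".split(), m=2))
print(pairs[:4])   # [('we', 'learn'), ('we', 'word'), ('learn', 'we'), ('learn', 'word')]
```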

Objective function

For each position $t = 1, \dots, T$, predict the context words within a window of fixed size $m$, given the center word $w_t$.

Likelihood: $L(\theta) = \prod_{t=1}^{T} \prod_{\substack{j=-m \\ j \neq 0}}^{m} P(w_{t+j} \mid w_t; \theta)$

The objective function $J(\theta)$ is the average negative log-likelihood of $L(\theta)$, so:

Minimizing the objective function $\iff$ maximizing predictive accuracy
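
Written out (the standard average negative log-likelihood form), the objective is:

$$J(\theta) = -\frac{1}{T}\log L(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\sum_{\substack{j=-m \\ j \neq 0}}^{m} \log P(w_{t+j} \mid w_t; \theta)$$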

Q: How do we calculate $P(w_{t+j} \mid w_t; \theta)$?

A: Use two vectors per word $w$:

  • $v_w$: used when $w$ is a center word
  • $u_w$: used when $w$ is a context ("outside") word

→ For a center word $c$ and a context word $o$:

$$P(o \mid c) = \frac{\exp(u_o^T v_c)}{\sum_{w \in V} \exp(u_w^T v_c)}$$

  • Why $u$ is transposed: $u$ and $v$ are column vectors of the same dimension, so to express their similarity we transpose $u$ and take the dot product $u^T v$.
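
A minimal NumPy sketch of this naive-softmax probability; the vocabulary size and embedding dimension below are hypothetical:

```python
import numpy as np

V, d = 10, 4                         # hypothetical vocabulary size and embedding dimension
rng = np.random.default_rng(0)
U  = rng.normal(size=(V, d))         # u_w: context ("outside") vectors, one row per word
Vc = rng.normal(size=(V, d))         # v_w: center vectors, one row per word

def p_o_given_c(o: int, c: int) -> float:
    """Naive-softmax P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)."""
    scores = U @ Vc[c]               # dot product of v_c with every u_w
    scores -= scores.max()           # subtract the max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

print(p_o_given_c(o=3, c=7))         # a probability; over all o for a fixed c these sum to 1
```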

To train the model: optimize the values of the parameters to minimize the loss.

$\theta$ has $2dV$ parameters: each of the $V$ words in the vocabulary has two $d$-dimensional vectors ($v_w$ and $u_w$).
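
A minimal sketch (not the course's assignment code) of one SGD update on the naive-softmax loss $J = -\log P(o \mid c)$ for a single (center, outside) pair, with hypothetical sizes, showing that $\theta$ holds $2dV$ numbers:

```python
import numpy as np

V, d = 10, 4                              # hypothetical vocabulary size and embedding dimension
rng = np.random.default_rng(1)
U  = rng.normal(scale=0.1, size=(V, d))   # context vectors u_w  (d*V parameters)
Vc = rng.normal(scale=0.1, size=(V, d))   # center  vectors v_w  (d*V parameters)

def sgd_step(U, Vc, c, o, lr=0.05):
    """One SGD update on J = -log P(o | c) for a single (center, outside) pair."""
    scores = U @ Vc[c]
    y_hat = np.exp(scores - scores.max())
    y_hat /= y_hat.sum()                  # softmax over the vocabulary
    y_hat[o] -= 1.0                       # y_hat - y, with y one-hot at the true outside word o
    grad_vc = U.T @ y_hat                 # dJ/dv_c
    grad_U  = np.outer(y_hat, Vc[c])      # dJ/du_w for every w, stacked as rows
    Vc[c] -= lr * grad_vc                 # in-place parameter updates
    U    -= lr * grad_U

sgd_step(U, Vc, c=7, o=3)
print(U.size + Vc.size)                   # 2 * d * V = 80 parameters in theta
```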
