CS224N Lecture 9

Sunmin Lee · April 5, 2023

We want to change the building block.

Let's look at several building blocks that serve the same purpose but have different characteristics.

Word window
: aggregates local context

word window classifiers -> used to represent information about the center word
1D convolution (later!)
The number of unparallelizable operations does not grow with the sequence length
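word window classifier -> builds a representation of the word that takes its local context into account
Each local window can be looked at independently to compute the word's representation h
Layers can be stacked (this looks like an LSTM encoder)
If the window is cut so that it cannot see future information, it behaves like the decoder of a language model
However, long-distance dependencies are still a problem: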
- maximum interaction distance = sequence length / window size
- window size = local n words
- stacking windows without growing the window size -> can reach pretty far (but a structural limit remains)
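
As a rough illustration of the notes above, here is a minimal NumPy sketch of a stacked word-window model. It is not from the lecture: the names `window_layer` and `affected_positions`, the zero padding, the tanh nonlinearity, and the random weights are all assumptions made for this example. It changes one input word and reports which positions in the top layer notice the change, which shows how far information travels when windows are stacked without growing the window size.

```python
import numpy as np

def window_layer(x, weight, radius):
    """One word-window layer: position i only sees positions i-radius .. i+radius."""
    seq_len, _ = x.shape
    padded = np.pad(x, ((radius, radius), (0, 0)))  # zero-pad both ends of the sequence
    out = np.empty((seq_len, weight.shape[1]))
    for i in range(seq_len):  # every window is independent, so this loop could run in parallel
        window = padded[i : i + 2 * radius + 1].reshape(-1)  # concatenate the window
        out[i] = np.tanh(window @ weight)
    return out

def affected_positions(num_layers, seq_len=12, dim=4, radius=1, changed=0, seed=0):
    """Return the top-layer positions whose value changes when input word `changed` changes."""
    rng = np.random.default_rng(seed)
    window_size = 2 * radius + 1
    weights = [rng.normal(scale=0.3, size=(window_size * dim, dim))
               for _ in range(num_layers)]
    h = rng.normal(size=(seq_len, dim))
    h_perturbed = h.copy()
    h_perturbed[changed] += 1.0  # perturb a single input word
    for w in weights:
        h = window_layer(h, w, radius)
        h_perturbed = window_layer(h_perturbed, w, radius)
    return [i for i in range(seq_len) if not np.array_equal(h[i], h_perturbed[i])]

print(affected_positions(num_layers=1))
print(affected_positions(num_layers=3))
```

With the default radius of 1, each extra layer extends the reach by only one position per side, so two words that are far apart need many stacked layers before they can interact.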

(contextualizer: a component that contextualizes word representations)

Attention

Query: a word's representation, which tries to access information from a set of values
Block structure -> the words in each layer attend to the words in the layer below
(all words attend to all words in the previous layer)

Attention equation

eij = scalar scores, not bounded in size (they can be any real number)
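
To make the role of the unbounded scores eij concrete, here is a minimal NumPy sketch of attention. It assumes the usual dot-product form, where eij is the dot product of query i with key j, a softmax turns each row of scores into weights that sum to 1, and the output is the weighted sum of the values. The function name `attention` and the toy shapes are assumptions for this example, and using the same matrix as queries, keys, and values is a simplification.

```python
import numpy as np

def attention(queries, keys, values):
    """Dot-product attention: returns (outputs, weights)."""
    scores = queries @ keys.T  # eij: unbounded scalar scores, shape (n_queries, n_keys)
    scores = scores - scores.max(axis=1, keepdims=True)  # stability shift; softmax is unchanged
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over j: each row sums to 1
    return weights @ values, weights  # each output is a weighted average of the values

# Toy self-attention: every word attends to every word in the layer below.
rng = np.random.default_rng(0)
layer_below = rng.normal(size=(5, 8))  # 5 words, 8-dimensional representations
outputs, weights = attention(layer_below, layer_below, layer_below)
print(outputs.shape, weights.sum(axis=1))
```

Unlike the word-window model, any pair of words can interact within a single layer here, because every query scores every key.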