AI Tech Day 26 (RNN, LSTM, and GRU)

이연걸 · September 7, 2021

Naver Connect Boostcamp AI Tech

AI Tech Week 6 - NLP Theory 1


1. Study Schedule

1) Lectures
2) Assignments
3) Peer session

2. What I Learned

NLP

Lecture 3: Recurrent Neural Network and Language Modeling

Basic structure of an RNN

Unrolled version

rolled version

How to compute the RNN hidden state

$h_{t-1}$ : previous hidden state vector

$x_t$ : input vector at time step t

$h_t$ : new hidden state vector

$f_W$ : RNN function with parameters $W$

$y_t$ : output vector at time step t

$h_t = f_W(h_{t-1}, x_t)$

The key point is that the same function and the same set of parameters are used at every time step.

$h_t = f_W(h_{t-1}, x_t)$ -> $h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t)$
$y_t = W_{hy}h_t$
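
To make the recurrence concrete for myself, here is a minimal NumPy sketch of the two formulas above. The function names (`rnn_step`, `rnn_forward`), the layer sizes, and the random initialization are my own assumptions for illustration and are not from the lecture.

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, W_hy):
    """One vanilla RNN step: returns (h_t, y_t)."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)  # new hidden state
    y_t = W_hy @ h_t                           # output at time step t
    return h_t, y_t

def rnn_forward(xs, h0, W_hh, W_xh, W_hy):
    """Apply the same step, with the same weights, at every time step."""
    h, ys = h0, []
    for x_t in xs:
        h, y = rnn_step(h, x_t, W_hh, W_xh, W_hy)
        ys.append(y)
    return h, np.stack(ys)

# Illustrative sizes and random weights (assumptions, not lecture values).
rng = np.random.default_rng(0)
hidden_size, input_size, output_size, seq_len = 3, 4, 2, 5
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
xs = rng.normal(size=(seq_len, input_size))

h_last, ys = rnn_forward(xs, np.zeros(hidden_size), W_hh, W_xh, W_hy)
print(h_last.shape, ys.shape)
```

Note that `rnn_forward` never creates new weights inside the loop, which is exactly the parameter sharing mentioned above.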

Types of RNNs

One-to-one: Standard Neural Networks

One-to-many: Image Captioning

Many-to-one: Sentiment Classification

Many-to-many (Sequence-to-sequence): Machine Translation

Many-to-many (Sequence-to-sequence): Video classification on frame level (e.g., predicting whether each scene is a war scene)

Character-level Language Model

Example of training sequence "hello"

Vocabulary: [h, e, l, o]

$h_{t}=\tanh \left(W_{h h} h_{t-1}+W_{x h} x_{t}+b\right)$

$\text{Logit} = W_{h y} h_{t}+b$

At test-time, sample characters one at a time, feed back to model
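
The test-time sampling loop can be sketched roughly as follows. This is only an illustration under my own assumptions: the weights are random and untrained (so the sampled text is meaningless), the hidden size is arbitrary, and I use two separate bias names `b_h` and `b_y` because the two biases have different shapes.

```python
import numpy as np

vocab = ["h", "e", "l", "o"]
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ix, size):
    v = np.zeros(size)
    v[ix] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def sample(seed_char, n_chars, params, rng):
    """Sample n_chars characters, feeding each sample back as the next input."""
    W_hh, W_xh, W_hy, b_h, b_y = params
    h = np.zeros(W_hh.shape[0])
    ix = char_to_ix[seed_char]
    out = [seed_char]
    for _ in range(n_chars):
        x = one_hot(ix, len(vocab))
        h = np.tanh(W_hh @ h + W_xh @ x + b_h)  # hidden state update
        logit = W_hy @ h + b_y                  # one score per character
        p = softmax(logit)                      # distribution over the vocabulary
        ix = rng.choice(len(vocab), p=p)        # sample the next character
        out.append(vocab[ix])
    return "".join(out)

# Untrained, randomly initialized parameters (assumption for the demo).
rng = np.random.default_rng(0)
H, V = 8, len(vocab)
params = (
    rng.normal(scale=0.1, size=(H, H)),
    rng.normal(scale=0.1, size=(H, V)),
    rng.normal(scale=0.1, size=(V, H)),
    np.zeros(H),
    np.zeros(V),
)
text = sample("h", 4, params, rng)
print(text)
```

With trained weights, the hope is that seeding with "h" produces "hello"; here the point is only the feed-back structure of the loop.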

Backpropagation through time (BPTT)

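Run a forward pass to compute the loss.
Run a backward pass to compute the gradients.
Rather than processing the entire sequence at once, split it into chunks.
The forward pass keeps running across chunks, while backpropagation can be limited to only part of the sequence.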
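
The chunked procedure above can be sketched with a toy scalar RNN. Everything here is an assumption made for illustration: the scalar model $h_t = \tanh(w h_{t-1} + u x_t)$, the squared-error loss, the toy targets, and the names `chunk_forward_backward` and `truncated_bptt`. The hidden state is carried from chunk to chunk, but each backward pass stops at the chunk boundary.

```python
import numpy as np

def chunk_forward_backward(w, u, h0, xs, ys):
    """Forward over one chunk, then backprop only inside that chunk.

    h0 comes from the previous chunk and is treated as a constant,
    so no gradient flows further back in time.
    """
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + u * x))
    loss = 0.5 * sum((h - y) ** 2 for h, y in zip(hs[1:], ys))

    dw = du = 0.0
    dh_next = 0.0  # gradient arriving from later steps within the chunk
    for t in reversed(range(len(xs))):
        dh = (hs[t + 1] - ys[t]) + dh_next
        da = dh * (1.0 - hs[t + 1] ** 2)  # back through tanh
        dw += da * hs[t]
        du += da * xs[t]
        dh_next = da * w                  # pass gradient to the previous step
    return loss, dw, du, hs[-1]

def truncated_bptt(w, u, xs, ys, chunk_len, lr=0.1):
    """One pass over the sequence with a gradient step after every chunk."""
    h, losses = 0.0, []
    for start in range(0, len(xs), chunk_len):
        loss, dw, du, h = chunk_forward_backward(
            w, u, h, xs[start:start + chunk_len], ys[start:start + chunk_len]
        )
        w -= lr * dw
        u -= lr * du
        losses.append(loss)
    return w, u, losses

rng = np.random.default_rng(0)
xs = rng.normal(size=12)
ys = np.tanh(0.8 * xs)  # toy targets, purely illustrative
w, u, losses = truncated_bptt(0.1, 0.1, xs, ys, chunk_len=4)
print(len(losses))  # 3 chunks of length 4
```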

Vanishing/Exploding Gradient Problem in RNN

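In an RNN, the same matrix is multiplied repeatedly during backpropagation, so the gradient can vanish or explode.
As a simple example, if the repeated factor is a scalar $w$, then after $T$ steps the gradient is scaled by $w^T$, which shrinks toward 0 when $|w| < 1$ and blows up when $|w| > 1$.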
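
A quick numerical check of this effect, as a rough sketch: I repeatedly multiply a gradient vector by the transpose of the same matrix and track its norm. The tanh derivative is deliberately ignored, and the matrices (scaled identities) and the helper name `backprop_norms` are my own simplifying assumptions.

```python
import numpy as np

def backprop_norms(W, steps):
    """Norm of a gradient vector after each repeated multiplication by W.T."""
    g = np.ones(W.shape[0])
    norms = []
    for _ in range(steps):
        g = W.T @ g
        norms.append(np.linalg.norm(g))
    return norms

I = np.eye(4)
small = backprop_norms(0.5 * I, 20)  # factor below 1: gradient vanishes
large = backprop_norms(1.5 * I, 20)  # factor above 1: gradient explodes
print(small[-1], large[-1])
```

After only 20 steps the two norms already differ by many orders of magnitude, even though the per-step factors are just 0.5 and 1.5.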

Lecture 4: LSTM and GRU

Long Short-Term Memory (LSTM)

Introduced to address long-term dependencies and the vanishing/exploding gradient problem.
Key idea: pass the cell state information along without transforming it!
- in order to handle long-term dependencies

Structure

i: input gate, Whether to write to cell

$i_{t}=\sigma\left(W_{i} \cdot\left[h_{t-1}, x_{t}\right]+b_{i}\right)$

$\widetilde{C}_{t}=\tanh \left(W_{C} \cdot\left[h_{t-1}, x_{t}\right]+b_{C}\right)$

$C_{t}=f_{t} \cdot C_{t-1}+i_{t} \cdot \widetilde{C}_{t}$

f: Forget gate, Whether to erase cell

$f_{t}=\sigma\left(W_{f} \cdot\left[h_{t-1}, x_{t}\right]+b_{f}\right)$

o: Output gate, How much to reveal cell

$o_{t}=\sigma\left(W_{o} \cdot\left[h_{t-1}, x_{t}\right]+b_{o}\right)$

$h_{t}=o_{t} \cdot \tanh \left(C_{t}\right)$

g: Gate gate (the candidate $\widetilde{C}_{t}$ above), How much to write to cell
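
Putting the gate equations together, one LSTM step can be sketched in NumPy as below. The parameter dictionary layout, the helper names (`lstm_step`, `init_params`), and the sizes are assumptions I made for the example; only the equations come from the notes above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, C_prev, x_t, p):
    """One LSTM step following the gate equations; returns (h_t, C_t)."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f = sigmoid(p["W_f"] @ z + p["b_f"])        # forget gate: whether to erase cell
    i = sigmoid(p["W_i"] @ z + p["b_i"])        # input gate: whether to write to cell
    C_tilde = np.tanh(p["W_C"] @ z + p["b_C"])  # candidate values (the "g" gate)
    C = f * C_prev + i * C_tilde                # additive cell state update
    o = sigmoid(p["W_o"] @ z + p["b_o"])        # output gate: how much to reveal
    h = o * np.tanh(C)
    return h, C

def init_params(hidden_size, input_size, rng):
    """Random weights and zero biases (illustrative initialization)."""
    p = {}
    for name in ("f", "i", "C", "o"):
        p[f"W_{name}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
        p[f"b_{name}"] = np.zeros(hidden_size)
    return p

rng = np.random.default_rng(0)
hidden_size, input_size = 3, 4
p = init_params(hidden_size, input_size, rng)
h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, C = lstm_step(h, C, x_t, p)
print(h.shape, C.shape)
```

When the forget gate is close to 1 and the input gate is close to 0, `C` is passed to the next step almost unchanged, which is the key idea stated at the top of this section.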

Gated Recurrent Unit (GRU)

What is GRU?
- $z_{t}=\sigma\left(W_{z} \cdot\left[h_{t-1}, x_{t}\right]\right)$
- $r_{t}=\sigma\left(W_{r} \cdot\left[h_{t-1}, x_{t}\right]\right)$
- $\widetilde{h}_{t}=\tanh \left(W \cdot\left[r_{t} \cdot h_{t-1}, x_{t}\right]\right)$
- $h_{t}=\left(1-z_{t}\right) \cdot h_{t-1}+z_{t} \cdot \widetilde{h}_{t}$
- c.f) $C_{t}=f_{t} \cdot C_{t-1}+i_{t} \cdot \widetilde{C}_{t}$ in LSTM
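
The GRU equations above translate almost line by line into NumPy. This is a minimal sketch: the function name `gru_step`, the sizes, and the random weights are my assumptions, and biases are omitted because the equations in these notes do not include them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x_t, W_z, W_r, W):
    """One GRU step following the equations above; returns h_t."""
    hx = np.concatenate([h_prev, x_t])                         # [h_{t-1}, x_t]
    z = sigmoid(W_z @ hx)                                      # update gate
    r = sigmoid(W_r @ hx)                                      # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x_t]))   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                    # interpolate old and new

rng = np.random.default_rng(0)
hidden_size, input_size = 3, 4
shape = (hidden_size, hidden_size + input_size)
W_z, W_r, W = (rng.normal(scale=0.1, size=shape) for _ in range(3))

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = gru_step(h, x_t, W_z, W_r, W)
print(h.shape)
```

The last line of `gru_step` plays the role of the LSTM cell update, with $1-z_t$ standing in for the forget gate and $z_t$ for the input gate.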

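Characteristics
1. $C_t, h_t \rightarrow h_t$ (the cell state and hidden state are merged into a single hidden state)
2. 2 gates $\rightarrow$ 1 gate: the forget and input gates are replaced by a single update gate $z_t$, used as $1-z_t$ and $z_t$ (less computation and memory)

Backpropagation in LSTM/GRU

Because information is updated additively, the exploding/vanishing gradient problem is largely mitigated, which in turn helps with long-term dependencies.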
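
As a rough sketch of why the additive update helps, consider only the direct path through the LSTM cell state $C_{t}=f_{t} \cdot C_{t-1}+i_{t} \cdot \widetilde{C}_{t}$ and treat the gates as constants. This is a simplifying assumption of mine, since the gates also depend on $h_{t-1}$:

$\frac{\partial C_{t}}{\partial C_{t-1}} \approx f_{t} \quad \Rightarrow \quad \frac{\partial C_{T}}{\partial C_{k}} \approx \prod_{t=k+1}^{T} f_{t}$

Unlike the vanilla RNN, where the same $W_{hh}$ is multiplied at every step, the factor here is an elementwise gate value that can differ at each step. When $f_t \approx 1$ the gradient passes through nearly unchanged; if the forget gates stay small for many steps, it can still shrink.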

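3. Peer Session Summary

Nothing notable.

4. Assignment Progress

Finished the required assignment.
It would be worth digging through the code when I have time.

5. Retrospective

A day that was neither good nor bad.
I should probably start preparing for coding tests soon.. I'll start tomorrow.

6. To-Do for Tomorrow

Take lectures
Take the visualization lectures
Prepare for coding tests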

