RNN (Recurrent Neural Network) πŸ”



Summary !

πŸ’­ A "recurrent" model that makes predictions by processing individual or sequential data, based on (the current input and the hidden vector computed up to the previous step)

Sequence data Β· Recursive


1. RNN model structure πŸ”’

β€œThe same function and set of parameters are used at every time step.”

  • At every step, the same module A is applied recursively, and the previous step's output is used as the next step's input (a minimal sketch follows below).
    • step N-1's output β†’ step N's input
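
A minimal NumPy sketch of this recurrence, under toy shape assumptions (input size 3, hidden size 2, a 4-step sequence; none of these come from the post). The point is that the same `step` function and the same `W_xh`, `W_hh` are reused at every time step:

```python
import numpy as np

# Toy parameters: the SAME W_xh, W_hh are shared across all time steps.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(2, 3))  # input  -> hidden
W_hh = rng.normal(size=(2, 2))  # hidden -> hidden

def step(h_prev, x_t):
    """One recursive call of module A: h_t = f_W(h_{t-1}, x_t)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t)

h = np.zeros(2)                      # initial hidden state h_0
for x_t in rng.normal(size=(4, 3)):  # a toy 4-step input sequence
    h = step(h, x_t)                 # step N-1's output feeds step N
```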


RNN Function & Parameters

The basic RNN function is as follows; through it, the hidden state $h$ is updated at every step.

$$h_t = f_W(h_{t-1}, x_t)$$

$x_t$ : input data at the current step $t$
$h_{t-1}$ : hidden-state vector computed up to the previous step $t-1$
$h_t$ : hidden-state vector output at $t$

  • $f_W$ : the RNN function; outputs the hidden-state vector $h_t$ at $t$

    • $W$ : linear transformation matrix β†’ performs the linear transformation
  • $y_t$ : output vector at $t$, computed from $h_t$


Computing the hidden-state vector ($h_t$)

$$h_t = f_W(h_{t-1}, x_t)$$

Assumptions:

At time $t$, suppose the sizes of the input vector $x_t$ and the previous step's output vector $h_{t-1}$ are

  • $x_t\ [3 \times 1],\ h_{t-1}\ [2 \times 1]$; then

  • the FC layer's linear transformation matrix $W$ must be $W\ [2 \times 5]$.
    ($W$ multiplies the concatenated vector $[x_t; h_{t-1}]$ of size $[5 \times 1]$, and the size of $h_t$ must stay the same as that of $h_{t-1}$, i.e. $[2 \times 1]$.)

$$h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t)$$
$$y_t = W_{hy}h_t$$

Computation:

$x_t$ and $h_{t-1}$ are multiplied by $W$. Here,

  • the left block of matrix $W$, $W_{xh}\ [2 \times 3]$, multiplies only the $x_t$ vector;

  • likewise, the right block of matrix $W$, $W_{hh}\ [2 \times 2]$, multiplies only the $h_{t-1}$ vector.

    That is, the transformations are $W_{xh} : x_t \rightarrow h_t$ and $W_{hh} : h_{t-1} \rightarrow h_t$.
    Then, depending on the task, multiplying by an additional output-layer matrix $W_{hy}$ produces the final output $y_t$.

    • Likewise, $W_{hy} : h_t \rightarrow y_t$ performs that transformation.
  • Using $y_t$ (a shape-check sketch follows below):
    • Binary classification : $y_t$ (1-dimensional scalar) β†’ sigmoid β†’ probability
    • Multi-class classification : $y_t$ (a vector with as many dimensions as classes) β†’ softmax β†’ probability distribution
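
A hedged NumPy sketch of one update under the shapes assumed above ($x_t\ [3 \times 1]$, $h_{t-1}\ [2 \times 1]$, $W\ [2 \times 5]$); the 4-class softmax head is an arbitrary choice for illustration, not from the post:

```python
import numpy as np

rng = np.random.default_rng(42)
x_t    = rng.normal(size=(3, 1))   # input vector  [3 x 1]
h_prev = rng.normal(size=(2, 1))   # h_{t-1}       [2 x 1]

W = rng.normal(size=(2, 5))        # full FC-layer matrix [2 x 5]
W_xh, W_hh = W[:, :3], W[:, 3:]    # left block [2 x 3], right block [2 x 2]

# The two formulations agree: W @ [x_t; h_{t-1}] == W_xh x_t + W_hh h_{t-1}
h_t = np.tanh(W @ np.vstack([x_t, h_prev]))
assert np.allclose(h_t, np.tanh(W_xh @ x_t + W_hh @ h_prev))

# Task-dependent output heads y_t = W_hy h_t
W_hy_bin = rng.normal(size=(1, 2))        # binary head: scalar logit
p = 1 / (1 + np.exp(-(W_hy_bin @ h_t)))   # sigmoid -> probability

W_hy_mc = rng.normal(size=(4, 2))         # assumed 4-class head
z = W_hy_mc @ h_t
probs = np.exp(z) / np.exp(z).sum()       # softmax -> distribution

print(h_t.shape, p.shape, probs.shape)    # (2, 1) (1, 1) (4, 1)
```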


2. Various RNN models ➰

Many or One?
: Combine these structures to suit the task and the data.

Many ← sequential data

< 5 different models from RNN >

1. One to One

  • Input and output each occupy a single time step
    • Not sequence data
    • (height, weight, age) β‡’ classify as low / high blood pressure

2. One to Many

  • Only the input occupies a single time step
    • For steps with no input, a zero vector of the same size is fed in
    • Image captioning task (1 img β‡’ sequential words)

3. Many to One

  • Only the output occupies a single time step
    • Sequential data is fed in at every time step; the output is produced at the last step
    • Sentiment classification (sequential words β‡’ pos / neg)

4. Many to Many (1)

  • Both the input and output data are sequential
    • The output is generated after all the input data has been received
    • Machine translation

5. Many to Many (2)

  • Both the input and output data are sequential
    • A prediction is output at every time step at which input is given (contrast with Many to One in the sketch below)
    • Tasks that require real-time processing: video classification at the frame level
    • Per-word sentence-constituent prediction!
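
An illustrative sketch (same toy shapes as before, with a hypothetical 4-class output head) of how one unrolled loop serves both Many to One and Many to Many (2), differing only in which outputs are kept:

```python
import numpy as np

rng = np.random.default_rng(7)
W_xh = rng.normal(size=(2, 3))   # input  -> hidden
W_hh = rng.normal(size=(2, 2))   # hidden -> hidden
W_hy = rng.normal(size=(4, 2))   # hidden -> output (hypothetical head)

xs = rng.normal(size=(5, 3, 1))  # toy sequence: 5 steps of [3 x 1] inputs
h = np.zeros((2, 1))
outputs = []
for x_t in xs:
    h = np.tanh(W_xh @ x_t + W_hh @ h)
    outputs.append(W_hy @ h)     # y_t computed at every step

y_many_to_one  = outputs[-1]     # Many to One: keep only the last y_t
y_many_to_many = outputs         # Many to Many (2): keep y_t at each step
```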