[Week 4] RNN (nlp)

혜 콩 · October 13, 2022

🚩 RNN

  • The RNN takes the input vector $x_t$ at the current time step and the hidden vector $h_{t-1}$ from the previous time step, and applies the function $f_W$ with parameters $W$ to them. The output is the hidden vector $h_t$ for the current time step:
    $h_t = f_W(h_{t-1}, x_t)$
  • An output vector $y_t$ can be computed from the hidden vector at any time step. For sentiment analysis of a sentence, the prediction is made only at the last time step.
  • $f_W$ is the same function with the same parameters at every time step.
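
The recurrence above can be sketched in a few lines. This is a toy scalar version, not the model from the lecture: the function body and the weights `w_h` and `w_x` are assumptions chosen only to show that one function with one set of parameters is reused at every time step.

```python
import math

def f_w(h_prev, x, w_h=0.5, w_x=1.0):
    """Toy scalar recurrence; w_h and w_x are assumed illustrative weights."""
    return math.tanh(w_h * h_prev + w_x * x)

def run_rnn(xs, h0=0.0):
    """Apply f_w over the whole sequence and return every hidden state."""
    hs = []
    h = h0
    for x in xs:
        # Same function and same parameters at each time step.
        h = f_w(h, x)
        hs.append(h)
    return hs

hidden_states = run_rnn([1.0, -0.5, 0.25])
# A many-to-one task such as sentiment analysis would read only this value.
last_hidden = hidden_states[-1]
```

For a task that needs an output at every step, each entry of `hidden_states` would be used instead of only the last one.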



Types of RNN

🚩 Character-level Language Model

Input training sequence: hello
Vocabulary: [h, e, l, o], so each character can be represented as a 4-dimensional one-hot vector
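
As a small illustration of the one-hot representation, the sketch below encodes each character of "hello" against the four-character vocabulary. The helper name `one_hot` and the index order (h, e, l, o) are assumptions made for this example.

```python
vocab = ["h", "e", "l", "o"]
char_to_index = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    """Return a 4-dimensional vector with a 1 at the character's index."""
    vec = [0] * len(vocab)
    vec[char_to_index[ch]] = 1
    return vec

# One vector per character of the training sequence.
encoded = [one_hot(ch) for ch in "hello"]
```

Both occurrences of l map to the same vector, since the encoding depends only on the character and not on its position.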

At the first time step the model is given h and must predict e;
at the second time step, having seen h and e, it must predict l.
$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b)$
Assume $h_t$ and $h_{t-1}$ have dimension 3. (The hidden size is a hyperparameter; this example picks one less than the vocabulary size.)
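
A minimal sketch of this update for the "hello" example, with hidden size 3 and 4-dimensional one-hot inputs. The weight values are made up for illustration (they are not trained and do not come from the lecture), the initial hidden state is assumed to be a zero vector, and `matvec`, `one_hot`, and `rnn_step` are hypothetical helper names.

```python
import math

VOCAB = ["h", "e", "l", "o"]
HIDDEN_SIZE = 3

# Assumed illustrative weights, not trained values.
W_xh = [[0.1, -0.2, 0.3, 0.0],
        [0.0, 0.4, -0.1, 0.2],
        [-0.3, 0.1, 0.2, 0.1]]   # shape (3, 4): input to hidden
W_hh = [[0.5, 0.0, -0.1],
        [0.1, 0.3, 0.0],
        [0.0, -0.2, 0.4]]        # shape (3, 3): hidden to hidden
b = [0.0, 0.0, 0.0]

def one_hot(ch):
    return [1.0 if v == ch else 0.0 for v in VOCAB]

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(h_prev, x):
    """h_t = tanh(W_hh h_{t-1} + W_xh x_t + b)."""
    from_hidden = matvec(W_hh, h_prev)
    from_input = matvec(W_xh, x)
    return [math.tanh(p + q + r) for p, q, r in zip(from_hidden, from_input, b)]

# Inputs are the first four characters; targets are the same string shifted by one.
inputs, targets = "hell", "ello"

h = [0.0] * HIDDEN_SIZE          # assumed zero initial hidden state
hidden_states = []
for ch in inputs:
    h = rnn_step(h, one_hot(ch))
    hidden_states.append(h)
```

Because each hidden state feeds into the next step, the state after the second input depends on both h and e, which is what lets the model use earlier characters when predicting l.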

  • At the first time step there is no earlier time step, so the previous hidden state $h_0$
    is set to [0, 0, 0] by default.

  • At every time step, $W_{hh}$ multiplies the previous hidden state $h_{t-1}$ and $W_{xh}$ multiplies the input $x_t$ to produce $h_t$.

  • To produce an output, $W_{hy}$ is applied to the hidden state at each time step (here $b$ is the output layer's bias):
    $\text{Logit} = W_{hy} h_t + b$

  • The output layer has as many nodes as the vocabulary has entries (4 here),
    because softmax turns the logits into one probability per character, which is then compared with the one-hot target.
    Example: the first output layer gives [h: 1.0, e: 2.2, l: -3.0, o: 4.1], so the model predicts o.
    The target character is e, however, so training must raise the probability of the second entry.

  • $W_{xh}$, $W_{hh}$, and $W_{hy}$ are updated through backpropagation.

  • When generating text, the prediction at the current time step is reused as the input at the next time step.

    For long text, every single character (1 byte each) becomes a vocabulary entry. Spaces and punctuation such as . and , are also included as special characters, each taking up one dimension.
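
To make the softmax step concrete, the sketch below uses the example logits from the first time step, [h: 1.0, e: 2.2, l: -3.0, o: 4.1]. It assumes cross-entropy as the loss, which is the usual choice for this setup but is not stated in the notes; the variable names are illustrative.

```python
import math

vocab = ["h", "e", "l", "o"]
logits = [1.0, 2.2, -3.0, 4.1]   # example logits from the first time step
target = "e"

def softmax(zs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
predicted = vocab[probs.index(max(probs))]

# Cross-entropy loss (assumed): negative log-probability of the target character.
loss = -math.log(probs[vocab.index(target)])

# Gradient of that loss with respect to the logits: probs minus the one-hot target.
grad_logits = [p - (1.0 if ch == target else 0.0) for p, ch in zip(probs, vocab)]
```

The entry of `grad_logits` for e is negative while the other three are positive, so a gradient-descent step raises the logit for e and lowers the rest, which is the "raise the second probability" behaviour described above.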

Backpropagation (BPTT)

Backpropagation through time = BPTT

  • Because the data is sequential, a long sequence holds more than can be processed at once (the memory and compute needed for the backward pass grow with sequence length), so the whole sequence cannot be handled in a single pass.

    ---> Truncation is used to cut the sequence into pieces and process them one at a time.
  • Forward propagation and BPTT are run in order over these length-limited chunks.
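
A rough sketch of truncated BPTT on a toy scalar RNN, $h_t = \tanh(w h_{t-1} + u x_t)$. Everything specific here is an assumption made for illustration: the scalar weights `w` and `u`, the squared-error loss against targets `ys`, the plain SGD update after each chunk, and the names `chunks` and `truncated_bptt`. The point is only that the hidden state value is carried forward between chunks while gradients stop at each chunk boundary.

```python
import math

def chunks(seq, size):
    """Split a sequence into consecutive pieces of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def truncated_bptt(xs, ys, w, u, chunk_size, lr=0.1):
    h = 0.0  # hidden state value, carried across chunks
    for x_chunk, y_chunk in zip(chunks(xs, chunk_size), chunks(ys, chunk_size)):
        # Forward pass over this chunk only, keeping every hidden state.
        hs = [h]
        for x in x_chunk:
            hs.append(math.tanh(w * hs[-1] + u * x))

        # Backward pass: gradients flow only inside the chunk (the truncation).
        grad_w = grad_u = 0.0
        dh_next = 0.0
        for t in reversed(range(len(x_chunk))):
            dh = (hs[t + 1] - y_chunk[t]) + dh_next   # loss term plus later steps
            da = dh * (1.0 - hs[t + 1] ** 2)          # back through tanh
            grad_w += da * hs[t]
            grad_u += da * x_chunk[t]
            dh_next = da * w

        # Assumed update rule: one SGD step per chunk.
        w -= lr * grad_w
        u -= lr * grad_u
        h = hs[-1]  # keep the value, drop its gradient history
    return w, u

xs = [0.5, -0.1, 0.3, 0.8, -0.6, 0.2]
ys = [0.2, 0.0, 0.1, 0.3, -0.2, 0.1]
w, u = truncated_bptt(xs, ys, w=0.1, u=0.1, chunk_size=3)
```

With `chunk_size=3` the six-step sequence is processed as two chunks; setting `chunk_size` to the full length would reduce this to ordinary BPTT over the whole sequence.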