🔥 Paper Review - seq2seq

esc247 · September 9, 2023

  • LM

    • A model that assigns a probability to a sentence
      → it can predict the appropriate sentence or word in a given context.
    • A sentence is made up of multiple words
    • so its probability is the joint probability of those words
  • Uses a fixed-size Context Vector

  • The decoder infers the translation from the Context Vector

  • Handles long sentences well

  • Only the encoder's last Hidden State is used as the Context Vector

  • Encoder and decoder → separate Parameters (Weights)

  • Start token sos, end token eos

  • LSTM instead of a plain RNN

  • Reversing the input sentence gives higher accuracy

    • Because the first source words end up close to the first target words, which are strongly related
  • Work after this paper extracts information from the entire input sequence rather than from the last hidden state alone
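
The joint probability item in the notes above can be written out with the chain rule. The block below is a sketch in assumed notation (w_t for the words of a sentence, x and y for source and target tokens, v for the fixed-size Context Vector); these symbols do not appear in the notes themselves.

```latex
% Language model: chain-rule factorization of a sentence w_1, ..., w_T
p(w_1, \dots, w_T) = \prod_{t=1}^{T} p(w_t \mid w_1, \dots, w_{t-1})

% seq2seq: the decoder conditions every step on the context vector v
% produced by the encoder from x_1, ..., x_T
p(y_1, \dots, y_{T'} \mid x_1, \dots, x_T)
  = \prod_{t=1}^{T'} p(y_t \mid v, y_1, \dots, y_{t-1})
```

The input length T and the output length T' may differ, which is why the fixed-size vector v sits between the two.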


Abstract

  • Proposes an end-to-end approach to Sequence Learning
  • Stacks 4 LSTM layers
  • Reversing the source sentences → better performance
    • Because it makes the learning problem easier
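
The reversal trick can be sketched as a preprocessing step. This is a minimal illustration, not the paper's code: the function name `prepare_pair`, the marker strings `<sos>` and `<eos>`, and the split into decoder input and decoder target are all assumptions made for the example.

```python
SOS, EOS = "<sos>", "<eos>"  # hypothetical marker strings for start and end


def prepare_pair(source_tokens, target_tokens):
    """Reverse the source only; wrap the target with start/end markers."""
    reversed_source = list(reversed(source_tokens))
    decoder_input = [SOS] + list(target_tokens)    # what the decoder is fed
    decoder_target = list(target_tokens) + [EOS]   # what it should predict
    return reversed_source, decoder_input, decoder_target


src = ["a", "b", "c"]
tgt = ["x", "y", "z"]
rev_src, dec_in, dec_out = prepare_pair(src, tgt)
print(rev_src)  # ['c', 'b', 'a']
print(dec_in)   # ['<sos>', 'x', 'y', 'z']
print(dec_out)  # ['x', 'y', 'z', '<eos>']
```

With the source reversed, "a" is the last token the encoder reads, so it sits right next to the point where the decoder has to produce "x". That shorter distance is one way to picture why learning gets easier.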

Introduction

  • Neural Networks are more complex than statistical models but perform better
  • However, they usually require fixed input and output dimensions → a limitation for Sequential Problems
    • Because the sentence length is not known in advance
    • e.g., speech recognition and machine translation
  • One LSTM extracts a Context Vector (large fixed-dimensional vector representation) from the Input Sequence, then a second LSTM in the Decoder generates the output sequence ⇒ Entire Input Sequence → Vector
  • LSTM + SMT → better performance
  • Reversing the word order of the sentences → no degradation on Very Long Sequences
  • Learns the mapping {variable-length input sequence → fixed-size vector}
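
To make the {variable-length input sequence → fixed-size vector} idea concrete, here is a toy encoder in plain Python. It is only a sketch: a simple tanh recurrence stands in for the LSTM, the weights are random rather than trained, and `HIDDEN_SIZE`, `EMBED_SIZE`, and `encode` are names invented for this example.

```python
import math
import random

HIDDEN_SIZE = 4  # illustrative size of the context vector
EMBED_SIZE = 3   # illustrative size of each input vector

rng = random.Random(0)
# Randomly initialised weights; a trained model would learn these.
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_SIZE)]
        for _ in range(HIDDEN_SIZE)]
W_h = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_SIZE)]
       for _ in range(HIDDEN_SIZE)]


def encode(sequence):
    """Fold a sequence of EMBED_SIZE vectors into one HIDDEN_SIZE vector."""
    h = [0.0] * HIDDEN_SIZE
    for x in sequence:
        # Plain tanh recurrence (an assumption; the paper uses LSTM cells).
        h = [
            math.tanh(
                sum(W_in[i][j] * x[j] for j in range(EMBED_SIZE))
                + sum(W_h[i][k] * h[k] for k in range(HIDDEN_SIZE))
            )
            for i in range(HIDDEN_SIZE)
        ]
    return h  # last hidden state, playing the role of the context vector


short_seq = [[rng.random() for _ in range(EMBED_SIZE)] for _ in range(2)]
long_seq = [[rng.random() for _ in range(EMBED_SIZE)] for _ in range(7)]
print(len(encode(short_seq)), len(encode(long_seq)))  # 4 4
```

Both the 2-step and the 7-step input end up as a vector of the same length, which is the property the decoder relies on.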

The Model

Three key features

  1. Two LSTMs
    1. One for the input sequence, one for the output sequence
  2. 4 layered LSTM
    1. Shallow < Deep in performance
  3. Reversed input sentences
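
The second LSTM produces the output one token at a time, starting from sos and stopping at eos. The loop below sketches that control flow with greedy decoding, which is a simplification chosen for the example rather than the paper's search procedure. `greedy_decode`, `step`, and `toy_step` are hypothetical names, and `toy_step` is a stand-in for a real decoder.

```python
SOS, EOS = "<sos>", "<eos>"  # hypothetical marker strings


def greedy_decode(step, context, max_len=20):
    """Generate tokens one at a time until EOS or max_len is reached.

    `step(state, prev_token)` is a hypothetical callable returning
    (next_token, new_state). In a real model it would run the decoder
    LSTM and pick the most probable word.
    """
    state, token, output = context, SOS, []
    for _ in range(max_len):
        token, state = step(state, token)
        if token == EOS:
            break
        output.append(token)
    return output


def toy_step(state, prev_token):
    """Toy decoder: the 'state' is simply the list of tokens left to emit."""
    if not state:
        return EOS, state
    return state[0], state[1:]


print(greedy_decode(toy_step, ["x", "y", "z"]))  # ['x', 'y', 'z']
```

The decoder state is initialised from the encoder's context, and `max_len` guards against a model that never emits eos.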

Experiments

  • English → French translation task

Conclusion

  • Deep LSTM > SMT