🔥 Paper Review - Attention Is All You Need

esc247 · June 29, 2023

Keyword

Self-Attention

Scaled Dot-Product Attention

Multi-Head Attention

Positional Encoding


Abstract

  • Previous sequence transduction models were based on RNNs or CNNs that include an encoder and a decoder
    • Many limitations, e.g. no parallelization; processing gets harder as sequences grow longer
  • This paper is based solely on Attention
  • Excellent parallelization and generalization performance

Introduction

(Supplementary notes on the content from the Abstract)

  • Existing RNNs, LSTMs, GRUs, etc. were SOTA for sequence modeling, but suffer from the problems mentioned above
    • No parallelization → fatal for long sequences → requires a lot of resources

Background

Self-Attention

  • Used to capture how the words within a single input sequence relate to one another (see the sketch below)
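
A minimal sketch of this idea in PyTorch. The `self_attention` helper name and the toy sizes are my own choices for illustration, not from the paper:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one input sequence x.

    x: (seq_len, d_model); w_q / w_k / w_v: (d_model, d_k) projections.
    """
    q = x @ w_q                       # queries: what each word asks about
    k = x @ w_k                       # keys: what each word offers to be matched on
    v = x @ w_v                       # values: the content actually carried forward
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # pairwise word-to-word relevance
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                             # weighted mix of values

# toy usage: a "sentence" of 4 tokens with d_model = d_k = 8
x = torch.randn(4, 8)
w = [torch.randn(8, 8) for _ in range(3)]
out = self_attention(x, *w)   # (4, 8): one context-aware vector per token
```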

Model Architecture


๋ชจ๋ธ ์ „์ฒด ๊ตฌ์กฐ

  • ์ข‹์€ ์„ฑ๋Šฅ ๊ฐ€์ง„ Sequence model์€ ๋Œ€๋ถ€๋ถ„ ์ธ์ฝ”๋”-๋””์ฝ”๋” ๊ตฌ์กฐ, Transformer๋„ ๋งˆ์ฐฌ๊ฐ€์ง€
  • self-attention๊ณผ fully connected layer๋กœ ์ด๋ฃจ์–ด์ง
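
For reference, the paper's base configuration maps directly onto PyTorch's built-in `nn.Transformer`; a sketch, with the hyperparameters taken from the paper and the rest being standard library usage:

```python
import torch
import torch.nn as nn

# Base-model hyperparameters from the paper: 6 encoder / 6 decoder layers,
# d_model = 512, 8 attention heads, feed-forward size 2048.
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,
    dropout=0.1,
)

src = torch.randn(10, 32, 512)   # (source length, batch, d_model)
tgt = torch.randn(20, 32, 512)   # (target length, batch, d_model)
out = model(src, tgt)            # (20, 32, 512)
```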


Encoder

  • ๋ณธ ๋…ผ๋ฌธ์€ 6๊ฐœ layer ์Œ“์Œ
    • ๊ฐ Layer๋Š” 2๊ฐœ์˜ sub-layer๋กœ ๊ตฌ์„ฑ
    • multi-head + feed forward
  • Reisdual Connection ์‚ฌ์šฉ
    • โˆต ๊ฐ sub layer ์ถœ๋ ฅ์ด LayerNorm(x+Sublayer(x))

Decoder

  • ์ธ์ฝ”๋”์™€ ๊ฑฐ์˜ ์œ ์‚ฌ, ๋™์ผํ•˜๊ฒŒ 6๊ฐœ ์Œ“์Œ
    • ๋‹จ ๊ฐ layer๊ฐ€ 3๊ฐœ์˜ layer๋กœ ๊ตฌ์„ฑ๋จ
    • multi-head + multi-head + feed forward
  • Residual Connection ์‚ฌ์šฉ
  • Masking
    • ๊ฐ ํฌ์ง€์…˜๋ณด๋‹ค ๋’ค์— ์žˆ๋Š” ๋‹จ์–ด์— ๋Œ€ํ•ด ์•Œ์ง€ ๋ชปํ•˜๊ฒŒ

  • Query: the one asking the question
  • Key: the target being asked about
  • Value: the information actually retrieved, weighted by how well the Query matches each Key
  • The Query poses a question to the Key, and the resulting match scores decide how much of each Value to take (a small numeric example follows)
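
A tiny numeric illustration of that intuition; the vectors are made up purely for the example:

```python
import torch
import torch.nn.functional as F

# one query "asking", three keys to match against, three values to retrieve
q = torch.tensor([[1.0, 0.0]])                           # (1, d_k)
k = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])   # (3, d_k)
v = torch.tensor([[10.0], [20.0], [30.0]])               # (3, d_v)

scores = q @ k.T / k.size(-1) ** 0.5   # how well the query matches each key
weights = F.softmax(scores, dim=-1)    # turn scores into a distribution
answer = weights @ v                   # weighted blend of the values

print(weights)   # ≈ [[0.41, 0.20, 0.38]]: the query attends mostly to keys 1 and 3
print(answer)    # ≈ [[19.7]]
```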
