[Papers] DIET: Lightweight Language Understanding for Dialogue Systems 🏃‍♂️

KwanHong · December 21, 2020

🎯 Overview

❔ Introduction

Two common approaches for data-driven dialogue modeling

  • In general, data-driven dialogue modeling is approached in two broad ways:

    • Modular system
      The overall system is split into a Natural Language Understanding (NLU) system and a Natural Language Generation (NLG) system.
      The dialogue policy uses the analysis produced by the NLU system to select the system's next action.
      The NLG system then generates the response that corresponds to that action.

    • End-to-End
      The user's input is fed directly into the dialogue policy, which outputs the system's predicted next utterance.

NLU: Intent classification and Entity recognition

NLU in a dialogue system usually refers to two sub-tasks: intent classification and entity recognition.

If the two tasks are simply modeled separately, the system suffers from error propagation.
To address this, both tasks should be handled by a single multi-task architecture, so that each task can benefit from its interaction with the other.

Recent studies show that large pre-trained models reach high performance, but the cost of pre-training and fine-tuning such models is considerable.

DIET (Dual Intent and Entity Transformer)

This work proposes a new multi-task architecture for intent classification and entity recognition.

The main feature of the architecture is as follows.

  • Sparse feature + Dense feature
    Pre-trained word embeddings from language models (dense) can be combined with character-level n-gram features (sparse).
    Using sparse features alone, DIET already reached SOTA (state of the art) performance on a complex NLU dataset, and adding pre-trained features improved performance further.

🔨 DIET architecture

The key components of the DIET architecture are described below.

Featurization

  • Depending on the pipeline, an input sentence is treated as a sequence of word or sub-word tokens.
  • A CLS token is appended to the end of each sentence.
  • Each token is featurized with sparse features and, optionally, with dense features as well.
  • The sparse features are passed through a fully connected layer, concatenated with the dense features, and fed into the Transformer.

  • Sparse feature
    • Token-level one-hot encodings and multi-hot encodings of character n-grams
    • Character n-grams contain a lot of redundant information, so dropout is applied to avoid overfitting.

  • Dense feature
    • Pre-trained word embeddings such as ConveRT, BERT, or GloVe
    • For the CLS token, the initial value is ConveRT's sentence embedding, BERT's CLS token, or the mean of the GloVe token embeddings, respectively.
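The sparse featurization above can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the helper names (`char_ngrams`, `build_vocabs`, `sparse_features`), the toy sentences, the n-gram length limit `max_n=3`, and the choice to give the CLS token no n-grams are all assumptions made for the example. Dropout and the dense features are left out.

```python
CLS = "__CLS__"  # sentence-level token appended to every sentence

def char_ngrams(token, max_n=3):
    """Set of character n-grams (n = 1..max_n); the CLS token gets none (assumption)."""
    if token == CLS:
        return set()
    return {token[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(token) - n + 1)}

def build_vocabs(sentences, max_n=3):
    """Index every token and every character n-gram seen in the data."""
    tokens = sorted({t for s in sentences for t in s} | {CLS})
    ngrams = sorted({g for s in sentences for t in s for g in char_ngrams(t, max_n)})
    return ({t: i for i, t in enumerate(tokens)},
            {g: i for i, g in enumerate(ngrams)})

def sparse_features(token, token_vocab, ngram_vocab, max_n=3):
    """One-hot token encoding followed by a multi-hot character n-gram encoding."""
    one_hot = [0] * len(token_vocab)
    if token in token_vocab:
        one_hot[token_vocab[token]] = 1
    multi_hot = [0] * len(ngram_vocab)
    for gram in char_ngrams(token, max_n):
        if gram in ngram_vocab:
            multi_hot[ngram_vocab[gram]] = 1
    return one_hot + multi_hot

# Toy data (hypothetical): two tokenized sentences.
sentences = [["book", "a", "table"], ["play", "a", "song"]]
token_vocab, ngram_vocab = build_vocabs(sentences)
sentence = sentences[0] + [CLS]  # CLS goes at the end of the sentence
features = [sparse_features(t, token_vocab, ngram_vocab) for t in sentence]
```

Each entry of `features` is one token's sparse vector; in the real model this vector would go through the fully connected layer before being concatenated with the dense embedding.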

Transformer

  • The sentence is encoded with a 2-layer transformer.
  • The input dimension must match the dimension of the transformer layers, so the concatenated input features are passed through another fully connected layer.
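To make the dimension bookkeeping concrete, here is a small sketch of the two fully connected layers around the concatenation. All sizes (6 sparse features, 3 dense features, transformer size 8), the random initialization, and the helper names `linear` and `init_layer` are hypothetical values chosen for illustration only.

```python
import random

def linear(x, weight, bias):
    """Apply a fully connected layer; weight has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def init_layer(in_dim, out_dim, rng):
    """Small random weights and zero bias (illustrative initialization)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return weight, [0.0] * out_dim

rng = random.Random(0)
sparse = [0, 1, 0, 0, 1, 1]   # sparse features of one token (hypothetical)
dense = [0.2, -0.4, 0.1]      # pre-trained embedding of the same token (hypothetical)

sparse_fc = init_layer(len(sparse), 3, rng)
sparse_proj = linear(sparse, *sparse_fc)        # 6 -> 3

concat = sparse_proj + dense                    # 3 + 3 = 6

input_fc = init_layer(len(concat), 8, rng)
transformer_input = linear(concat, *input_fc)   # 6 -> 8, the transformer size
```

The same two layers would be applied to every token in the sequence, so the transformer receives one vector of its own dimension per token.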

Named entity recognition

  • The sequence of entity labels is predicted by a CRF layer on top of the transformer.
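The post does not describe how the CRF picks the label sequence. Assuming a standard linear-chain CRF, the best sequence is usually found with Viterbi decoding, sketched below. The label set (0 = O, 1 = B, 2 = I) and all scores are made up for the example.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence.

    emissions[t][j]: score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label for ending in label j at this position.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]

# Hypothetical scores for the tokens "fly to new york"; labels 0 = O, 1 = B, 2 = I.
emissions = [[2.0, 0.0, 0.0],
             [2.0, 0.0, 0.0],
             [0.5, 1.0, 0.2],
             [0.6, 0.1, 0.5]]
transitions = [[0.5, 0.0, -10.0],   # from O: O -> I is strongly penalized
               [0.0, -1.0, 1.0],    # from B: B -> I is encouraged
               [0.0, -1.0, 0.5]]    # from I
path = viterbi(emissions, transitions)  # [0, 0, 1, 2], i.e. O O B I
```

In this toy case the last token's emission score alone favors O, but the transition scores pull the decoded sequence toward a B followed by I, which is the kind of label consistency a CRF layer adds.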

Intent classification

  • The transformer output for the __CLS__ token, $a_{\mathrm{CLS}}$, and the intent label $y_{\mathrm{intent}}$ are embedded into a single vector space.

  • The dot-product loss maximizes the similarity with the target (correct) label $y^{+}_{\mathrm{intent}}$
  • and minimizes the similarity with the negative samples $y^{-}_{\mathrm{intent}}$.

  • The intent loss ($L_I$) is computed from the positive and negative similarity values obtained above.
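A minimal sketch of such a dot-product loss for a single example is shown below. It assumes the softmax-style form, that is, the negative log of the positive similarity's share among all positive and negative similarities; the function names and the toy 2-dimensional embeddings are hypothetical, and averaging over a batch is omitted.

```python
import math

def dot(u, v):
    """Dot-product similarity between two embedded vectors."""
    return sum(a * b for a, b in zip(u, v))

def dot_product_loss(h_cls, h_pos, h_negs):
    """Loss for one example: -(S+ - log(exp(S+) + sum(exp(S-))))."""
    s_pos = dot(h_cls, h_pos)
    s_all = [s_pos] + [dot(h_cls, h) for h in h_negs]
    # Numerically stable log-sum-exp over positive and negative similarities.
    m = max(s_all)
    log_z = m + math.log(sum(math.exp(s - m) for s in s_all))
    return -(s_pos - log_z)

h_cls = [1.0, 0.0]                       # embedded CLS output (toy values)
h_pos = [1.0, 0.0]                       # embedded correct intent label
h_negs = [[0.0, 1.0], [-1.0, 0.0]]       # embedded negative intent labels

loss_good = dot_product_loss(h_cls, h_pos, h_negs)
loss_bad = dot_product_loss(h_cls, h_negs[1], [h_pos, h_negs[0]])
```

Here `loss_good` is smaller than `loss_bad`, because in the first call the CLS embedding is aligned with the correct label and in the second it is aligned with a negative one.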

Masking

  • As in BERT, a training objective that randomly masks input tokens is added.
  • 15% of the tokens in the input sequence are selected for masking.
  • A selected token is replaced with the __MASK__ token with probability 70%, replaced with a random token with probability 10%, and kept unchanged with probability 20%.

  • The mask loss ($L_M$) is computed in the same way as the intent loss.
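The 15% / 70-10-20 scheme can be sketched as below. This is an illustration under assumptions: each token is selected independently with probability 0.15 (rather than exactly 15% of positions), and `mask_tokens`, its parameters, and the toy sentence are names invented for the example.

```python
import random

MASK = "__MASK__"

def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (masked sequence, {position: original token}) for the mask loss."""
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue                      # token not selected
        targets[i] = tok                  # the model must predict this token
        r = rng.random()
        if r < 0.7:
            masked[i] = MASK              # 70%: replace with the mask token
        elif r < 0.8:
            masked[i] = rng.choice(vocab) # 10%: replace with a random token
        # remaining 20%: keep the original token
    return masked, targets

rng = random.Random(0)
tokens = ["book", "a", "table", "for", "two"]
masked, targets = mask_tokens(tokens, vocab=tokens, rng=rng)
```

`targets` records which positions were selected, so the mask loss is only computed for those positions, including the ones whose input was left unchanged.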

Total loss

  • As a multi-task model, DIET is trained to minimize the sum of the losses of the individual tasks.
  • The architecture can be configured with specific losses turned off.
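As a small sketch of this configuration, the total loss can be written as a sum over whichever task losses are enabled. The dictionary keys, the `enabled` flags, and the loss values are hypothetical and only illustrate the idea.

```python
def total_loss(losses, enabled):
    """Sum the task losses whose flag is on; tasks without a flag default to on."""
    return sum(value for name, value in losses.items()
               if enabled.get(name, True))

losses = {"intent": 0.4, "entity": 1.2, "mask": 0.7}   # toy values

full = total_loss(losses, enabled={})                   # all three losses
no_mask = total_loss(losses, enabled={"mask": False})   # mask loss turned off
```

Turning a flag off simply removes that term from the objective, so the same architecture can be trained for intents only, entities only, or without masking.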

🧪 Experiments

Experiments on NLU-benchmark dataset

  • The NLU-benchmark dataset is organized into 10 folds.
  • A model is trained on each fold and the results are averaged.
  • The DIET model used in the experiment combines token-level one-hot encodings and multi-hot encodings of character n-grams (sparse) with ConveRT embeddings (dense).
  • It achieved higher scores on every metric except precision on the entity task.

Importance of different featurization components and masking

  • Compares the performance obtained with different combinations of features.

Comparison with fine-tuned BERT (NLU-benchmark dataset)

  • The DIET model uses ConveRT embeddings as dense features together with word-level and character-level sparse features.
  • BERT is fine-tuned inside the DIET model.
  • The two models perform comparably, but DIET trains about 6 times faster.

🔎 Conclusion

  • No single embedding configuration gives the best performance on every dataset.
    • This underlines the importance of a modular architecture.
  • Word embeddings such as GloVe perform comparably to embeddings from large language models.
    • Even without any pre-trained embeddings, performance is comparable to that of other models.
  • With the best-performing pre-trained embedding configuration, DIET outperforms fine-tuned BERT while training six times faster.