[Paper Review] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

안유민 · July 18, 2025


📌 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
📝 Authors : Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
📅 Published : Submitted on 22 May 2020 (v1), last revised 12 Apr 2021 (this version, v4)
🔗 Paper link : https://arxiv.org/abs/2005.11401


Abstract

This paper proposes Retrieval-Augmented Generation (RAG), an architecture that combines a pretrained sequence-to-sequence language model (BART) with an external knowledge source so that knowledge-intensive tasks can be handled effectively.

Earlier generators built on models such as GPT and BERT stored all of their knowledge only inside the model parameters, which led to problems such as

  • knowledge that cannot be updated
  • factual errors (hallucination)
  • generated text with no traceable source

To address these limitations, RAG

  • retrieves relevant knowledge with a dense retriever (DPR),
  • conditions a BART-based generator on that knowledge, and
  • produces fact-aware text.

❗ An LLM architecture that retrieves external documents and generates conditioned on their content = a "hybrid of retrieval and generation" model


1. Introduction

Limitations of existing LLMs

Transformer-based LLMs are pretrained on large text corpora and can internalize a considerable amount of knowledge in their parameters, but they have the following problems.

  • No knowledge updates : when new information appears, an existing model cannot reflect it without retraining
  • No fact-check support : it is hard to trace what evidence the model based its text on
  • Hallucination : the model generates "plausible-sounding" content that has no basis in fact

These limitations are especially damaging in knowledge-intensive tasks such as

  • Open-domain question answering (Open-domain QA)
  • Long-form answer generation (ELI5)
  • Fact verification (FEVER)

RAG์˜ ํ•ต์‹ฌ ์•„์ด๋””์–ด

RAG๋Š” ์ด๋Ÿฐ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด retrieval(๊ฒ€์ƒ‰)๊ณผ generation(์ƒ์„ฑ)์„ ๊ฒฐํ•ฉํ•œ ๋ชจ๋ธ์ด๋‹ค.

  • Query๊ฐ€ ์ฃผ์–ด์ง€๋ฉด, ๋จผ์ € Dense Retriever(DPR)๋ฅผ ์‚ฌ์šฉํ•ด ์™ธ๋ถ€ ์ธ๋ฑ์Šค(Wikipedia ๋“ฑ)์—์„œ ๊ด€๋ จ ๋ฌธ์„œ๋ฅผ k๊ฐœ ๊ฒ€์ƒ‰

    Dense Retriever(DPR) : ๊ณ ์ •๋œ ๋ฌธ์„œ ์ง‘ํ•ฉ์—์„œ ์ฟผ๋ฆฌ์™€ ๊ด€๋ จ๋œ ๋ฌธ์„œ๋ฅผ ๋ฒกํ„ฐ ๊ธฐ๋ฐ˜์œผ๋กœ ํšจ์œจ์ ์œผ๋กœ ๊ฒ€์ƒ‰ํ•˜๋Š” ๋ฐฉ๋ฒ•

    โ†’ ๊ธฐ์กด์˜ sparse retrieval ๋ฐฉ์‹(BM25 ๋“ฑ)๊ณผ ๋‹ฌ๋ฆฌ, ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜ ์ž„๋ฒ ๋”ฉ์„ ์‚ฌ์šฉํ•˜์—ฌ ์˜๋ฏธ์ ์œผ๋กœ ์œ ์‚ฌํ•œ ๋ฌธ์„œ๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ด ํ•ต์‹ฌ

  • ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ๋ฅผ ์กฐ๊ฑด์œผ๋กœ BART decoder๊ฐ€ ์ž์—ฐ์–ด ์‘๋‹ต์„ ์ƒ์„ฑ

  • Knowledge Injection without Retraining
    : ์ง€์‹์„ ์™ธ๋ถ€์—์„œ ์‹ค์‹œ๊ฐ„์œผ๋กœ ์ฃผ์ž…๋ฐ›์œผ๋ฏ€๋กœ, ์ง€์‹์ด ๋ณ€๊ฒฝ๋˜๋”๋ผ๋„ ๋ชจ๋ธ ์žฌํ•™์Šต์ด ํ•„์š” ์—†๋‹ค!


2. Methods

2.1 Models

RAG comes in two variants.

  • RAG-Sequence : the whole output sequence is generated conditioned on a single retrieved document

  • RAG-Token : each output token can draw on a different retrieved document
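The difference between the two variants is easiest to see with numbers. The sketch below is a toy illustration, not code from the paper: the document priors and per-token probabilities are made-up values standing in for the retriever's p(z|x) and the generator's p(y_i|x,z,y_<i), and the function names are hypothetical.

```python
from math import prod

def rag_sequence_prob(doc_priors, token_probs):
    """Sum over documents of prior * product of that document's token probabilities."""
    return sum(
        prior * prod(per_token)
        for prior, per_token in zip(doc_priors, token_probs)
    )

def rag_token_prob(doc_priors, token_probs):
    """Product over tokens of the prior-weighted sum across documents."""
    num_tokens = len(token_probs[0])
    return prod(
        sum(prior * per_token[i] for prior, per_token in zip(doc_priors, token_probs))
        for i in range(num_tokens)
    )

# Hypothetical numbers: 2 retrieved documents, a 2-token target sequence.
doc_priors = [0.5, 0.5]
token_probs = [
    [0.8, 0.25],  # document 0 supports token 1 well, token 2 poorly
    [0.2, 0.75],  # document 1 is the opposite
]

seq_prob = rag_sequence_prob(doc_priors, token_probs)
tok_prob = rag_token_prob(doc_priors, token_probs)
print(seq_prob, tok_prob)
```

With these made-up values RAG-Token assigns the target a higher probability, because each token is free to lean on whichever document supports it best, while RAG-Sequence has to commit to one document for the whole sequence before averaging.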

2.2 Dense Retriever (DPR)

RAG uses the Dense Passage Retriever (DPR) to search a fixed document collection, in vector space, for the documents most similar to the query.

  • Uses two independent BERT encoders

    • Query encoder : embeds the question
    • Context encoder : embeds the documents
  • Computes inner-product (dot product) similarity and selects the top-k documents

Unlike sparse BM25, this retriever can match on meaning rather than exact term overlap, and it can be fine-tuned for the downstream task (in the paper, only the query encoder is updated; the document encoder and the index stay fixed).
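As a rough illustration of the retrieval step, here is a minimal sketch that scores documents by inner product, keeps the top k, and turns the scores into a distribution with a softmax. The 2-dimensional vectors are hypothetical stand-ins for BERT encoder outputs, the helper names are made up, and the exhaustive scan replaces the approximate MIPS index a real system would use.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k_by_inner_product(query_vec, doc_vecs, k):
    """Return [(doc_index, score), ...] for the k highest inner products."""
    scored = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]

def retrieval_distribution(scored):
    """Softmax over the retrieved scores (p(z|x) renormalized over the top-k)."""
    peak = max(score for _, score in scored)
    weights = [math.exp(score - peak) for _, score in scored]
    total = sum(weights)
    return [(i, w / total) for (i, _), w in zip(scored, weights)]

# Hypothetical 2-d embeddings standing in for encoder outputs.
query_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]

top_docs = top_k_by_inner_product(query_vec, doc_vecs, k=2)
doc_probs = retrieval_distribution(top_docs)
print(top_docs)
print(doc_probs)
```

Renormalizing over only the retrieved documents is an assumption of this sketch; it keeps the example small while still giving the generator a proper distribution to marginalize over.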

2.3 Generator (BART-based)

  • The generator is a pretrained BART model with a sequence-to-sequence architecture
    • Encoder : takes the query x concatenated with the selected document z as input
    • Decoder : generates the response y conditioned on the encoder output
  • Latent variable z : the selected document

  • The final output distribution is a latent-variable model over the following quantities.

    • x : the user's question (query)
    • y : the generated response (output)
    • z : the retrieved document (retrieved latent variable)

The retriever (DPR) models p(z|x) and the generator (BART) models p(y|x,z); together they approximate p(y|x)

→ z can be read as a latent state and y as the observation, so RAG can be interpreted as a latent-variable model in which the retrieved document is marginalized out

In other words, the model is designed as a combination of the retrieval probability and the generation probability.
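For reference, the combination of retrieval and generation probabilities can be written out as follows. This follows the formulation in the RAG paper as I understand it; the symbols η (retriever parameters), θ (generator parameters), N (length of y), and the encoders d(·) and q(·) are the paper's notation rather than anything defined in this post.

```latex
\begin{align*}
p_\eta(z \mid x) &\propto \exp\!\big(\mathbf{d}(z)^{\top}\mathbf{q}(x)\big) \\
p_{\text{RAG-Sequence}}(y \mid x) &\approx \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1}) \\
p_{\text{RAG-Token}}(y \mid x) &\approx \prod_{i=1}^{N} \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1})
\end{align*}
```

The only difference between the two variants is whether the sum over documents sits outside the product over tokens (RAG-Sequence) or inside it (RAG-Token).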

2.4 Training

  • DPR and BART can be fine-tuned end to end so that they are trained jointly
  • During training, the sum over z is approximated by marginalizing over only the top-k retrieved documents

  • The generator produces the response conditioned on each document, and the loss is cross-entropy based: the negative marginal log-likelihood of the target
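A minimal sketch of that loss for a single training pair, under the RAG-Sequence marginalization and computed in log space for numerical stability. The function names are hypothetical, the inputs are made-up numbers, and normalizing the retriever scores over only the top-k documents is an assumption of this sketch.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def rag_sequence_nll(doc_scores, token_log_probs):
    """Negative marginal log-likelihood of one target sequence.

    doc_scores[z]         : retriever inner product for top-k document z
    token_log_probs[z][i] : log p(y_i | x, z, y_<i) from the generator
    """
    log_norm = log_sum_exp(doc_scores)
    log_priors = [s - log_norm for s in doc_scores]  # log p(z|x) over top-k
    per_doc = [lp + sum(tok) for lp, tok in zip(log_priors, token_log_probs)]
    return -log_sum_exp(per_doc)

# Hypothetical inputs: two equally scored documents, a 2-token target.
doc_scores = [0.0, 0.0]
token_log_probs = [
    [math.log(0.8), math.log(0.25)],
    [math.log(0.2), math.log(0.75)],
]
loss = rag_sequence_nll(doc_scores, token_log_probs)
print(loss)
```

Because the document prior appears inside the loss, gradients reach the retriever scores as well as the generator, which is what makes joint fine-tuning possible.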

2.5 Decoding

  • At inference time, beam search generates candidate responses from the top-k documents, and the most probable response is selected

Because RAG couples a retriever with a generator, its performance is sensitive to retriever quality
→ the better the retriever, the less the model tends to hallucinate
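For RAG-Sequence, the selection step can be sketched roughly as follows: run beam search once per retrieved document, then score each candidate across documents. This mirrors what the paper calls fast decoding, where a candidate that did not appear in a document's beam is treated as having probability 0 under that document. The function name, beam contents, and probabilities below are made up for illustration.

```python
def fast_decode_rag_sequence(doc_priors, beams_per_doc):
    """Pick the hypothesis with the highest approximate marginal probability.

    doc_priors[z]    : p(z|x) for each retrieved document
    beams_per_doc[z] : dict mapping hypothesis text -> p(y | x, z) from that
                       document's beam search; missing hypotheses count as 0
    """
    totals = {}
    for prior, beam in zip(doc_priors, beams_per_doc):
        for hypothesis, prob in beam.items():
            totals[hypothesis] = totals.get(hypothesis, 0.0) + prior * prob
    best = max(totals, key=totals.get)
    return best, totals

# Hypothetical beams from two retrieved documents.
doc_priors = [0.6, 0.4]
beams_per_doc = [
    {"Rayleigh scattering": 0.5, "light": 0.3},
    {"Rayleigh scattering": 0.4, "the atmosphere": 0.5},
]
best, totals = fast_decode_rag_sequence(doc_priors, beams_per_doc)
print(best, totals)
```

A candidate that shows up in several documents' beams accumulates probability from each of them, so it can win even when no single document ranks it first.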


3. Experiments

RAG ๋ชจ๋ธ์ด ๋‹ค์–‘ํ•œ ์ง€์‹ ์ง‘์•ฝํ˜• NLP ํƒœ์Šคํฌ์—์„œ ์–ด๋–ป๊ฒŒ ์„ฑ๋Šฅ์„ ๋ณด์ด๋Š”์ง€ ํ‰๊ฐ€ํ•œ๋‹ค. ๋ชจ๋“  ์‹คํ—˜์€ ๋™์ผํ•œ Wikipedia dump(2018๋…„ 12์›” ๋ฒ„์ „)์„ non-parametric ์ง€์‹ ์†Œ์Šค๋กœ ์‚ฌ์šฉํ•œ๋‹ค.

  • ์ „์ฒด Wikipedia ๋ฌธ์„œ๋ฅผ 100๋‹จ์–ด ๋‹จ์œ„ ์ฒญํฌ๋กœ ๋ถ„ํ•  โ†’ ์ด ์•ฝ 2100๋งŒ ๊ฐœ ๋ฌธ์„œ

  • ๊ฐ ๋ฌธ์„œ ์ž„๋ฒ ๋”ฉ์€ Dense Passage Encoder (DPR)๋กœ ์ƒ์„ฑํ•˜๊ณ , ๋น ๋ฅธ ์œ ์‚ฌ ๋ฌธ์„œ ๊ฒ€์ƒ‰์„ ์œ„ํ•ด FAISS์˜ Hierarchical Navigable Small World (HNSW) ๊ตฌ์กฐ ๊ธฐ๋ฐ˜ MIPS ์ธ๋ฑ์Šค๋ฅผ ๊ตฌ์ถ•

  • ํ•™์Šต ์‹œ์—๋Š” ์ฟผ๋ฆฌ๋‹น ์ƒ์œ„ k๊ฐœ ๋ฌธ์„œ ๊ฒ€์ƒ‰ (k=5 ๋˜๋Š” 10)

  • ํ…Œ์ŠคํŠธ ์‹œ์—๋Š” dev data ๊ธฐ๋ฐ˜์œผ๋กœ k๋ฅผ ์„ค์ •
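The chunking step in the setup above can be sketched in a few lines. This is a simplified illustration that splits on whitespace; the function name and the synthetic article are assumptions, and the embedding and FAISS indexing steps are not shown.

```python
def chunk_words(text, chunk_size=100):
    """Split text into disjoint chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

# Hypothetical 250-word "article" made of placeholder tokens.
article = " ".join(f"w{i}" for i in range(250))
passages = chunk_words(article)
print(len(passages), [len(p.split()) for p in passages])
```

Each resulting passage would then be embedded by the document encoder and added to the index as one retrievable unit.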


๋ฐ์ดํ„ฐ์…‹

  • Open-domain QA
    • Natural Questions : Google ์‹ค์ œ ์‚ฌ์šฉ์ž๋“ค์˜ ๊ฒ€์ƒ‰ ์งˆ๋ฌธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ๋Œ€๊ทœ๋ชจ QA ๋ฐ์ดํ„ฐ์…‹
    • TriviaQA : ํ€ด์ฆˆ ๋ฌธ์ œ ํ˜•์‹์˜ ์งˆ๋ฌธ์œผ๋กœ ๊ตฌ์„ฑ
    • WebQuestions : ์›น ์‚ฌ์šฉ์ž๊ฐ€ ์ž…๋ ฅํ•œ ์งˆ๋ฌธ์— ๋Œ€ํ•ด Freebase์—์„œ ์ •๋‹ต์„ ์—ฐ๊ฒฐํ•œ QA ๋ฐ์ดํ„ฐ์…‹
    • CuratedTREC : TREC QA ํŠธ๋ž™์—์„œ ์ˆ˜์ž‘์—…์œผ๋กœ ๋งŒ๋“  ์งˆ๋ฌธ/์ •๋‹ต ์Œ
  • Generation
    • MS MARCO : Microsoft๊ฐ€ ๋งŒ๋“  ์›น ๋ฌธ์„œ ๊ธฐ๋ฐ˜ ์งˆ๋ฌธ์‘๋‹ต ๋ฐ์ดํ„ฐ์…‹
    • ELI5(Explain Like Iโ€™m 5) : Reddit์—์„œ ์ˆ˜์ง‘๋œ โ€œ์•„์ฃผ ์‰ฌ์šด ์„ค๋ช…โ€ ์งˆ๋ฌธ-์‘๋‹ต ๋ฐ์ดํ„ฐ
  • Fact verification
    • FEVER(Fact Extraction and VERification) : ์œ„ํ‚คํ”ผ๋””์•„ ๊ธฐ๋ฐ˜ fact-checking ๋ฐ์ดํ„ฐ์…‹


4. Results

More than 2x performance improvement over the baseline BART

➡️ Compared with BART, factual consistency improves and hallucination decreases

  • Factuality : can the generated statement be corroborated by trusted external sources?
  • Specificity : whether there is a strong mutual dependence between the input and the output

  • RAG-Token-BM25 : retrieves with BM25 and lets each token draw on a different one of the retrieved documents
  • RAG-Sequence-BM25 : retrieves with BM25 and uses one retrieved document for the whole response

  • RAG-Token-Frozen : keeps the dense retriever frozen and lets the document vary from token to token
  • RAG-Sequence-Frozen : keeps the retriever frozen and generates the whole response from one document

  • RAG-Token : trains both retriever and generator; each token can use a different document
  • RAG-Sequence : trains both retriever and generator; the whole response is generated from one document

➡️ RAG-Token in particular benefits from its fine-grained, per-token control over which document is used, which helps it generate more natural and accurate responses across tasks such as NQ, TQA, WQ, Jeopardy, and MS MARCO. On some tasks it scored higher than RAG-Sequence.

➡️ RAG-Sequence, by contrast, generates from the context of a whole document, so it tends to give more stable results where coherence across the response or longer outputs matter. It is especially strong on CT and on the MS MARCO R-L metric.

➡️ The BM25-based RAG variants are weak at semantic matching, so their performance is lower overall. The gap is most visible on generation metrics (BLEU, ROUGE, etc.).

➡️ With a frozen retriever, the retriever is not trained, so performance falls short of the end-to-end fine-tuned RAG, but it is still better than BM25.


์ƒ์„ฑ ์‘๋‹ต์˜ ํ’ˆ์งˆ์„ ์ง์ ‘ ๋ถ„์„ํ•ด๋ณด๋ฉด

  • ๋” ๋‹ค์–‘ํ•˜๊ณ  ์ž์—ฐ์Šค๋Ÿฌ์šด ํ‘œํ˜„
  • ์งˆ๋ฌธ๊ณผ ์ง์ ‘์ ์œผ๋กœ ์—ฐ๊ด€๋œ ๋ฌธ์žฅ ์ธ์šฉ
  • ์žฅ๋ฌธ ์‘๋‹ต์—์„œ๋„ ๊ตฌ์กฐ๊ฐ€ ์•ˆ์ •์ ์ž„

Q : Why is the sky blue?
BART : Because of light.
RAG : The blue color of the sky is caused by Rayleigh scattering of sunlight by the atmosphere.

5. Related Work

  • GPT, BERT, T5, BART and similar models all store knowledge parametrically

  • OpenBookQA, REALM, ORQA and others introduced retrieval, but the way it is combined with a generator is limited

  • RAG stands out by offering search + generation + end-to-end training


6. Discussion

  • Achieves SOTA performance on open-domain QA tasks

  • In human evaluation, RAG's outputs were judged more factual and more specific than BART's

  • Analyzes the effectiveness of the retrieval module both quantitatively and qualitatively

  • Demonstrates that knowledge can be updated without retraining the model

Broader Impact

  • Positive effects

    • Generation grounded in Wikipedia reduces "hallucination" (generating false statements)
    • Better interpretability of outputs and more source-based control
    • Extensible to many domains, e.g. a medical QA system connected to a medical index
    • Usable as an assistant for knowledge work in professional settings
  • Potential risks

    • External knowledge sources (Wikipedia included) may contain bias or errors
    • RAG can be misused in much the same ways as existing language models, e.g. like GPT-2 it could be used to generate misinformation or manipulative content
    • Advanced language models also carry a long-term risk of job automation
  • Proposed mitigation

    • To prevent misuse, meta-systems are needed in which AI systems detect AI-generated spam and false content

7. Conclusion

  • RAG is a hybrid model optimized for knowledge-intensive tasks

  • By combining retrieval and generation, it offers

    • simultaneous gains in accuracy, diversity, and explainability
    • updatable knowledge (only the retrieval index has to be swapped)
    • task-specific optimization through fine-tuning
  • It has established itself as an important framework for securing factuality in reasoning-oriented generation systems in NLP


💭 My Thoughts

  • Instead of cramming all knowledge into the model, RAG fetches external knowledge when it is needed, and I could sense the concern behind that design: "cut training cost while keeping performance". As models keep getting larger, this kind of architecture looked like a fairly practical choice.
  • I have actually built a chatbot with RAG in a project, and being able to reference external documents at query time was useful. Hallucination was still there, though, so I put a lot of effort into prompt engineering.
