Enrich systems with non-parametric memory
Works well on knowledge-intensive tasks
RAG-Sequence: uses the same retrieved document to generate the complete sequence
RAG-Token: can draw a different latent document for each target token
allows the generator to choose content from several documents when producing an answer
computes a distribution for the next output token for each document
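The two marginalizations, written as in the paper (retriever distribution p_η(z|x) over the top-k retrieved documents z, generator p_θ):

```latex
% RAG-Sequence: one document is used for the whole output sequence
p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})

% RAG-Token: a (possibly) different document per target token
p_{\text{RAG-Token}}(y \mid x) \approx \prod_{i=1}^{N} \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1})
```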
can be used for sequence classification by treating the target class as a length-one sequence
uses BERT as the query encoder and the document encoder
MIPS: Maximum Inner Product Search Problem
document index: non-parametric memory
BART-large, 400M parameters (seq2seq transformer), as the generator
simply concatenate the input and the retrieved document when generating
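A minimal sketch of the retriever side and of how the generator input is formed, assuming a toy dense index and dot-product (MIPS) scoring; `bert_embed` is a hypothetical stand-in for the BERT query encoder, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
doc_texts = [f"wiki chunk {i}" for i in range(1000)]   # stand-in for 100-token Wikipedia chunks
doc_vecs = rng.standard_normal((1000, 768))            # precomputed document embeddings (frozen)

def bert_embed(text):
    """Hypothetical stand-in for the BERT query encoder."""
    return rng.standard_normal(768)

def retrieve(question, k=5):
    q = bert_embed(question)
    scores = doc_vecs @ q                               # MIPS: inner product with every document
    top = np.argsort(-scores)[:k]                       # top-k document ids
    p = np.exp(scores[top] - scores[top].max())
    return top, p / p.sum()                             # retrieval distribution p(z|x) over top-k

def generator_inputs(question, doc_ids):
    # the retrieved document is simply concatenated with the input before it is fed to BART
    return [doc_texts[z] + " // " + question for z in doc_ids]

ids, probs = retrieve("who wrote hamlet?")
print(generator_inputs("who wrote hamlet?", ids)[0], probs[0])
```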
jointly train retriever and generator without any direct supervision on which document to retrieve
negative marginal log-likelihood (NLL) loss, stochastic gradient descent with Adam
only the query encoder and generator are trained (document encoder and index stay fixed)
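A hedged, toy PyTorch sketch of this training setup: the document index is a fixed tensor, only the query encoder and generator parameters go to the Adam optimizer, and the loss is the negative marginal log-likelihood. The linear layers below are placeholders for the BERT query encoder and BART generator, not the real models:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins: in the paper these are the BERT query encoder and the BART generator.
query_encoder = nn.Linear(16, 8)
generator_head = nn.Linear(16, 1)        # scores [doc_emb ; query_emb] -> toy log p(y|x,z)
doc_embeddings = torch.randn(100, 8)     # frozen document index (non-parametric memory)

# Only the query encoder and generator are optimized; the document encoder/index stay fixed.
optimizer = torch.optim.Adam(
    list(query_encoder.parameters()) + list(generator_head.parameters()), lr=1e-4
)

def train_step(question_feats, k=5):
    q = query_encoder(question_feats)                    # query embedding
    scores = doc_embeddings @ q                          # MIPS scores over the index
    top_scores, top_ids = scores.topk(k)
    log_p_z = torch.log_softmax(top_scores, dim=0)       # log p_eta(z|x) over the top-k docs
    # toy per-document target log-likelihood from the "generator"
    log_p_y = generator_head(
        torch.cat([doc_embeddings[top_ids], q.expand(k, -1)], dim=1)
    ).squeeze(1)
    loss = -torch.logsumexp(log_p_z + log_p_y, dim=0)    # negative marginal log-likelihood
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(train_step(torch.randn(16)))
```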
RAG-Token uses a standard beam decoder
RAG-Sequence performs beam search for each document
Thorough Decoding vs. Fast Decoding
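A sketch of RAG-Sequence decoding with the two variants named above; `beam_search` and `seq_logprob` are hypothetical hooks into the generator (per-document beam search and sequence scoring), not a real API:

```python
import math

def rag_sequence_decode(question, docs, doc_probs, beam_search, seq_logprob, thorough=True):
    """RAG-Sequence decoding: one beam search per retrieved document, then marginalize.

    thorough=True  -> Thorough Decoding: extra forward passes score every candidate
                      against every document before marginalizing.
    thorough=False -> Fast Decoding: p(y|x,z) is approximated as 0 when y was not
                      produced from z's beam, so no extra forward passes are needed.
    """
    beams = {z: beam_search(docs[z], question) for z in range(len(docs))}   # {doc: {y: logprob}}
    candidates = {y for hyps in beams.values() for y in hyps}

    scored = {}
    for y in candidates:
        total = 0.0
        for z, p_z in enumerate(doc_probs):
            if y in beams[z]:
                total += p_z * math.exp(beams[z][y])
            elif thorough:
                total += p_z * math.exp(seq_logprob(docs[z], question, y))  # extra forward pass
            # fast decoding: this document's contribution is approximated as 0
        scored[y] = total
    return max(scored, key=scored.get)
```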
Wikipedia as document index (100-token chunks, 21M documents)
FAISS with HNSW approximation for fast MIPS
retrieve top-k documents, k = 5 or 10
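A minimal sketch of building and querying the document index. For simplicity it uses an exact inner-product FAISS index (IndexFlatIP) over random placeholder embeddings, whereas the paper uses an HNSW approximation over the 21M Wikipedia chunks:

```python
import numpy as np
import faiss

d = 768                                                   # embedding dimension
doc_vecs = np.random.rand(21_000, d).astype("float32")    # stand-in for the 21M chunk embeddings

index = faiss.IndexFlatIP(d)                              # exact inner-product search (MIPS)
# approximate alternative in newer FAISS versions:
#   faiss.IndexHNSWFlat(d, 32, faiss.METRIC_INNER_PRODUCT)
index.add(doc_vecs)

query = np.random.rand(1, d).astype("float32")
scores, ids = index.search(query, 5)                      # retrieve top-k (k = 5 or 10 in the paper)
print(ids[0], scores[0])
```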
Open-domain QA, abstractive QA, Jeopardy question generation (non-standard QA format: guess the entity from a fact), fact verification (retrieve from Wikipedia and reason about whether a given claim is true)
Open-domain QA: Natural Questions / TriviaQA / WebQuestions / CuratedTrec, Exact Match scores
Abstractive QA: MSMARCO NLG task v2.1 (only question and answer, no gold passages)
Jeopardy question generation: SearchQA splits, SQuAD-tuned Q-BLEU-1
Fact verification: FEVER label accuracy
RAG generations are more diverse than BART's and hallucinate less
SotA models access gold passages while RAG does not
many questions are unanswerable without gold passages
not all questions are answerable from Wikipedia alone
Retrieval Ablations
Index hot-swapping
Retrieving more documents