Enrich systems with non-parametric memory
Works well on knowledge-intensive tasks
RAG-Sequence: uses the same retrieved document to generate the complete sequence
RAG-Token: can draw a different latent document for each target token
allows the generator to choose content from several documents when producing an answer
computes a distribution for the next output token for each document
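The two marginalizations, written as in the paper (retriever distribution p_η(z|x) over the top-k retrieved documents z, generator p_θ):

```latex
% RAG-Sequence: one document is used for the whole output sequence
p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})

% RAG-Token: a (possibly) different document per target token
p_{\text{RAG-Token}}(y \mid x) \approx \prod_{i=1}^{N} \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1})
```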
can be used for sequence classification by treating the target class as a length-one sequence
uses BERT as the query encoder and the document encoder
MIPS: Maximum Inner Product Search Problem
document index: non-parametric memory
BART-large, 400M parameters (seq2seq transformer), as the generator
simply concatenate the input and the retrieved document when generating
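A minimal sketch of the retriever side and of how the generator input is formed, assuming a toy dense index and dot-product (MIPS) scoring; `bert_embed` is a hypothetical stand-in for the BERT query encoder, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
doc_texts = [f"wiki chunk {i}" for i in range(1000)]   # stand-in for 100-token Wikipedia chunks
doc_vecs = rng.standard_normal((1000, 768))            # precomputed document embeddings (frozen)

def bert_embed(text):
    """Hypothetical stand-in for the BERT query encoder."""
    return rng.standard_normal(768)

def retrieve(question, k=5):
    q = bert_embed(question)
    scores = doc_vecs @ q                               # MIPS: inner product with every document
    top = np.argsort(-scores)[:k]                       # top-k document ids
    p = np.exp(scores[top] - scores[top].max())
    return top, p / p.sum()                             # retrieval distribution p(z|x) over top-k

def generator_inputs(question, doc_ids):
    # the retrieved document is simply concatenated with the input before it is fed to BART
    return [doc_texts[z] + " // " + question for z in doc_ids]

ids, probs = retrieve("who wrote hamlet?")
print(generator_inputs("who wrote hamlet?", ids)[0], probs[0])
```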
jointly train retriever and generator without any direct supervision on which document to retrieve
negative marginal log-likelihood (NLL) loss, stochastic gradient descent with Adam
only the query encoder and generator are trained (document encoder and index stay fixed)
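A hedged, toy PyTorch sketch of this training setup: the document index is a fixed tensor, only the query encoder and generator parameters go to the Adam optimizer, and the loss is the negative marginal log-likelihood. The linear layers below are placeholders for the BERT query encoder and BART generator, not the real models:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins: in the paper these are the BERT query encoder and the BART generator.
query_encoder = nn.Linear(16, 8)
generator_head = nn.Linear(16, 1)        # scores [doc_emb ; query_emb] -> toy log p(y|x,z)
doc_embeddings = torch.randn(100, 8)     # frozen document index (non-parametric memory)

# Only the query encoder and generator are optimized; the document encoder/index stay fixed.
optimizer = torch.optim.Adam(
    list(query_encoder.parameters()) + list(generator_head.parameters()), lr=1e-4
)

def train_step(question_feats, k=5):
    q = query_encoder(question_feats)                    # query embedding
    scores = doc_embeddings @ q                          # MIPS scores over the index
    top_scores, top_ids = scores.topk(k)
    log_p_z = torch.log_softmax(top_scores, dim=0)       # log p_eta(z|x) over the top-k docs
    # toy per-document target log-likelihood from the "generator"
    log_p_y = generator_head(
        torch.cat([doc_embeddings[top_ids], q.expand(k, -1)], dim=1)
    ).squeeze(1)
    loss = -torch.logsumexp(log_p_z + log_p_y, dim=0)    # negative marginal log-likelihood
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(train_step(torch.randn(16)))
```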
RAG-Token uses a standard beam decoder
RAG-Sequence performs beam search for each document
Thorough Decoding vs. Fast Decoding
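A sketch of RAG-Sequence decoding with the two variants named above; `beam_search` and `seq_logprob` are hypothetical hooks into the generator (per-document beam search and sequence scoring), not a real API:

```python
import math

def rag_sequence_decode(question, docs, doc_probs, beam_search, seq_logprob, thorough=True):
    """RAG-Sequence decoding: one beam search per retrieved document, then marginalize.

    thorough=True  -> Thorough Decoding: extra forward passes score every candidate
                      against every document before marginalizing.
    thorough=False -> Fast Decoding: p(y|x,z) is approximated as 0 when y was not
                      produced from z's beam, so no extra forward passes are needed.
    """
    beams = {z: beam_search(docs[z], question) for z in range(len(docs))}   # {doc: {y: logprob}}
    candidates = {y for hyps in beams.values() for y in hyps}

    scored = {}
    for y in candidates:
        total = 0.0
        for z, p_z in enumerate(doc_probs):
            if y in beams[z]:
                total += p_z * math.exp(beams[z][y])
            elif thorough:
                total += p_z * math.exp(seq_logprob(docs[z], question, y))  # extra forward pass
            # fast decoding: this document's contribution is approximated as 0
        scored[y] = total
    return max(scored, key=scored.get)
```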
Wikipedia as document index (100-token chunks, 21M documents)
FAISS with HNSW approximation for fast MIPS
retrieve top-k documents, k = 5 or 10
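A minimal sketch of building and querying the document index. For simplicity it uses an exact inner-product FAISS index (IndexFlatIP) over random placeholder embeddings, whereas the paper uses an HNSW approximation over the 21M Wikipedia chunks:

```python
import numpy as np
import faiss

d = 768                                                   # embedding dimension
doc_vecs = np.random.rand(21_000, d).astype("float32")    # stand-in for the 21M chunk embeddings

index = faiss.IndexFlatIP(d)                              # exact inner-product search (MIPS)
# approximate alternative in newer FAISS versions:
#   faiss.IndexHNSWFlat(d, 32, faiss.METRIC_INNER_PRODUCT)
index.add(doc_vecs)

query = np.random.rand(1, d).astype("float32")
scores, ids = index.search(query, 5)                      # retrieve top-k (k = 5 or 10 in the paper)
print(ids[0], scores[0])
```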
Open-domain QA, abstractive QA, Jeopardy question generation (non-standard QA format: guess the entity from a fact), fact verification (retrieve from Wikipedia and reason about whether a given claim is true)
Open-domain QA: Natural Questions / TriviaQA / WebQuestions / CuratedTrec, Exact Match scores
Abstractive QA: MSMARCO NLG task v2.1 (only question and answer, no gold passages)
Jeopardy question generation: SearchQA splits, SQuAD-tuned Q-BLEU-1
Fact verification: FEVER label accuracy
RAG generations are more diverse than BART's and hallucinate less
SotA models access gold passages while RAG does not
many questions are unanswerable without gold passages
not all questions are answerable from Wikipedia alone
Retrieval Ablations
Index hot-swapping
Retrieving more documents