LangGraph: UseCase (Self RAG)

김지우 · January 29, 2025

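Table of Contents

Overview

Hallucination Grader Chain

Question Rewriter Chain

State Definition

Node Definitions

Conditional Edge Definitions

Graph Definition and Construction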

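1. Overview

This post is a set of notes for understanding and applying Self-RAG, one of the many RAG variants. Self-RAG reviews the quality of the LLM's response and uses that review to improve both the query and the result.

What does it take for a RAG system to perform well? One way to find out is to ask what makes a RAG system perform poorly.

I think RAG quality degrades in two situations. First, the documents fetched by the retriever contain nothing relevant to the query. Second, the LLM fails to give an appropriate answer even though appropriate documents were passed to it.

Personally, I think Self-RAG is the RAG architecture that addresses these situations most comprehensively.

These notes summarize what I learned from TeddyNote's paid course. If the content here feels insufficient, I recommend taking the course yourself.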

2. Hallucination Grader Chain

First, the basic retrieval chain can be built the same way as in my earlier CRAG post.

On top of that, we need a retrieval grader. The retriever returns the documents that will be passed to the LLM for a given query, so we need a grader that checks whether those documents are actually relevant. Depending on the design, a grader can return an integer score. Here it returns yes or no depending on relevance.

from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_teddynote.models import get_model_name, LLMs

# Set the model name to the latest version
MODEL_NAME = get_model_name(LLMs.GPT4o)


# Data model: a binary score for grading the relevance of a retrieved document
class GradeDocuments(BaseModel):
    """A binary score to determine the relevance of the retrieved documents."""

    # 'yes' or 'no' depending on whether the document is relevant to the question
    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )


# Initialize the LLM
llm = ChatOpenAI(model=MODEL_NAME, temperature=0)

# Bind the GradeDocuments model so the LLM returns structured output
structured_llm_grader = llm.with_structured_output(GradeDocuments)

# System prompt: the grader judges whether a retrieved document is relevant to the user question
system = """You are a grader assessing relevance of a retrieved document to a user question. \n 
    It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
    If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
    Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question."""

# Build the chat prompt template
grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
    ]
)

# Build the retrieval grader
retrieval_grader = grade_prompt | structured_llm_grader
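The text mentions that a grader could return an integer score instead of yes or no. The following is a minimal sketch of how such scores might be turned into a keep-or-drop decision. The 1 to 5 scale, the `ScoredDocument` container, and the `keep_relevant` helper are all assumptions made for illustration; they are not part of the course code or of LangChain.

```python
from dataclasses import dataclass


@dataclass
class ScoredDocument:
    """Hypothetical container: a document's text plus a 1-5 relevance score."""

    text: str
    score: int


def keep_relevant(scored_docs, threshold=3):
    """Keep the text of documents whose score meets the assumed threshold."""
    return [d.text for d in scored_docs if d.score >= threshold]


scored = [
    ScoredDocument("Doc about the question topic", 5),
    ScoredDocument("Loosely related doc", 3),
    ScoredDocument("Unrelated doc", 1),
]
print(keep_relevant(scored))
```

With an integer score the cutoff becomes a tunable parameter, whereas the yes/no grader used in this post leaves that decision entirely to the LLM.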

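We also need a chain that judges whether a hallucination occurred. It is used later, in the conditional edge that follows the generation node.

What we want to check here is the relationship between the documents the LLM was given and the result the LLM returned. If the two are unrelated and the answer was not generated from facts stated in the documents, we can treat it as a hallucination.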

from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


# Data model: a binary score for whether the generated answer is grounded in the facts
class Groundedness(BaseModel):
    """A binary score indicating whether the generated answer is grounded in the facts."""

    # 'yes' or 'no' depending on whether the answer is grounded in the facts
    binary_score: str = Field(
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )


# Initialize the LLM
llm = ChatOpenAI(model=MODEL_NAME, temperature=0)

# Configure the LLM with structured output
structured_llm_grader = llm.with_structured_output(Groundedness)

# System prompt
system = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n 
Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""

# Build the chat prompt template
groundedness_checking_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Set of facts: \n\n {documents} \n\n LLM generation: {generation}"),
    ]
)

# Build the hallucination grader for the answer
groundedness_grader = groundedness_checking_prompt | structured_llm_grader
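The `{documents}` slot in the prompt above is filled by ordinary string formatting, so whatever object is passed in is converted with `str()`. If a list of document objects is passed as-is, the prompt receives the list's representation, metadata included. The sketch below shows one possible way to pass only the text instead. `FakeDocument` and `format_docs` are hypothetical names used for illustration, and the sketch uses plain `str.format` rather than the real prompt template.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a retrieved document (hypothetical, for illustration only)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each document, leaving metadata out of the prompt."""
    return "\n\n".join(d.page_content for d in docs)


docs = [
    FakeDocument("Fact A.", {"page": 1}),
    FakeDocument("Fact B.", {"page": 2}),
]

template = "Set of facts: \n\n {documents} \n\n LLM generation: {generation}"
# Passing the list directly embeds the objects' repr, metadata and all.
raw = template.format(documents=docs, generation="Answer")
# Passing the joined text keeps the prompt limited to the facts themselves.
clean = template.format(documents=format_docs(docs), generation="Answer")
print(clean)
```

Whether this matters in practice depends on how much metadata the retriever attaches; it is a design choice rather than a required fix.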

3. Question Rewriter Chain

The generated result must also be relevant to the query, so we need one more chain that grades the answer.

from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


class GradeAnswer(BaseModel):
    """A binary score indicating whether the question is addressed."""

    # Relevance of the answer: 'yes' (addresses the question) or 'no' (does not)
    binary_score: str = Field(
        description="Answer addresses the question, 'yes' or 'no'"
    )


llm = ChatOpenAI(model=MODEL_NAME, temperature=0)

# Bind GradeAnswer to the LLM
structured_llm_grader = llm.with_structured_output(GradeAnswer)

# System prompt
system = """You are a grader assessing whether an answer addresses / resolves a question \n 
     Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question."""

# Build the prompt
answer_grader_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "User question: \n\n {question} \n\n LLM generation: {generation}"),
    ]
)

# Build the answer grader
answer_grader = answer_grader_prompt | structured_llm_grader

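The grader returns yes or no depending on whether the answer is relevant to the original question. If it is not, the LLM should produce a new result or, depending on the task, the query itself should be regenerated.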

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser


llm = ChatOpenAI(model=MODEL_NAME, temperature=0)

# System prompt
# The rewriter converts the input question into a form optimized for vectorstore retrieval
system = """You are a question re-writer that converts an input question to a better version that is optimized \n 
     for vectorstore retrieval. Look at the input and try to reason about the underlying semantic intent / meaning."""

# Build a prompt template containing the system message and the initial question
re_write_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        (
            "human",
            "Here is the initial question: \n\n {question} \n Formulate an improved question.",
        ),
    ]
)

# Build the question rewriter
question_rewriter = re_write_prompt | llm | StrOutputParser()

The chain above rewrites the query whenever that becomes necessary.

4. State Definition

from typing import List
from typing_extensions import TypedDict, Annotated


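# Class that represents the state of the graph
class GraphState(TypedDict):
    # The question, as a string
    question: Annotated[str, "Question"]
    # The response generated by the LLM, as a string
    generation: Annotated[str, "LLM Generation"]
    # The list of retrieved documents
    documents: Annotated[List[str], "Retrieved Documents"]

Since this is LangGraph, we have to define the state. Only documents has to hold several retrieved items, so it is typed as List[str]. Note that the node functions in the next section read each item through page_content, so at run time the list holds the retriever's document objects rather than plain strings.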
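Each node returns only the keys it changed, and for state keys declared without a reducer, as in GraphState, LangGraph replaces the old value with the new one. The sketch below imitates that merge with plain dictionaries. It does not call LangGraph, and `apply_update` is a hypothetical helper written only to illustrate the idea.

```python
def apply_update(state, update):
    """Return a new state in which keys from `update` replace the old values."""
    return {**state, **update}


state = {"question": "original question", "generation": "", "documents": []}

# A retrieve-like node returns only the key it changed.
state = apply_update(state, {"documents": ["doc 1", "doc 2"]})
# A transform_query-like node overwrites the question.
state = apply_update(state, {"question": "rewritten question"})

print(state)
```

This is why a rewritten question simply replaces the original one in the state: nothing accumulates unless a reducer is declared for that key.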

5. Node Definitions

# pdf_retriever and rag_chain are assumed to be built as in the earlier CRAG post

# Retrieve documents
def retrieve(state):
    print("==== [RETRIEVE] ====")
    question = state["question"]

    # Run the retrieval
    documents = pdf_retriever.invoke(question)
    return {"documents": documents}


# Generate an answer
def generate(state):
    print("==== [GENERATE] ====")
    question = state["question"]
    documents = state["documents"]

    # RAG generation
    generation = rag_chain.invoke({"context": documents, "question": question})
    return {"generation": generation}


# Grade the relevance of the retrieved documents
def grade_documents(state):
    print("==== [GRADE DOCUMENTS] ====")
    question = state["question"]
    documents = state["documents"]

    # Grade each document and keep only the relevant ones
    filtered_docs = []
    for d in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": d.page_content}
        )
        grade = score.binary_score
        if grade == "yes":
            print("==== GRADE: DOCUMENT RELEVANT ====")
            filtered_docs.append(d)
        else:
            print("==== GRADE: DOCUMENT NOT RELEVANT ====")
            continue
    return {"documents": filtered_docs}


# Transform the query
def transform_query(state):
    print("==== [TRANSFORM QUERY] ====")
    question = state["question"]

    # Rewrite the question
    better_question = question_rewriter.invoke({"question": question})
    return {"question": better_question}

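Again, because we are building a LangGraph graph, we write the functions that back each node. Here we create four nodes: document retrieval, answer generation, relevance grading, and query transformation.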
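To see the filtering step of the grading node in isolation, here is a minimal sketch that swaps the LLM grader for a stub. `KeywordGrader`, `FakeDocument`, and `FakeGrade` are hypothetical stand-ins made up for this example; the real grader calls an LLM and judges semantic relevance, not just keyword overlap.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Stand-in for a retrieved document with a page_content attribute."""

    page_content: str


@dataclass
class FakeGrade:
    """Stand-in for the structured output, exposing binary_score."""

    binary_score: str


class KeywordGrader:
    """Hypothetical stub: answers 'yes' when any question word is in the text."""

    def invoke(self, inputs):
        words = inputs["question"].lower().split()
        text = inputs["document"].lower()
        hit = any(w in text for w in words)
        return FakeGrade("yes" if hit else "no")


def grade_documents(state, grader):
    """Same filtering idea as the node, with the grader passed in explicitly."""
    kept = []
    for d in state["documents"]:
        score = grader.invoke(
            {"question": state["question"], "document": d.page_content}
        )
        if score.binary_score == "yes":
            kept.append(d)
    return {"documents": kept}


state = {
    "question": "samsung generative",
    "documents": [
        FakeDocument("Samsung announced a generative AI model."),
        FakeDocument("Weather report for Tuesday."),
    ],
}
result = grade_documents(state, KeywordGrader())
print([d.page_content for d in result["documents"]])
```

Passing the grader as an argument is only a convenience for testing without an API key; the node in this post reads retrieval_grader from module scope.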

6. Conditional Edge Definitions

# Decide whether to generate an answer
def decide_to_generate(state):
    print("==== [ASSESS GRADED DOCUMENTS] ====")
    filtered_documents = state["documents"]

    if not filtered_documents:
        # No document survived grading,
        # so create a new query
        print(
            "==== [DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY] ===="
        )
        return "transform_query"
    else:
        # At least one relevant document remains, so generate an answer
        print("==== [DECISION: GENERATE] ====")
        return "generate"


# Grade the generated answer against both the documents and the question
def grade_generation_v_documents_and_question(state):
    print("==== [CHECK HALLUCINATIONS] ====")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    score = groundedness_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    grade = score.binary_score

    # Check for hallucination
    if grade == "yes":
        print("==== [DECISION: GENERATION IS GROUNDED IN DOCUMENTS] ====")
        # Check whether the answer resolves the question
        print("==== [GRADE GENERATION vs QUESTION] ====")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score.binary_score
        if grade == "yes":
            print("==== [DECISION: GENERATION ADDRESSES QUESTION] ====")
            return "relevant"
        else:
            print("==== [DECISION: GENERATION DOES NOT ADDRESS QUESTION] ====")
            return "not relevant"
    else:
        print("==== [DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY] ====")
        return "hallucination"

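We need two conditional edges. The first looks at the result of document grading: if at least one relevant document remains, it goes straight to answer generation, and if none remains, it rewrites the query so that relevant documents can be retrieved.

The second conditional edge checks the generated answer against the retrieved documents and the question. It distinguishes three cases: the answer is a hallucination, the answer is grounded but does not address the question, or the answer is a proper one.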
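The three outcomes of the second conditional edge can be written as a small pure function over the two yes/no grades. This is a sketch for illustration only: `route_generation` is a hypothetical name, and it takes the grades as plain strings instead of calling the two grader chains.

```python
def route_generation(grounded, addresses_question):
    """Map the two yes/no grades to the three edge labels used in the graph."""
    if grounded != "yes":
        # Not supported by the documents: generate again.
        return "hallucination"
    if addresses_question == "yes":
        # Grounded and on topic: finish.
        return "relevant"
    # Grounded but off topic: rewrite the query.
    return "not relevant"


for grounded, addressed in [("yes", "yes"), ("yes", "no"), ("no", "yes")]:
    print(grounded, addressed, "->", route_generation(grounded, addressed))
```

Note that the groundedness check comes first, so an ungrounded answer is labeled a hallucination even if it happens to address the question.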

7. Graph Definition and Construction

from langgraph.graph import END, StateGraph, START
from langgraph.checkpoint.memory import MemorySaver

# Initialize the graph with its state type
workflow = StateGraph(GraphState)

# Add nodes
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate
workflow.add_node("transform_query", transform_query)  # transform_query

# Add edges
workflow.add_edge(START, "retrieve")
workflow.add_edge("retrieve", "grade_documents")

# Add a conditional edge from the document grading node
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "transform_query": "transform_query",
        "generate": "generate",
    },
)

# Add the edge from query transformation back to retrieval
workflow.add_edge("transform_query", "retrieve")

# Add a conditional edge from the answer generation node
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "hallucination": "generate",
        "relevant": END,
        "not relevant": "transform_query",
    },
)

# Compile the graph
app = workflow.compile(checkpointer=MemorySaver())

from langchain_core.runnables import RunnableConfig
from langchain_teddynote.messages import stream_graph, invoke_graph, random_uuid

# Configure the run (maximum recursion depth, thread_id)
config = RunnableConfig(recursion_limit=10, configurable={"thread_id": random_uuid()})

# Input question ("What is the name of the generative AI developed by Samsung Electronics?")
inputs = {
    "question": "삼성전자가 개발한 생성형 AI 의 이름은?",
}

# Run the graph
invoke_graph(
    app, inputs, config, ["retrieve", "transform_query", "grade_documents", "generate"]
)

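The edges connect four nodes: document retrieval, document grading, answer generation, and query transformation.
First, the retrieval node fetches candidate documents and the grading node evaluates them. If no document is relevant, the query is rewritten and retrieval runs again; if relevant documents remain, the flow moves to the generation node to produce an answer.

After generation, if the answer is a hallucination, generation is repeated. If it is grounded but does not address the query, the query is rewritten and the documents are retrieved again. If a relevant, appropriate answer comes back, it is returned as the final result.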
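Because the graph contains loops, the run needs a stopping condition, which is what recursion_limit in the config provides. The following is a rough sketch of the control flow with stubbed decisions and a step cap. It does not use LangGraph: `run_self_rag` and its stub rules are assumptions for illustration, and the RuntimeError stands in for the error LangGraph raises when the recursion limit is exceeded.

```python
def run_self_rag(question, max_steps=10):
    """Walk the retrieve/grade/generate/transform loop with stub logic."""
    path = []
    node = "retrieve"
    documents = []
    for _ in range(max_steps):
        path.append(node)
        if node == "retrieve":
            # Stub rule: only a rewritten question finds a document.
            found = question.endswith("(rewritten)")
            documents = ["relevant doc"] if found else []
            node = "grade_documents"
        elif node == "grade_documents":
            # Same branch as decide_to_generate: empty list means rewrite.
            node = "generate" if documents else "transform_query"
        elif node == "transform_query":
            question = question + " (rewritten)"
            node = "retrieve"
        elif node == "generate":
            # Stub rule: assume the answer is grounded and relevant, so stop.
            return path
    raise RuntimeError("step limit reached before a final answer")


print(run_self_rag("vague question"))
```

In this stubbed run the first retrieval finds nothing, the query is rewritten once, and the second pass reaches generation. With a cap that is too small, the loop ends with an error instead of an answer, which is the trade-off to keep in mind when choosing recursion_limit.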
