챗봇의 멀티턴

Sirius·2024년 8월 1일

LLM

싱글턴 vs 멀티턴

1) 싱글턴

정의: 한 번의 질문과 한 번의 응답으로 이루어진 대화 방식이다.
사용자가 질문을 하고 챗봇이 이에 대한 응답을 제공하면 대화가 종료된다.

즉 대화의 상태나 맥락을 유지하지 않는다.

2) 멀티턴

정의: 여러 번의 질문과 응답이 연속적으로 이루어지는 대화 방식이다. 챗봇은 대화의 맥락을 유지하며 여러 턴에 걸쳐 대화를 이어간다.

챗봇은 이전 대화의 맥락이나 상태를 기억하고 유지해야 한다.
챗봇은 대화의 맥락을 이해하고 이에 따라 적절한 응답을 제공한다.

실제구현(LlamaIndex)

대화형 AI 시스템에서 사용되는 ChatSummaryMemoryBuffer를 사용하여, 대화 기록을 일정한 토큰 길이로 제한하면서 요약하는 방법이다.

이 접근법은 비용과 지연 시간을 줄이기 위해 대화 내역의 일부를 요약하여 메모리 버퍼에 적재하는 방식이다.

1) 초기 대화 기록 생성

chat_history = [
    ChatMessage(role="user", content="What is LlamaIndex?"),
    ChatMessage(
        role="assistant",
        content="LlamaaIndex is the leading data framework for building LLM applications",
    ),
    ChatMessage(role="user", content="Can you give me some more details?"),
    ChatMessage(
        role="assistant",
        content="""LlamaIndex is a framework for building context-augmented LLM applications. Context augmentation refers to any use case that applies LLMs on top of your private or domain-specific data. Some popular use cases include the following: 
        Question-Answering Chatbots (commonly referred to as RAG systems, which stands for "Retrieval-Augmented Generation"), Document Understanding and Extraction, Autonomous Agents that can perform research and take actions
        LlamaIndex provides the tools to build any of these above use cases from prototype to production. The tools allow you to both ingest/process this data and implement complex query workflows combining data access with LLM prompting.""",
    ),
]

대화 기록이 메모리 버퍼에 전부 들어가지 않도록 일부 대화 기록을 생성합니다.

2) ChatSummaryMemoryBuffer 인스턴스 생성

LLM과 요약을 위한 토큰 제한을 설정하여 ChatSummaryMemoryBuffer 인스턴스를 생성합니다.

model = "gpt-4-0125-preview"
summarizer_llm = OpenAiLlm(model_name=model, max_tokens=256)
tokenizer_fn = tiktoken.encoding_for_model(model).encode
memory = ChatSummaryMemoryBuffer.from_defaults(
    chat_history=chat_history,
    llm=summarizer_llm,
    token_limit=2,  # 매우 작은 토큰 한도 설정
    tokenizer_fn=tokenizer_fn,
)

3) 4. 대화 기록 요약

대화 기록을 출력하면 이전 메시지들이 요약된 것을 확인할 수 있다.

history = memory.get()
print(history)

[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, content='The user inquired about LlamaIndex, a leading data framework for developing LLM applications. The assistant explained that LlamaIndex is used for building context-augmented LLM applications, giving examples such as Question-Answering Chatbots, Document Understanding and Extraction, and Autonomous Agents. It was mentioned that LlamaIndex provides tools for ingesting and processing data, as well as implementing complex query workflows combining data access with LLM prompting.', additional_kwargs={})]

4) 새로운 대화 기록 추가

new_chat_history = [
    ChatMessage(role="user", content="Why context augmentation?"),
    ChatMessage(
        role="assistant",
        content="LLMs offer a natural language interface between humans and data. Widely available models come pre-trained on huge amounts of publicly available data. However, they are not trained on your data, which may be private or specific to the problem you're trying to solve. It's behind APIs, in SQL databases, or trapped in PDFs and slide decks. LlamaIndex provides tooling to enable context augmentation. A popular example is Retrieval-Augmented Generation (RAG) which combines context with LLMs at inference time. Another is finetuning.",
    ),
    ChatMessage(role="user", content="Who is LlamaIndex for?"),
    ChatMessage(
        role="assistant",
        content="LlamaIndex provides tools for beginners, advanced users, and everyone in between. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in 5 lines of code. For more complex applications, our lower-level APIs allow advanced users to customize and extend any module—data connectors, indices, retrievers, query engines, reranking modules—to fit their needs.",
    ),
]

memory.put(new_chat_history[0])
memory.put(new_chat_history[1])
memory.put(new_chat_history[2])
memory.put(new_chat_history[3])
history = memory.get()

새로운 대화 기록을 추가하고, 메모리 버퍼를 업데이트한다.

Sirius

이전 포스트

인덱싱 & 임베딩

다음 포스트

챗봇의 멀티턴

싱글턴 vs 멀티턴

1) 싱글턴

2) 멀티턴

실제구현(LlamaIndex)

1) 초기 대화 기록 생성

2) ChatSummaryMemoryBuffer 인스턴스 생성

3) 4. 대화 기록 요약

4) 새로운 대화 기록 추가

인덱싱 & 임베딩

챗봇의 스트리밍

0개의 댓글