[LangChain] RAG Technique: Map-Reduce

pysun · December 11, 2024


Map-Reduce

Source: https://pkgpl.org/wp-content/uploads/2023/10/image-3.png?w=2048

Map-Reduce processes each document retrieved for the question separately (Map), then combines the per-document results to generate the final output (Reduce).
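
The flow can be sketched without any LLM at all. In the minimal sketch below, `fake_extract` and `fake_combine` are hypothetical stand-ins for the two model calls (they are not LangChain APIs); only the shape of the computation matters.

```python
from typing import Callable, List


def map_reduce(
    docs: List[str],
    question: str,
    map_fn: Callable[[str, str], str],
    reduce_fn: Callable[[List[str], str], str],
) -> str:
    """Run map_fn on every document independently, then combine the results once."""
    mapped = [map_fn(doc, question) for doc in docs]  # Map: one call per document
    return reduce_fn(mapped, question)                # Reduce: one final call


# Hypothetical stand-ins for the LLM calls (assumptions for illustration only).
def fake_extract(doc: str, question: str) -> str:
    # Keep only the sentences that share a keyword with the question.
    keywords = {w.strip("?.,").lower() for w in question.split() if len(w) > 3}
    hits = [s.strip() for s in doc.split(".") if keywords & {w.lower() for w in s.split()}]
    return ". ".join(hits)


def fake_combine(parts: List[str], question: str) -> str:
    # Join the non-empty extracts into a single context string.
    return " | ".join(p for p in parts if p)


docs = [
    "Winston works at the Ministry of Truth. It is a vast white building.",
    "The gin smelled oily. Nothing here is related.",
]
answer = map_reduce(docs, "Where does Winston work?", fake_extract, fake_combine)
print(answer)
```

In the real chain, both `map_fn` and `reduce_fn` would be prompts sent to the model.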

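✨ Pros
1. Suited to long documents: each document is summarized on its own, so the method helps when many documents are retrieved or when the documents are long.
2. Information preservation: each document is processed independently, which helps keep details that might otherwise be lost.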

☠️ Cons
1. Higher token usage: the model is called once per document and once more for the final output, so token usage can grow.
2. Speed: the extra processing steps make it slower than the Stuff approach.
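
The cost trade-off comes down to simple counting. Here is a rough sketch, assuming N retrieved documents and purely hypothetical average token sizes (none of these numbers come from the run shown in this post):

```python
def llm_calls(num_docs: int, method: str) -> int:
    """Number of LLM calls needed to answer one question."""
    if method == "stuff":
        return 1                 # all documents go into a single prompt
    if method == "map_reduce":
        return num_docs + 1      # one map call per document + one reduce call
    raise ValueError(f"unknown method: {method}")


def prompt_tokens(num_docs, doc_tokens, overhead_tokens, extract_tokens, method):
    """Very rough prompt-token estimate; every argument is an assumed average."""
    if method == "stuff":
        return overhead_tokens + num_docs * doc_tokens
    # Map: each call carries the instructions plus one document.
    map_cost = num_docs * (overhead_tokens + doc_tokens)
    # Reduce: the instructions plus every per-document extract.
    reduce_cost = overhead_tokens + num_docs * extract_tokens
    return map_cost + reduce_cost


print(llm_calls(4, "stuff"), llm_calls(4, "map_reduce"))
print(prompt_tokens(4, 150, 50, 20, "stuff"))
print(prompt_tokens(4, 150, 50, 20, "map_reduce"))
```

The extra tokens come from repeating the instructions in every map call and from the additional reduce prompt. In exchange, no single prompt has to hold every document, which is why the method suits long inputs.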


from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.storage import LocalFileStore
from langchain.embeddings import CacheBackedEmbeddings
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS, Chroma
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda

############ Initial setup ############

# Create the LLM
llm = ChatOpenAI()

# Directory where the embedding cache is stored
cache_dir = LocalFileStore('/Users/sunbok/workspace/fullstack_gpt/Rag/.cache')

# Document loader
loader = UnstructuredFileLoader('/Users/sunbok/workspace/fullstack_gpt/files/1984_chapter_one.pdf')

# Create the splitter
splitter = CharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=100,
    separator='\n'
)

# Load the document and split it into chunks
docs = loader.load_and_split(text_splitter=splitter)

# Create the embedding model
embeddings = OpenAIEmbeddings()

# Cache-backed embeddings
cache_embeddings = CacheBackedEmbeddings.from_bytes_store(
    embeddings,
    cache_dir
)

# Embed the chunks and load them into a vector store
vectorstore = FAISS.from_documents(docs, cache_embeddings)

# Use the vector store as a retriever
retriever = vectorstore.as_retriever()
# Query (Korean): "Where does Winston work?"
retriever.invoke('윈스턴이 근무하는 곳은?')
[Document(page_content='Winston kept his back turned to the telescreen. It was safer, though, as he well knew, even a back can be revealing. A kilometre away the Ministry of Truth, his place of work, towered vast and white above the grimy landscape. This, he thought with a sort of vague distaste—this was London, chief city of Airstrip One, itself the third most populous of the provinces of Oceania. He tried to squeeze out some childhood memory that should tell him whether London had always been quite like this. Were there always these vis- tas of rotting nineteenth-century houses, their sides shored up with baulks of timber, their windows patched with card- board and their roofs with corrugated iron, their crazy', metadata={'source': '/Users/sunbok/workspace/fullstack_gpt/files/1984_chapter_one.pdf'}),
 Document(page_content='Winston turned round abruptly. He had set his features into the expression of quiet optimism which it was advis- able to wear when facing the telescreen. He crossed the room into the tiny kitchen. By leaving the Ministry at this time of day he had sacrificed his lunch in the canteen, and he was aware that there was no food in the kitchen except a hunk of dark-coloured bread which had got to be saved for tomorrow’s breakfast. He took down from the shelf a bottle of colourless liquid with a plain white label marked VICTORY GIN. It gave off a sickly, oily smell, as of Chinese rice-spirit. Winston poured out nearly a teacupful, nerved', metadata={'source': '/Users/sunbok/workspace/fullstack_gpt/files/1984_chapter_one.pdf'}),
 Document(page_content='ing thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from enter- ing along with him.', metadata={'source': '/Users/sunbok/workspace/fullstack_gpt/files/1984_chapter_one.pdf'}),
 Document(page_content='ures which had something to do with the production of pig-iron. The voice came from an oblong metal plaque like a dulled mirror which formed part of the surface of the right-hand wall. Winston turned a switch and the voice sank somewhat, though the words were still distinguish- able. The instrument (the telescreen, it was called) could be dimmed, but there was no way of shutting it off complete- ly. He moved over to the window: a smallish, frail figure, the meagreness of his body merely emphasized by the blue overalls which were the uniform of the party. His hair was very fair, his face naturally sanguine, his skin roughened by coarse soap and blunt razor blades and the cold of the win- ter that had just ended.', metadata={'source': '/Users/sunbok/workspace/fullstack_gpt/files/1984_chapter_one.pdf'})]
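
The retrieved chunks above are several hundred characters long even though `chunk_size=200`. A likely reason is that `CharacterTextSplitter` cuts only at the separator (`'\n'` here) and then merges the pieces, so a single piece longer than `chunk_size` is kept whole. The sketch below is a simplified model of that behavior written for illustration; `split_and_merge` is a hypothetical helper, not the library's implementation.

```python
from typing import List


def split_and_merge(text: str, separator: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Simplified model of separator-based splitting with overlap.

    Not the library's code: pieces are cut only at `separator`, then packed
    into chunks of at most `chunk_size` characters where possible.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks: List[str] = []
    current: List[str] = []

    def length(parts: List[str]) -> int:
        return len(separator.join(parts))

    for piece in pieces:
        if current and length(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Carry trailing pieces forward while they fit in the overlap budget.
            while current and (
                length(current) > chunk_overlap
                or length(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


text = "aaaa\nbbbb\ncccc\n" + "d" * 30
chunks = split_and_merge(text, "\n", chunk_size=10, chunk_overlap=5)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

In this toy input the second chunk repeats `bbbb` (the overlap), and the 30-character piece comes out as one chunk despite `chunk_size=10`. If stricter chunk sizes matter, a finer separator or a recursive splitter may be worth trying.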

############ Build the Map-Reduce chain ############

# Prompt template and chain that extract, per document, the text relevant to the question
map_doc_prompt = ChatPromptTemplate.from_messages([
    ('system',
     'Use the following portion of a long document to see if any of the text '
     'is relevant to answer the question. '
     'Return any relevant text verbatim.\n'
     '-----\n'
     '{context}'),
    ('human', '{question}')
])

map_doc_chain = map_doc_prompt | llm


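# For each retrieved document, ask the LLM for the text relevant to the question and collect the results
# The return value is mapped to `context` in the final chain
def map_docs(inputs):
    documents = inputs['documents']
    question = inputs['question']
    print(documents)  # debug output: the retrieved documents

    results = []
    for document in documents:
        # One LLM call per document (the Map step)
        result = map_doc_chain.invoke({
            'context': document.page_content,
            'question': question
        }).content
        results.append(result)

    # Join the per-document extracts into a single context string
    return '\n\n'.join(results)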

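# Intermediate chain that builds the context for the final chain
# RunnableLambda wraps a plain function so it can be called inside a chain
map_chain = {'documents': retriever, 'question': RunnablePassthrough()} | RunnableLambda(map_docs)
# Query (Korean): "Where does Winston work?"
map_chain.invoke('윈스턴이 근무하는 곳은?')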
[Document(page_content='Winston kept his back turned to the telescreen. It was safer, though, as he well knew, even a back can be revealing. A kilometre away the Ministry of Truth, his place of work, towered vast and white above the grimy landscape. This, he thought with a sort of vague distaste—this was London, chief city of Airstrip One, itself the third most populous of the provinces of Oceania. He tried to squeeze out some childhood memory that should tell him whether London had always been quite like this. Were there always these vis- tas of rotting nineteenth-century houses, their sides shored up with baulks of timber, their windows patched with card- board and their roofs with corrugated iron, their crazy', metadata={'source': '/Users/sunbok/workspace/fullstack_gpt/files/1984_chapter_one.pdf'}), Document(page_content='Winston turned round abruptly. He had set his features into the expression of quiet optimism which it was advis- able to wear when facing the telescreen. He crossed the room into the tiny kitchen. By leaving the Ministry at this time of day he had sacrificed his lunch in the canteen, and he was aware that there was no food in the kitchen except a hunk of dark-coloured bread which had got to be saved for tomorrow’s breakfast. He took down from the shelf a bottle of colourless liquid with a plain white label marked VICTORY GIN. It gave off a sickly, oily smell, as of Chinese rice-spirit. Winston poured out nearly a teacupful, nerved', metadata={'source': '/Users/sunbok/workspace/fullstack_gpt/files/1984_chapter_one.pdf'}), Document(page_content='ing thirteen. 
Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from enter- ing along with him.', metadata={'source': '/Users/sunbok/workspace/fullstack_gpt/files/1984_chapter_one.pdf'}), Document(page_content='ures which had something to do with the production of pig-iron. The voice came from an oblong metal plaque like a dulled mirror which formed part of the surface of the right-hand wall. Winston turned a switch and the voice sank somewhat, though the words were still distinguish- able. The instrument (the telescreen, it was called) could be dimmed, but there was no way of shutting it off complete- ly. He moved over to the window: a smallish, frail figure, the meagreness of his body merely emphasized by the blue overalls which were the uniform of the party. His hair was very fair, his face naturally sanguine, his skin roughened by coarse soap and blunt razor blades and the cold of the win- ter that had just ended.', metadata={'source': '/Users/sunbok/workspace/fullstack_gpt/files/1984_chapter_one.pdf'})]
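'윈스턴이 근무하는 곳은 진실성부(Ministry of Truth)입니다.\n\n윈스턴이 근무하는 곳은 "Ministry"입니다.\n\n윈스턴 스미스가 근무하는 곳은 "승리 아파트먼트"입니다.\n\n윈스턴이 근무하는 곳은 철광석 생산과 관련된 측면을 가진 일종의 시설이었습니다. 텍스트에서는 "pig-iron production"에 관련이 있다고 언급되어 있습니다.'
# English gloss of the four per-document extracts above:
#   1. Winston works at the Ministry of Truth.
#   2. Winston works at the "Ministry".
#   3. Winston Smith works at "Victory Apartments" (Victory Mansions).
#   4. Winston works at some kind of facility with ties to iron ore production; the text mentions a connection to "pig-iron production".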
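
Each document is answered in isolation, so a chunk that lacks the answer can still yield a confident but wrong extract (the third and fourth items above); the reduce step has to sort that out. Because the map calls do not depend on one another, they can also be run concurrently, which eases the speed drawback. Below is a minimal sketch using the standard library's `ThreadPoolExecutor`; `slow_extract` is a hypothetical stand-in for `map_doc_chain.invoke`, and `map_docs_parallel` is an illustrative name, not part of the original code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def slow_extract(doc: str, question: str) -> str:
    """Hypothetical stand-in for one map call; sleeps to imitate network latency."""
    time.sleep(0.1)
    return f"[{question}] {doc.upper()}"


def map_docs_parallel(docs: List[str], question: str, max_workers: int = 4) -> str:
    # executor.map preserves input order, so extracts line up with their documents.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(lambda d: slow_extract(d, question), docs))
    return "\n\n".join(results)


docs = ["chunk one", "chunk two", "chunk three", "chunk four"]
start = time.perf_counter()
combined = map_docs_parallel(docs, "q")
elapsed = time.perf_counter() - start
print(combined)
print(f"{elapsed:.2f}s for {len(docs)} map calls")
```

LangChain runnables also expose a `batch` method, so calling `map_doc_chain.batch([...])` inside `map_docs` may achieve the same effect with less code. Either way, provider rate limits are worth checking before raising concurrency.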

# Final prompt template
final_prompt = ChatPromptTemplate.from_messages([
    ('system',
     "Given the following extracted parts of a long document and a question, "
     "create a final answer. "
     "If you don't know the answer, just say that you don't know. "
     "Don't try to make up an answer.\n"
     "-----\n"
     "{context}"),
    ('human', "{question}")
])

# Final chain
chain = {'context': map_chain, 'question': RunnablePassthrough()} | final_prompt | llm

# Query (Korean): "Where does Winston work?"
chain.invoke('윈스턴이 근무하는 곳은 어디야?')
[... print(documents) output from map_docs: the same four Document objects shown above ...]
AIMessage(content='윈스턴이 근무하는 곳은 진실성부(Ministry of Truth)입니다.', response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 190, 'total_tokens': 218, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-1466206b-be06-4262-9519-4cff5f2f32f1-0')
# English gloss of the answer: "Winston works at the Ministry of Truth."
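
The `token_usage` above (218 tokens) covers only the final reduce call; the four map calls are counted separately. Below is a minimal sketch of totaling usage across all calls, assuming each response carries a `token_usage` dict shaped like the one in the output. The map-call numbers are made-up placeholders, not measurements, and `total_usage` is a hypothetical helper.

```python
from typing import Dict, Iterable

USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def total_usage(usages: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Sum the three top-level counters across several token_usage dicts."""
    totals = {key: 0 for key in USAGE_KEYS}
    for usage in usages:
        for key in USAGE_KEYS:
            totals[key] += usage.get(key, 0)
    return totals


# Reduce call: the values reported in the output above.
reduce_usage = {"prompt_tokens": 190, "completion_tokens": 28, "total_tokens": 218}

# Map calls: hypothetical placeholder values, one dict per retrieved document.
map_usages = [
    {"prompt_tokens": 200, "completion_tokens": 25, "total_tokens": 225}
    for _ in range(4)
]

grand_total = total_usage(map_usages + [reduce_usage])
print(grand_total)
```

To collect real numbers, `map_docs` would need to keep each `AIMessage` rather than only its `.content`, then read `response_metadata['token_usage']` from it.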