AI Bootcamp - Day 29

Cookie Baking · November 12, 2024

AI Bootcamp TIL

Algorithm - Fibonacci Numbers (Source: Programmers)

First Attempt

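def solution(n):
    answer = 0

    if n == 0:
        answer = 0
    elif n == 1 or n == 2:
        answer = 1
    else:
        # F(n) = F(n-1) + F(n-2) = 2*F(n-2) + F(n-3)
        # Results are never reused, so the number of calls grows exponentially with n
        answer = 2 * solution(n-2) + solution(n-3)

    return answer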

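-> I should have used iteration: the recursion recomputes the same subproblems over and over, so its running time grows exponentially with n, while a loop needs only O(n) steps. (It also never takes the remainder modulo 1234567 that the problem asks for.)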
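To see the blow-up concretely, here is a small sketch that runs the same recurrence as the first attempt but also counts every call. The function names are made up for this illustration, and an ordinary O(n) loop is included only to check that the values agree.

```python
def fib_recursive_with_count(n):
    """Same recurrence as the first attempt, F(n) = 2*F(n-2) + F(n-3).
    Returns (F(n), total number of calls made)."""
    if n == 0:
        return 0, 1
    if n == 1 or n == 2:
        return 1, 1
    a, calls_a = fib_recursive_with_count(n - 2)
    b, calls_b = fib_recursive_with_count(n - 3)
    # +1 counts the current call itself
    return 2 * a + b, calls_a + calls_b + 1


def fib_iterative(n):
    """Plain O(n) loop for comparison (no remainder applied)."""
    prev, curr = 0, 1  # F(0), F(1)
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


for n in (10, 20, 30):
    value, calls = fib_recursive_with_count(n)
    print(f"n={n}: F(n)={value}, recursive calls={calls}")
```

Each extra 10 in n multiplies the call count by roughly 16, so an input near the problem's upper limit is hopeless for the recursive version, while the loop stays linear.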

The Answer

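def solution(n):
    # forward holds F(k), s_forward holds F(k-1)
    if n == 0:
        return 0
    if n == 1 or n == 2:
        return 1

    forward = 1    # F(2)
    s_forward = 1  # F(1)
    for _ in range(3, n + 1):
        # F(k) = F(k-1) + F(k-2); taking the remainder at every step keeps the numbers small
        answer = (forward + s_forward) % 1234567
        s_forward = forward
        forward = answer

    return forward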

LangChain

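Think of it as a library for building LLM applications out of well-modularized parts (models, prompts, chains, retrievers) that can be combined freely.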

Key Concepts
1. Language Model (LLM)
A language model generates text from the input it is given. LangChain supports integration with many language models, including OpenAI's GPT models.

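2. Prompt Template
A prompt template is used to build prompts dynamically.
The template is filled in with specific input values and then passed to the model, which simplifies repetitive work.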
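3. Chain
A chain bundles a multi-step workflow into a single unit.
For example, analyzing a user's question, searching for the data it needs, and generating a response from the search results can be composed into one chain.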
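4. Agent
An agent is a component that dynamically decides which actions to take and then carries them out.
Depending on the question, it judges whether answering requires an API call or just text generation, and acts accordingly.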
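5. Vector Database
A vector database stores text as vectors (embeddings) and lets you quickly search for similar vectors later.
This lets an application quickly find stored content similar to a new question and use it to answer.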
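Of the concepts above, the agent is the only one not revisited in the hands-on parts of this post, so here is a toy sketch of its decide-then-act loop. Everything in it is hypothetical: get_weather and generate_text are stand-ins for a real API call and a real LLM call, and the keyword rule is only there to keep the sketch runnable offline. A real LangChain agent lets the LLM itself choose which tool to use.

```python
from typing import Callable, Dict


def get_weather(question: str) -> str:
    """Hypothetical tool standing in for a real weather API call."""
    return "(pretend API result) clear and warm"


def generate_text(question: str) -> str:
    """Hypothetical stand-in for plain LLM text generation."""
    return f"(pretend LLM answer) {question}"


# Tools the agent may pick from, keyed by a trigger word (an assumption of this toy)
TOOLS: Dict[str, Callable[[str], str]] = {"weather": get_weather}


def run_agent(question: str) -> str:
    """Decide whether the question needs a tool call or just text generation, then act."""
    for trigger, tool in TOOLS.items():
        if trigger in question.lower():
            return tool(question)  # action: call the tool
    return generate_text(question)  # action: answer directly


print(run_agent("What's the weather tomorrow?"))
print(run_agent("Write a short poem about autumn."))
```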

Want to read the documentation for a LangChain class? Use help():
from langchain_openai import ChatOpenAI
help(ChatOpenAI)

Setting the API key

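import os
from dotenv import load_dotenv
# Read key-value pairs from a .env file into environment variables
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY was not found; check the .env file")
# load_dotenv() already sets os.environ, so this assignment is only a safeguard
os.environ["OPENAI_API_KEY"] = api_key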

Practice: Creating a Basic LangChain Model

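from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

model = ChatOpenAI(model="gpt-4")
# Send a message to the model (the Korean text means "Hello, how can I help you?")
response = model.invoke([HumanMessage(content="안녕하세요, 무엇을 도와드릴까요?")])
print(response.content)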

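Using Prompt Templates
A prompt template helps build messages from varied inputs.
It takes the inputs you pass in and turns them into the form you want.
In LangChain, prompt templates make it easy to repeat a question or adjust the input flexibly through variables.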
from langchain_core.prompts import ChatPromptTemplate

system_template = "Translate the following sentence from English to {language}:"
# Each message is a (role, template) tuple; {language} and {text} are filled in at invoke time
prompt_template = ChatPromptTemplate.from_messages([
    ("system", system_template),
    ("user", "{text}"),
])
result = prompt_template.invoke({"language": "French", "text": "How are you?"})
print(result.to_messages())
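Conceptually, invoking the template just fills each message's {placeholders} with the matching keys. A minimal pure-Python sketch of that substitution, using a hypothetical fill_messages helper: LangChain's default template format uses the same {name} placeholders, but the real ChatPromptTemplate returns message objects rather than plain tuples.

```python
from typing import Dict, List, Tuple


def fill_messages(
    templates: List[Tuple[str, str]], variables: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Fill each (role, template) pair using str.format-style {placeholders}."""
    return [(role, template.format(**variables)) for role, template in templates]


templates = [
    ("system", "Translate the following sentence from English to {language}:"),
    ("user", "{text}"),
]
messages = fill_messages(templates, {"language": "French", "text": "How are you?"})
for role, content in messages:
    print(f"{role}: {content}")
```

Leaving out a variable makes str.format raise a KeyError here; the real template likewise raises an error when a required input is missing.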

Connecting a Chain with LangChain Expression Language (LCEL)
LCEL connects multiple components into a chain and controls how data flows between them.

The ChatGPT model (model) and the prompt template (prompt_template) were already created above.

Add a parser that processes the response (StrOutputParser extracts the model's reply as a plain string):
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()
# Data flows left to right: fill the prompt -> call the model -> extract the reply text
chain = prompt_template | model | parser
response = chain.invoke({"language": "Spanish", "text": "Where is the library?"})
print(response)
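The | syntax works because LangChain's runnables overload Python's __or__ operator, so a | b builds a sequence that feeds a's output into b. The sketch below imitates that idea with a hypothetical Step class; the fake model and parser are stand-ins for illustration, not LangChain code.

```python
from typing import Any, Callable


class Step:
    """Toy runnable: wraps a function and supports chaining with |."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b returns a new Step that runs a, then passes its output to b
        return Step(lambda x: other.func(self.func(x)))

    def invoke(self, x: Any) -> Any:
        return self.func(x)


# Hypothetical stand-ins for prompt_template, model, and parser
prompt = Step(lambda d: f"Translate to {d['language']}: {d['text']}")
fake_model = Step(lambda p: {"content": f"<model reply to '{p}'>"})
parser = Step(lambda msg: msg["content"])  # like StrOutputParser: keep only the text

chain = prompt | fake_model | parser
print(chain.invoke({"language": "Spanish", "text": "Where is the library?"}))
```

Because each | just wraps the previous steps, components can be swapped (a different model, a different parser) without touching the rest of the chain.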

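Building and Querying a Vector Database with FAISS

FAISS (Facebook AI Similarity Search) is a library for efficient vector similarity search.
Text is converted into vectors with OpenAIEmbeddings and stored in a FAISS index.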

Creating vector embeddings with an OpenAI embedding model

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

Initializing the FAISS index

import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore

# Create the FAISS index; embedding a sample text once tells us the embedding dimension
index = faiss.IndexFlatL2(len(embeddings.embed_query("hello world")))
vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={}
)
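IndexFlatL2 is FAISS's exact (brute-force) index: it keeps every vector as-is and compares the query against all of them by squared Euclidean (L2) distance, which is why the dimension has to be known when the index is created. A pure-Python sketch of that idea with made-up 3-dimensional vectors; the ToyFlatL2Index class is hypothetical, and the real faiss API works on NumPy float32 arrays in batches, with search returning a pair of distance and ID arrays.

```python
from typing import List, Tuple


class ToyFlatL2Index:
    """Toy stand-in for faiss.IndexFlatL2: exact search over all stored vectors."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors: List[List[float]] = []

    def add(self, vector: List[float]) -> None:
        # Every vector must have the dimension fixed at construction time
        if len(vector) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(vector)}")
        self.vectors.append(vector)

    def search(self, query: List[float], k: int) -> List[Tuple[float, int]]:
        """Return up to k (squared L2 distance, position) pairs, closest first."""
        scored = [
            (sum((q - v) ** 2 for q, v in zip(query, vec)), i)
            for i, vec in enumerate(self.vectors)
        ]
        return sorted(scored)[:k]


index = ToyFlatL2Index(dim=3)
index.add([1.0, 0.0, 0.0])
index.add([0.0, 1.0, 0.0])
index.add([0.9, 0.1, 0.0])
print(index.search([1.0, 0.0, 0.0], k=2))
```

Exact search costs one distance computation per stored vector, which is fine for small collections like this post's four documents.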

Adding documents to the vector database

from langchain_core.documents import Document
from uuid import uuid4

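# Create documents. In English, in order:
# 1) I'm building a project using LangChain!  2) Tomorrow's weather is expected to be clear and warm.
# 3) I had pancakes and eggs this morning.  4) The stock market is falling on fears of a recession.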
documents = [
    Document(page_content="LangChain을 사용해 프로젝트를 구축하고 있습니다!", metadata={"source": "tweet"}),
    Document(page_content="내일 날씨는 맑고 따뜻할 예정입니다.", metadata={"source": "news"}),
    Document(page_content="오늘 아침에는 팬케이크와 계란을 먹었어요.", metadata={"source": "personal"}),
    Document(page_content="주식 시장이 경기 침체 우려로 하락 중입니다.", metadata={"source": "news"}),
]

# Generate a unique ID for each document, then add the documents to the store
uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store.add_documents(documents=documents, ids=uuids)

Querying the vector database

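# Basic similarity search: top 2 among documents whose source is "news"
# (query: "What will the weather be like tomorrow?")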
results = vector_store.similarity_search("내일 날씨는 어떨까요?", k=2, filter={"source": "news"})
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

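# Similarity search with scores (query: "Tell me about LangChain.")
# With IndexFlatL2 the score is an L2 distance, so a lower value means more similar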
results_with_scores = vector_store.similarity_search_with_score("LangChain에 대해 이야기해주세요.", k=2, filter={"source": "tweet"})
for res, score in results_with_scores:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")

Results

* 내일 날씨는 맑고 따뜻할 예정입니다. [{'source': 'news'}]
* 주식 시장이 경기 침체 우려로 하락 중입니다. [{'source': 'news'}]
* [SIM=0.159] LangChain을 사용해 프로젝트를 구축하고 있습니다! [{'source': 'tweet'}]
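The first query returns the stock-market article too, because the filter keeps only "news" documents and there are exactly two of them, so k=2 returns both regardless of relevance. The toy below mimics that search-then-filter behavior with hand-picked 2-D vectors; the vectors, the English document texts, and the toy_search helper are all hypothetical. As far as I know, LangChain's FAISS wrapper similarly fetches fetch_k candidates (20 by default) before applying the filter.

```python
from typing import Dict, List, Tuple

# Hypothetical 2-D "embeddings" for the four documents (real embeddings are much longer)
docs: List[Tuple[str, Dict[str, str], Tuple[float, float]]] = [
    ("building a project with LangChain", {"source": "tweet"}, (0.0, 1.0)),
    ("tomorrow's weather is clear and warm", {"source": "news"}, (1.0, 0.0)),
    ("pancakes and eggs for breakfast", {"source": "personal"}, (0.9, 0.2)),
    ("stock market falling", {"source": "news"}, (-1.0, 0.0)),
]


def toy_search(query_vec, k, flt, fetch_k=20):
    """Rank docs by squared L2 distance, keep fetch_k, apply the metadata filter, return top k."""
    ranked = sorted(
        (
            (sum((a - b) ** 2 for a, b in zip(query_vec, vec)), text, meta)
            for text, meta, vec in docs
        ),
        key=lambda r: r[0],
    )[:fetch_k]
    matches = [r for r in ranked if all(r[2].get(key) == val for key, val in flt.items())]
    return matches[:k]


for dist, text, meta in toy_search((1.0, 0.1), k=2, flt={"source": "news"}):
    print(f"* [dist={dist:.3f}] {text} {meta}")
```

The breakfast document is actually closer to the query than the stock-market one, but the filter removes it, which is the same effect seen in the real results.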

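Integrating FAISS into a RAG Chain

Convert the FAISS store into a retriever so it can be used in a RAG chain (k=1 returns only the single closest document).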

retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 1})

Creating the RAG chain
Build the RAG chain by connecting LangChain's model and prompt.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Define the prompt template
contextual_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only the following context."),
    ("user", "Context: {context}\n\nQuestion: {question}")
])


class DebugPassThrough(RunnablePassthrough):
    """Passes its input through unchanged, printing it for debugging."""
    def invoke(self, *args, **kwargs):
        output = super().invoke(*args, **kwargs)
        print("Debug Output:", output)
        return output


# Step that turns the list of retrieved documents into plain text
class ContextToText(RunnablePassthrough):
    def invoke(self, inputs, config=None, **kwargs):  # accept config, since the chain passes it in
        # Join the page_content of every document in context with newlines
        context_text = "\n".join([doc.page_content for doc in inputs["context"]])
        return {"context": context_text, "question": inputs["question"]}


# RAG chain with DebugPassThrough steps to inspect intermediate values
rag_chain_debug = {
    "context": retriever,             # retriever that fetches the context documents
    "question": DebugPassThrough(),   # passthrough that confirms the question arrives unchanged
} | DebugPassThrough() | ContextToText() | contextual_prompt | model

# Run a question and check each step's output (the Korean question means "What is the instructor's name?")
response = rag_chain_debug.invoke("강사이름은?")
print("Final Response:")
print(response.content)
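The dict at the start of rag_chain_debug acts as a parallel step: the same input (the question string) is sent to every value, and the outputs are collected under the matching keys. ContextToText then flattens the retrieved documents into one string for the prompt. A plain-Python sketch of that data flow, where fake_retriever, ToyDoc, and run_parallel are hypothetical stand-ins for the FAISS retriever, LangChain's Document, and the dict step:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToyDoc:
    """Stand-in for langchain_core.documents.Document."""
    page_content: str


def fake_retriever(question: str) -> List[ToyDoc]:
    # Hypothetical: a real retriever would embed the question and search FAISS
    return [ToyDoc("(placeholder document text)")]


def passthrough(x: Any) -> Any:
    """Like RunnablePassthrough: return the input unchanged."""
    return x


def run_parallel(steps: Dict[str, Callable[[Any], Any]], x: Any) -> Dict[str, Any]:
    """Like a dict inside an LCEL chain: feed the same input to every step, collect by key."""
    return {key: step(x) for key, step in steps.items()}


def context_to_text(inputs: Dict[str, Any]) -> Dict[str, str]:
    """Same job as ContextToText: join the documents' text with newlines."""
    text = "\n".join(doc.page_content for doc in inputs["context"])
    return {"context": text, "question": inputs["question"]}


step1 = run_parallel({"context": fake_retriever, "question": passthrough}, "What is the instructor's name?")
step2 = context_to_text(step1)
prompt_text = f"Context: {step2['context']}\n\nQuestion: {step2['question']}"
print(prompt_text)
```

If the retriever returns nothing relevant (none of the four sample documents mention an instructor), the model only sees unrelated context, so "I don't know"-style answers are the expected outcome.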

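Saving and Loading a FAISS Index

# Save the index (writes index.faiss and index.pkl into the faiss_index folder)
vector_store.save_local("faiss_index")

# Load the saved index. Recent langchain_community versions require
# allow_dangerous_deserialization=True because the docstore is pickled; only load files you trust
new_vector_store = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)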

Merging FAISS Databases

db1 = FAISS.from_texts(["문서 1 내용"], embeddings)  # "Document 1 content"
db2 = FAISS.from_texts(["문서 2 내용"], embeddings)  # "Document 2 content"

# Merge db2 into db1 (db1 is modified in place)
db1.merge_from(db2)
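Each store keeps its own map from FAISS row position to docstore ID, so conceptually a merge has to shift the incoming store's row positions past the end of the existing ones. A toy sketch of that bookkeeping; the ToyStore class is hypothetical and only mirrors the idea behind index_to_docstore_id, not LangChain's actual implementation.

```python
from typing import Dict, List


class ToyStore:
    """Hypothetical mini vector store: rows of vectors plus a row -> doc-id map."""

    def __init__(self):
        self.vectors: List[List[float]] = []
        self.index_to_docstore_id: Dict[int, str] = {}
        self.docstore: Dict[str, str] = {}

    def add(self, doc_id: str, text: str, vector: List[float]) -> None:
        self.index_to_docstore_id[len(self.vectors)] = doc_id
        self.vectors.append(vector)
        self.docstore[doc_id] = text

    def merge_from(self, other: "ToyStore") -> None:
        """Append other's rows in place, shifting its row numbers so they don't collide."""
        offset = len(self.vectors)
        self.vectors.extend(other.vectors)
        for row, doc_id in other.index_to_docstore_id.items():
            self.index_to_docstore_id[offset + row] = doc_id
        self.docstore.update(other.docstore)


db1, db2 = ToyStore(), ToyStore()
db1.add("id-a", "document 1 content", [0.1, 0.2])
db2.add("id-b", "document 2 content", [0.3, 0.4])
db1.merge_from(db2)
print(db1.index_to_docstore_id)
```

db2 itself is left untouched; only db1 grows, matching the in-place behavior of merge_from above.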
