3_Summarization_sol

Jacob Kim · February 1, 2024

Naver project

Naver Project Week5


Use case

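Suppose you have a set of documents (PDFs, Notion pages, customer questions, etc.) and you want to summarize their content.

LLMs are a great tool for this, given their proficiency in understanding and synthesizing text.

In this guide, we'll go over how to perform document summarization using LLMs.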

Overview

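A central question when building a summarization model is how to pass your documents into the LLM's context window. There are two common approaches:

Stuff: simply "stuff" all the documents into a single prompt. This is the simplest approach.
Map-reduce: summarize each document on its own in a "map" step, then "reduce" those summaries into a final summary.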
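To make the difference concrete, here is a minimal sketch of both strategies in plain Python. The `summarize` callable stands in for an LLM call, and `first_words` is a hypothetical placeholder for it; neither is part of LangChain.

```python
from typing import Callable, List


def stuff_summarize(docs: List[str], summarize: Callable[[str], str]) -> str:
    """Stuff: join every document into one prompt and call the model once."""
    return summarize("\n\n".join(docs))


def map_reduce_summarize(docs: List[str], summarize: Callable[[str], str]) -> str:
    """Map-reduce: summarize each document, then summarize the summaries."""
    partial = [summarize(doc) for doc in docs]  # map step: one call per document
    return summarize("\n\n".join(partial))      # reduce step: one final call


def first_words(text: str, n: int = 5) -> str:
    """Hypothetical stand-in for an LLM call: keep the first n words."""
    return " ".join(text.split()[:n])


docs = [
    "Agents plan tasks by breaking large goals into smaller steps.",
    "Agents use memory to store and recall past observations.",
]
print(stuff_summarize(docs, first_words))
print(map_reduce_summarize(docs, first_words))
```

Stuff makes a single model call, so it only works when everything fits in the context window. Map-reduce makes one call per document plus one more for the reduce step.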

Quickstart

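To give you a sneak preview, either pipeline can be wrapped in a single object: load_summarize_chain.

Suppose we want to summarize a blog post. We can do this in a few lines of code.

First, install the packages and set the environment variable: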

!pip install openai tiktoken chromadb langchain


import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

We can use chain_type="stuff", especially when using a model with a larger context window, such as:

16k-token OpenAI gpt-3.5-turbo-1106
100k-token Anthropic Claude-2

We can also supply chain_type="map_reduce" or chain_type="refine".
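As a rough illustration of that choice, the sketch below picks a chain type from an estimated token count. The 4-characters-per-token estimate, the `reserved` budget for the prompt and the answer, and both function names are assumptions made for this example; a real tokenizer such as tiktoken would give accurate counts.

```python
from typing import Iterable


def estimate_tokens(text: str) -> int:
    """Rough estimate assuming about 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def pick_chain_type(texts: Iterable[str], context_window: int, reserved: int = 1000) -> str:
    """Return "stuff" when all texts fit in one prompt, otherwise "map_reduce".

    `reserved` is an assumed budget for the prompt template and the model's answer.
    """
    total = sum(estimate_tokens(t) for t in texts)
    return "stuff" if total + reserved <= context_window else "map_reduce"


short_post = ["word " * 2_000]    # about 10,000 characters
long_post = ["word " * 20_000]    # about 100,000 characters
print(pick_chain_type(short_post, context_window=16_000))
print(pick_chain_type(long_post, context_window=16_000))
```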

from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-1106")
chain = load_summarize_chain(llm, chain_type="stuff")

chain.run(docs)

The article discusses the concept of LLM-powered autonomous agents, with a focus on the components of planning, memory, and tool use. It includes case studies and proof-of-concept examples, as well as challenges and references. The author explores the potential of using large language models (LLMs) as the core controller for autonomous agents, highlighting the capabilities and limitations of this approach.

Option 1. Stuff

When we use load_summarize_chain with chain_type="stuff", we are using the StuffDocumentsChain.
This chain takes a list of documents, inserts them all into a prompt, and passes that prompt to an LLM:

from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate

# Define prompt
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

# Define LLM chain
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
llm_chain = LLMChain(llm=llm, prompt=prompt)
print(llm_chain)

# Define StuffDocumentsChain
stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")

docs = loader.load() # Web Base Loader
print(stuff_chain.run(docs))

The article discusses the concept of building autonomous agents powered by large language models (LLMs). It explores the components of such agents, including planning, memory, and tool use. The article provides case studies and proof-of-concept examples of LLM-powered agents in various domains. It also highlights the challenges and limitations of using LLMs in autonomous agents.

We can see that this reproduces the earlier result obtained with load_summarize_chain.

Option 2. Map-Reduce

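Let's unpack the map-reduce approach. We first use an LLMChain to map each document to an individual summary. Then we use a ReduceDocumentsChain to combine those summaries into a single global summary.

First, we specify the LLMChain used to map each document to an individual summary: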

from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.text_splitter import CharacterTextSplitter

llm = ChatOpenAI(temperature=0)

# Map
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)

We can also use the Prompt Hub to store and fetch prompts.

Prompthub with Langchain

!pip install langchainhub


from langchain import hub

map_prompt = hub.pull("rlm/map-prompt")
map_chain = LLMChain(llm=llm, prompt=map_prompt)

map_prompt

#ChatPromptTemplate(input_variables=['docs'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['docs'], template='The following is a set of documents:\n{docs}\nBased on this list of docs, please identify the main themes \nHelpful Answer:'))])

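The ReduceDocumentsChain takes the results of the document mapping and reduces them to a single output. It wraps a generic CombineDocumentsChain (such as StuffDocumentsChain) but adds the ability to collapse documents before passing them to the CombineDocumentsChain when their cumulative size exceeds token_max. In this example, we can reuse the chain that combines the documents to also collapse them.

So if the cumulative number of tokens in the mapped documents exceeds 4000, we recursively pass the documents to the StuffDocumentsChain in batches of fewer than 4000 tokens to create batched summaries. Once those batched summaries total fewer than 4000 tokens, we pass them all to the StuffDocumentsChain one last time to create the final summary.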
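The collapse logic described above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation. The whitespace-based `count_tokens`, the greedy batching, and the `combine` callable (standing in for the StuffDocumentsChain call) are all assumptions.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Assumed token counter: whitespace-separated words."""
    return len(text.split())


def batch_by_budget(texts: List[str], token_max: int) -> List[List[str]]:
    """Greedily group texts so that each group stays within token_max."""
    batches, current, used = [], [], 0
    for text in texts:
        n = count_tokens(text)
        if current and used + n > token_max:
            batches.append(current)  # current batch is full, start a new one
            current, used = [], 0
        current.append(text)
        used += n
    if current:
        batches.append(current)
    return batches


def reduce_summaries(
    summaries: List[str], combine: Callable[[List[str]], str], token_max: int
) -> str:
    """Collapse batches until everything fits in token_max, then combine once more."""
    while sum(count_tokens(s) for s in summaries) > token_max:
        batches = batch_by_budget(summaries, token_max)
        if len(batches) == len(summaries):
            # Every batch holds a single text, so collapsing cannot make progress.
            raise ValueError("a single summary exceeds token_max")
        summaries = [combine(batch) for batch in batches]
    return combine(summaries)


def keep_three_words(batch: List[str]) -> str:
    """Hypothetical stand-in for the combine chain: keep the first three words."""
    return " ".join(" ".join(batch).split()[:3])


print(reduce_summaries(["a b c d"] * 5, keep_three_words, token_max=8))
```

The guard against a non-shrinking batch list keeps the loop from running forever when one summary alone is larger than the budget.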

# Reduce
reduce_template = """The following is set of summaries:
{docs}
Take these and distill it into a final, consolidated summary of the main themes.
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)

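# Note we can also get this from the prompt hub, as noted above.
# Caution: "rlm/map-prompt" is the map prompt, so this line overwrites
# reduce_prompt and the reduce step asks for main themes again (visible in the
# repr and in the final output below). Comment it out to use reduce_template.
reduce_prompt = hub.pull("rlm/map-prompt")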

reduce_prompt

#ChatPromptTemplate(input_variables=['docs'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['docs'], template='The following is a set of 
#documents:\n{docs}\nBased on this list of docs, please identify the main themes \nHelpful Answer:'))])

# Run chain
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Takes a list of documents, combines them into a single string, and passes this to an LLMChain
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="docs"
)

# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
    # This is final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # If documents exceed context for `StuffDocumentsChain`
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into.
    token_max=4000,
)

Combining the map and reduce chains into one:

# Combining documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain
    llm_chain=map_chain,
    # Reduce chain
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in the llm_chain to put the documents in
    document_variable_name="docs",
    # Return the results of the map steps in the output
    return_intermediate_steps=False,
)

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs) # loader.load()

WARNING:langchain.text_splitter:Created a chunk of size 1003, which is longer than the specified 1000

print(map_reduce_chain.run(split_docs))

Based on the list of documents provided, the main themes can be identified as follows:

1. Large Language Models (LLMs): This theme focuses on the concept and capabilities of LLMs as discussed in the documents. It includes discussions on LLM-powered autonomous agents and their potential beyond generating written content.

2. Prompting and Prompt Engineering: This theme explores the use of prompts and prompt engineering techniques in working with LLMs. It may include discussions on how to effectively prompt LLMs to achieve desired outputs or behaviors.

3. Autonomous Agents: This theme discusses the concept of autonomous agents and their components within an LLM-powered system. It covers topics such as planning, memory, tool use, and the agent's ability to solve problems and perform tasks.

4. Steerability: This theme focuses on the concept of steerability in LLMs and autonomous agents. It may include discussions on how to control or guide the behavior of LLMs and agents to achieve specific outcomes.

5. NLP and Language Model Applications: This theme explores the applications of natural language processing (NLP) and language models in various domains. It may include discussions on specific use cases, case studies, or examples of LLM-powered agents in action.

6. Resources and Tools: This theme covers the availability of resources and tools related to LLMs and autonomous agents. It may include references to datasets, APIs, libraries, or frameworks that are useful for working with LLMs and building autonomous agents.

These main themes provide an overview of the topics covered in the list of documents and highlight the key areas of focus within the set of documents.

Option 3. Refine

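Refine is similar to map-reduce:

The refine documents chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.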
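The loop described above can be sketched in a few lines of plain Python. The `initial` and `refine` callables stand in for the two LLM calls (first summary, then refinement), and the lambdas used in the demo are hypothetical placeholders rather than LangChain APIs.

```python
from typing import Callable, List, Tuple


def refine_summarize(
    docs: List[str],
    initial: Callable[[str], str],
    refine: Callable[[str, str], str],
) -> Tuple[str, List[str]]:
    """Summarize the first document, then update that answer with each later one.

    Returns the final answer together with every intermediate answer.
    """
    if not docs:
        raise ValueError("at least one document is required")
    answer = initial(docs[0])
    steps = [answer]
    for doc in docs[1:]:
        # Each step sees only the latest answer and the current document.
        answer = refine(answer, doc)
        steps.append(answer)
    return answer, steps


final, steps = refine_summarize(
    ["planning", "memory", "tool use"],
    initial=lambda doc: f"covers {doc}",
    refine=lambda answer, doc: f"{answer}, {doc}",
)
print(final)
print(len(steps))
```

Unlike map-reduce, the calls are sequential: each refinement depends on the previous answer, so the steps cannot run in parallel.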

With load_summarize_chain, this is easily run by specifying chain_type="refine".

chain = load_summarize_chain(llm, chain_type="refine")
chain.run(split_docs)

This article discusses the concept of building autonomous agents powered by large language models (LLMs) and explores their potential beyond generating written content. It covers various components and techniques used in building effective autonomous agents, such as planning, memory, self-reflection, and tool use. The article also introduces the GPT-Engineer project, which aims to create a repository of code given a task specified in natural language. It provides a sample conversation for task clarification using GPT-Engineer and highlights its practical application in code writing tasks. The article concludes by discussing the limitations of LLMs and the potential for LLM-empowered agents in scientific discovery. Overall, it provides insights into the advancements and challenges in developing autonomous agents powered by LLMs and their potential impact on various domains, including software development. The challenges include limitations in LLM-centered agents and the need to reason s

It is also possible to supply prompts and return intermediate steps.

prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

refine_template = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    # Trailing space added so the two string pieces do not run together
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    # Period and newline added so the two instructions stay separate sentences
    "Given the new context, refine the original summary in Italian.\n"
    "If the context isn't useful, return the original summary."
)
refine_prompt = PromptTemplate.from_template(refine_template)
chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    question_prompt=prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key="output_text",
)
result = chain({"input_documents": split_docs}, return_only_outputs=True) # split docs

print(result["output_text"])

L'articolo discute il concetto di costruire agenti autonomi alimentati da grandi modelli di linguaggio (LLM) e fornisce dettagli sui loro componenti, come la pianificazione, la memoria e l'uso degli strumenti. Introduce il concetto di Chain of Hindsight (CoH) per l'auto-miglioramento e l'Algorithm Distillation (AD) per il reinforcement learning. L'articolo presenta dimostrazioni di proof-of-concept e discute le sfide associate alla costruzione di agenti alimentati da LLM. Esplora diversi tipi di memoria e spiega come la memoria esterna possa superare le limitazioni dell'attenzione limitata. L'articolo introduce anche l'uso di strumenti esterni da parte degli agenti LLM per estendere le loro capacità. Vengono presentati vari modelli, come MRKL, TALM, Toolformer e HuggingGPT, insieme a esperimenti ed esempi di LLM che utilizzano strumenti esterni. Viene introdotto il framework HuggingGPT come pianificatore di attività per selezionare modelli in base alle descrizioni e riassumere le risposte. Vengono evidenziate le sfide nell'utilizzo di LLM in applicazioni reali, insieme al benchmark API-Bank e all'esempio di ChemCrow, un agente nella scoperta scientifica. È importante affrontare le sfide identificate per utilizzare in modo efficace LLM in contesti pratici. Tuttavia, ci sono alcune limitazioni comuni, come la capacità limitata di contesto, le difficoltà nella pianificazione a lungo termine e nella decomposizione delle attività, nonché l'affidabilità dell'interfaccia di linguaggio naturale. L'articolo fa riferimento a diverse fonti che approfondiscono ulteriormente gli argomenti trattati.

print("\n\n".join(result["intermediate_steps"][:3]))

This article discusses the concept of building autonomous agents powered by large language models (LLM). It explores the different components of such agents, including planning, memory, and tool use. The potential of LLM extends beyond generating written content and can be used as a general problem solver. The article also provides examples of proof-of-concept demos and discusses the challenges associated with building LLM-powered agents.

Questo articolo discute il concetto di costruire agenti autonomi alimentati da grandi modelli di linguaggio (LLM). Esplora i diversi componenti di tali agenti, tra cui la pianificazione, la memoria e l'uso degli strumenti. Il potenziale di LLM si estende oltre la generazione di contenuti scritti e può essere utilizzato come risolutore di problemi generale. L'articolo fornisce anche esempi di dimostrazioni di proof-of-concept e discute le sfide associate alla costruzione di agenti alimentati da LLM.

Questo articolo discute il concetto di costruire agenti autonomi alimentati da grandi modelli di linguaggio (LLM). Esplora i diversi componenti di tali agenti, tra cui la pianificazione, la memoria e l'uso degli strumenti. Il potenziale di LLM si estende oltre la generazione di contenuti scritti e può essere utilizzato come risolutore di problemi generale. L'articolo fornisce anche esempi di dimostrazioni di proof-of-concept e discute le sfide associate alla costruzione di agenti alimentati da LLM. Inoltre, viene presentato il concetto di Chain of Hindsight (CoH) che permette al modello di migliorare i propri output attraverso un processo di auto-riflessione basato su feedback passati. Viene anche introdotto l'Algorithm Distillation (AD) che applica lo stesso principio alle traiettorie di apprendimento per compiti di reinforcement learning. Questi approcci dimostrano di poter migliorare le prestazioni degli agenti autonomi e di apprendere in modo più rapido rispetto ad altri metodi.

YouTube Loader

YouTube transcript analysis

!pip install youtube-transcript-api pytube faiss-cpu tiktoken


# From: https://towardsdatascience.com/getting-started-with-langchain-a-beginners-guide-to-building-llm-powered-applications-95fc8898732c

from langchain.document_loaders import YoutubeLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

embeddings = OpenAIEmbeddings()
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=HsonXuJs8-s")  # Cold showers FTW!
documents = loader.load()

# create the vector store to use as the index
db = FAISS.from_documents(documents, embeddings)
retriever = db.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True)

query = "Should children do a cold shower"
result = qa({"query": query})

print(result['result'])

There is limited research on the effects of cold showers specifically for children. It is generally recommended to consult with a healthcare professional before making any changes to a child's bathing routine.

Pandas Data load agent

Structured data analysis

!pip install langchain-experimental


from langchain.agents.agent_types import AgentType
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain_experimental.agents.agent_toolkits import create_csv_agent
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")

agent = create_csv_agent(
    OpenAI(temperature=0),
    "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv",  # a list of paths also works for multiple CSV files
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

agent.run("how many rows are there?")



> Entering new AgentExecutor chain...
Thought: I need to count the number of rows
Action: python_repl_ast
Action Input: df.shape[0]
Observation: 891
Thought: I now know the final answer
Final Answer: There are 891 rows.

> Finished chain.
There are 891 rows.


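Exercises

1. Text summarization using an external web page loader

Build a pipeline that summarizes the given documents using the LangChain library. The pipeline should include document loading, embedding generation, and summarization.

Load the text content of a web page with WebBaseLoader.

Generate embeddings for the loaded text content.

Implement a function that summarizes the text based on the generated embeddings.

Implement summarization with stuff, map-reduce, and refine.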


loader = WebBaseLoader("https://www.bbc.com/news/technology-67630454")
docs = loader.load()
# Separate part: model and chain setup
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-1106")
chain = load_summarize_chain(llm, chain_type="stuff")

chain.run(docs)

Google has released a new AI model called Gemini, which it claims has advanced reasoning capabilities and can "think more carefully" when answering difficult questions. The AI was tested on problem-solving and knowledge in various subject areas and is said to be the most capable AI model yet. It can recognize and generate text, images, and audio, and will be integrated into Google\'s existing tools. Gemini is said to outperform human experts in intelligence tests and has the ability to learn from sources other than text, such as pictures. However, it faces competition from other AI products and concerns about the potential risks and dangers of AI technology.

# map-reduce
# Reduce
reduce_template = """The following is set of summaries:
{docs}
Take these and distill it into a final, consolidated summary of the main themes.
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)

# Run chain
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Takes a list of documents, combines them into a single string, and passes this to an LLMChain
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="docs"
)

# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
    # This is final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # If documents exceed context for `StuffDocumentsChain`
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into.
    token_max=3000,
)

# Combining documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
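    # Map chain (this run reuses reduce_chain for the map step;
    # pass map_chain instead to apply the map prompt to each chunk)
    llm_chain=reduce_chain,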
    # Reduce chain
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in the llm_chain to put the documents in
    document_variable_name="docs",
    # Return the results of the map steps in the output
    return_intermediate_steps=False,
)

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs) # loader.load()

print(map_reduce_chain.run(split_docs))

The main themes in this set of documents revolve around the advancements in AI technology, particularly Google's new AI model, Gemini, and its advanced reasoning capabilities. There is also a comparison between Gemini and OpenAI's platform GPT-4, as well as discussions about competition in the AI industry, including Elon Musk's xAI and Baidu. Concerns and discussions about the potential risks and regulations of AI are also highlighted, along with other AI-related news and developments, such as AI funding for mental health diagnosis and AI tools being abused by cyber-criminals.

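# refine (docs is unsplit here, so the chain sees a single document and behaves like stuff; pass split_docs to refine chunk by chunk)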
chain = load_summarize_chain(llm, chain_type="refine")
chain.run(docs)

Google has released a new AI model called Gemini, which it claims has advanced reasoning capabilities and can "think more carefully" when answering difficult questions. The AI was tested on problem-solving and knowledge in various subject areas and is said to be the most capable AI model yet. It can recognize and generate text, images, and audio, and will be integrated into Google\'s existing tools. Gemini is said to outperform human experts in intelligence tests and can learn from sources other than text, such as pictures. However, it faces competition from other AI products and concerns about the potential harm of AI technology.

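2. YouTube summarization

Summarize the transcript of a given YouTube video using the LangChain API.
Use the QA function to organize the content of the video.
Write questions that summarize the video's content and print the results.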

embeddings = OpenAIEmbeddings()
loader = YoutubeLoader.from_youtube_url("https://youtu.be/LBudghsdByQ?si=aJfjubFgDOThIBP9")
documents = loader.load()

# create the vector store to use as the index
db = FAISS.from_documents(documents, embeddings)
retriever = db.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True)

query = "what's the matter with korea?"
result = qa({"query": query})

print(result['result'])

South Korea is experiencing a significant decline in fertility rates, with the country's fertility rate being at 0.8 children per woman in 2022, the lowest in the world. This means that the population is shrinking rapidly, and if this trend continues, South Korea will see a population implosion. The median age in South Korea has also increased from 18 in 1950 to 45 in 2023, and it is projected to be 59 by 2100, making it a country with a predominantly senior population. This shift in demographics poses significant challenges for the country's economy, healthcare system, and overall societal structure.

# The same question asked in Korean ("What is the problem with Korea?")
query = "한국에 무슨 문제가 있어?"
result = qa({"query": query})

print(result['result'])

한국의 인구 구성이 빠르게 노화하고 있으며, 출산율이 매우 낮아지고 있습니다. 이러한 추세는 장래에 사회적, 경제적 문제를 야기할 수 있습니다. 이러한 문제들은 많은 나라에서 발생하고 있으며, 세계적인 문제로 여겨집니다.

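3. Structured data analysis

Load a CSV file, perform basic data analysis, and then summarize the analysis results using the LangChain API.

1. Write a Python script that takes a CSV file as input.
2. Analyze the file to compute basic statistics such as the number of rows and the mean of a specific column.
3. Summarize and print the analysis results through the LangChain API.
4. Using the california_housing_train dataset, analyze the relationship between each feature and the house price.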
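Steps 1 and 2 can be sketched with the standard library alone, before any LLM is involved. The inline sample reuses the first five rows of two columns from the dataset as printed further down. The function names and the choice of Pearson correlation as the measure of the relationship are assumptions for illustration.

```python
import csv
import io
import math

# First five rows of two columns from california_housing_train.csv
SAMPLE_CSV = """median_income,median_house_value
1.4936,66900.0
1.8200,80100.0
1.6509,85700.0
3.1917,73400.0
1.9250,65500.0
"""


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def basic_stats(csv_text):
    """Row count, mean house value, and income/value correlation for a CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    income = [float(r["median_income"]) for r in rows]
    value = [float(r["median_house_value"]) for r in rows]
    return {
        "rows": len(rows),
        "mean_house_value": mean(value),
        "income_value_corr": pearson(income, value),
    }


stats = basic_stats(SAMPLE_CSV)
# A plain-text report like this could then be handed to a summarization chain (step 3).
report = ", ".join(f"{name}={number:.4g}" for name, number in stats.items())
print(report)
```

Five rows are far too few for a meaningful correlation; running the same function over the full file is what step 4 asks for.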

agent = create_csv_agent(
    OpenAI(temperature=0),
    "/content/sample_data/california_housing_train.csv",  # a list of paths also works for multiple CSV files
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

agent.run("how many data are there?")



> Entering new AgentExecutor chain...
Thought: I need to count the number of rows in the dataframe
Action: python_repl_ast
Action Input: len(df)
Observation: 17000
Thought: I now know the final answer
Final Answer: There are 17000 data in the dataframe.

> Finished chain.
There are 17000 data in the dataframe.

agent.run("give me the average of the longitude")



> Entering new AgentExecutor chain...
Thought: I need to calculate the average of the longitude column
Action: python_repl_ast
Action Input: df['longitude'].mean()
Observation: -119.5621082352941
Thought: I now know the final answer
Final Answer: The average of the longitude is -119.5621082352941.

> Finished chain.
The average of the longitude is -119.5621082352941.

agent.run("predict the house price.")



> Entering new AgentExecutor chain...
Thought: I need to use the data to make a prediction.
Action: python_repl_ast
Action Input: df.head()
Observation:    longitude  latitude  housing_median_age  total_rooms  total_bedrooms  \
0    -114.31     34.19                15.0       5612.0          1283.0   
1    -114.47     34.40                19.0       7650.0          1901.0   
2    -114.56     33.69                17.0        720.0           174.0   
3    -114.57     33.64                14.0       1501.0           337.0   
4    -114.57     33.57                20.0       1454.0           326.0   

   population  households  median_income  median_house_value  
0      1015.0       472.0         1.4936             66900.0  
1      1129.0       463.0         1.8200             80100.0  
2       333.0       117.0         1.6509             85700.0  
3       515.0       226.0         3.1917             73400.0  
4       624.0       262.0         1.9250             65500.0  
Thought: I need to use a machine learning algorithm to make a prediction.
Action: python_repl_ast
Action Input: from sklearn.linear_model import LinearRegression
Observation: 
Thought: I need to create a model and fit it to the data.
Action: python_repl_ast
Action Input: model = LinearRegression()
model.fit(df[['longitude', 'latitude', 'housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households', 'median_income']], df['median_house_value'])
Observation: LinearRegression()
Thought: I now know the final answer.
Final Answer: The predicted house price is the output of the model.

> Finished chain.
The predicted house price is the output of the model.

# Korean prompt: "Analyze the difference in house prices by latitude and longitude"
agent.run("위도와 경도에 따른 집값의 차이를 분석해줘")



> Entering new AgentExecutor chain...
Thought: 데이터프레임의 열을 사용해서 집값의 차이를 분석해야 한다.
Action: python_repl_ast
Action Input: df.groupby(['longitude', 'latitude'])['median_house_value'].mean()
Observation: longitude  latitude
-124.35    40.54        94600.0
-124.30    41.80        85800.0
           41.84       103600.0
-124.27    40.69        79000.0
-124.26    40.58       111400.0
                         ...   
-114.57    33.57        65500.0
           33.64        73400.0
-114.56    33.69        85700.0
-114.47    34.40        80100.0
-114.31    34.19        66900.0
Name: median_house_value, Length: 11054, dtype: float64
Thought: 위도와 경도에 따른 집값의 평균을 구할 수 있다.
Final Answer: 위도와 경도에 따른 집값의 평균을 구할 수 있다.

> Finished chain.
위도와 경도에 따른 집값의 평균을 구할 수 있다.
