[Langchain] Vectorstore (1) Chroma

Hunie_07 · March 29, 2025

📌 Vector Store

  • Concept:

    • A specialized database system for efficiently storing and searching vectorized data
    • Maps unstructured data such as text and images into a high-dimensional vector space for storage
    • Similarity-based search quickly retrieves data that is semantically close to a query
  • Vector stores available in LangChain:

    • Chroma: a lightweight embedding database well suited to local development
    • FAISS: a high-performance similarity search library developed by Facebook AI
    • Pinecone: a fully managed vector database service
    • Milvus: a distributed vector database suited to large-scale data
    • PostgreSQL: provides vector storage and search through the pgvector extension
  • Key features:

    • Vector indexing: structures the data for efficient search
    • Nearest-neighbor search: finds the vectors most similar to a given query
    • Metadata management: stores and retrieves additional information alongside each vector
  • Use cases:

    • Semantic document search: searches by the meaning of documents
    • Recommendation systems: recommends similar items
    • Duplicate detection: finds similar content
    • Question answering: retrieves the supporting evidence needed to generate an answer from relevant documents
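
The nearest-neighbor search described above can be illustrated without any vector database. The sketch below is a minimal brute-force version over toy 3-dimensional vectors. The vectors, the document labels, and the `cosine_similarity` and `nearest` helpers are all made up for illustration, and real vector stores typically rely on an index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, store, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical embeddings; a real model would produce hundreds of dimensions.
toy_store = {
    "doc_ai": [0.9, 0.1, 0.0],
    "doc_ml": [0.8, 0.3, 0.1],
    "doc_cv": [0.1, 0.2, 0.9],
}

top = nearest([1.0, 0.2, 0.0], toy_store, k=2)
print(top)  # ['doc_ai', 'doc_ml']
```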

1️⃣ Chroma

  • An open-source vector store known for its ease of use
  • Installation: pip install langchain-chroma / poetry add langchain-chroma

1. Initializing the vector store

from langchain_chroma import Chroma
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

# Embedding model applied when documents are stored in the vector store
embeddings_model = HuggingFaceEmbeddings(model_name="BAAI/bge-m3")

# Create the vector store
chroma_db = Chroma(
    collection_name="ai_sample_collection",  # name of the collection
    embedding_function=embeddings_model,     # embedding model to use
    persist_directory="./chroma_db",         # directory where data is persisted
)
  • After running this, a folder named chroma_db is created, and you can confirm that the store has been created under a UUID string.

Checking the data stored in the vector store

# Check the data currently stored in the collection
chroma_db.get()

- Output

{'ids': [],
 'embeddings': None,
 'documents': [],
 'uris': None,
 'data': None,
 'metadatas': [],
 'included': [<IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

2. Managing the vector store

  • Adding documents: vector_store.add_documents(documents, ids)
from langchain_core.documents import Document

# Document data as (text, source) pairs
documents = [
    ("Artificial intelligence is a field of computer science.", "Introduction to AI"),
    ("Machine learning is a subfield of artificial intelligence.", "Introduction to AI"),
    ("Deep learning is a type of machine learning.", "Introduction to Deep Learning"),
    ("Natural language processing is the technology that lets computers understand and generate human language.", "Introduction to AI"),
    ("Computer vision studies how computers understand digital images and videos.", "Introduction to Deep Learning"),
]

# Create Document objects
doc_objects = []
for content, source in documents:
    doc = Document(
        page_content=content,
        metadata={"source": source},
    )
    doc_objects.append(doc)


# Build a list of sequential IDs
doc_ids = [f"DOC_{i}" for i in range(1, len(doc_objects) + 1)]

# Store the documents in the vector store
added_doc_ids = chroma_db.add_documents(documents=doc_objects, ids=doc_ids)

# Confirm which documents were stored
print(f"{len(added_doc_ids)} documents were successfully added to the vector store.")
print(added_doc_ids)

- Output

5 documents were successfully added to the vector store.
['DOC_1', 'DOC_2', 'DOC_3', 'DOC_4', 'DOC_5']
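
Sequential IDs such as DOC_1 depend on the order of the list, so the same text can end up under a different ID if the list changes. One possible alternative, sketched here as an assumption rather than something this walkthrough uses, is to derive each ID from the document's content. The `content_id` helper, the `DOC` prefix, and the 12-character digest length are illustrative choices only.

```python
import hashlib


def content_id(text, source, prefix="DOC"):
    """Build a stable ID from the text and its source (hypothetical helper)."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:12]}"


pairs = [
    ("Machine learning is a subfield of artificial intelligence.", "Introduction to AI"),
    ("Deep learning is a type of machine learning.", "Introduction to Deep Learning"),
]

# The same (text, source) pair always yields the same ID, regardless of list order.
stable_ids = [content_id(text, source) for text, source in pairs]
print(stable_ids)
```

A list built this way could be passed as the ids argument of add_documents in place of doc_ids.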

Searching the stored documents

# Search the stored documents
query = "What is the relationship between artificial intelligence and machine learning?"
results = chroma_db.similarity_search(query, k=2)

print(f"\nQuery: {query}")
print("Most similar documents:")
for doc in results:
    print(f"- {doc.page_content} [Source: {doc.metadata['source']}]")

- Output

Query: What is the relationship between artificial intelligence and machine learning?
Most similar documents:
- Machine learning is a subfield of artificial intelligence. [Source: Introduction to AI]
- Deep learning is a type of machine learning. [Source: Introduction to Deep Learning]

Checking the stored data after adding documents

# Check the data currently stored in the collection
chroma_db.get()

- Output

{'ids': ['DOC_1', 'DOC_2', 'DOC_3', 'DOC_4', 'DOC_5'],
 'embeddings': None,
 'documents': ['Artificial intelligence is a field of computer science.',
  'Machine learning is a subfield of artificial intelligence.',
  'Deep learning is a type of machine learning.',
  'Natural language processing is the technology that lets computers understand and generate human language.',
  'Computer vision studies how computers understand digital images and videos.'],
 'uris': None,
 'data': None,
 'metadatas': [{'source': 'Introduction to AI'},
  {'source': 'Introduction to AI'},
  {'source': 'Introduction to Deep Learning'},
  {'source': 'Introduction to AI'},
  {'source': 'Introduction to Deep Learning'}],
 'included': [<IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

  • Updating documents: vector_store.update_document(document_id, document)
# Create the replacement documents
updated_document_1 = Document(
    page_content="Artificial intelligence is one of the core fields of computer science and includes machine learning and deep learning.",
    metadata={"source": "Introduction to AI"},
)

updated_document_2 = Document(
    page_content="Machine learning is a subfield of artificial intelligence that learns from data to make predictions and decisions.",
    metadata={"source": "Introduction to AI"},
)

updated_document_3 = Document(
    page_content="Deep learning is a type of machine learning that learns using deep neural networks.",
    metadata={"source": "Introduction to Deep Learning"},
)


# Update a single document
chroma_db.update_document(document_id="DOC_1", document=updated_document_1)

# Update several documents at once
chroma_db.update_documents(
    ids=["DOC_2", "DOC_3"],
    documents=[updated_document_2, updated_document_3]
)

  • Deleting documents: vector_store.delete(ids)
# Delete documents by ID
chroma_db.delete(ids=["DOC_5"])
# Check the collection
chroma_db.get()

- Output

{'ids': ['DOC_1', 'DOC_2', 'DOC_3', 'DOC_4'],
 'embeddings': None,
 'documents': ['Artificial intelligence is one of the core fields of computer science and includes machine learning and deep learning.',
  'Machine learning is a subfield of artificial intelligence that learns from data to make predictions and decisions.',
  'Deep learning is a type of machine learning that learns using deep neural networks.',
  'Natural language processing is the technology that lets computers understand and generate human language.'],
 'uris': None,
 'data': None,
 'metadatas': [{'source': 'Introduction to AI'},
  {'source': 'Introduction to AI'},
  {'source': 'Introduction to Deep Learning'},
  {'source': 'Introduction to AI'}],
 'included': [<IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}

3. Searching documents

  • Similarity search: similarity_search
    • Returns the documents most similar to the given query
    • k=2 specifies that the top 2 results are returned
    • filter restricts the search to documents from a specific source
query = "What is the difference between artificial intelligence and machine learning?"
results = chroma_db.similarity_search(
    query,
    k=2,
    filter={"source": "Introduction to AI"}
)

print("Similarity search results:")
for doc in results:
    print(f"- {doc.page_content} [Source: {doc.metadata['source']}]")

- Output

Similarity search results:
- Machine learning is a subfield of artificial intelligence that learns from data to make predictions and decisions. [Source: Introduction to AI]
- Artificial intelligence is one of the core fields of computer science and includes machine learning and deep learning. [Source: Introduction to AI]
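
Conceptually, a metadata filter narrows the candidates to documents whose metadata matches, and the top k are then taken from what remains. The sketch below models that idea in plain Python. The `records` list, its precomputed `distance` values, and the `filtered_top_k` helper are hypothetical, and nothing here describes how Chroma implements filtering internally.

```python
def filtered_top_k(records, metadata_filter, k=2):
    """Keep records whose metadata matches every filter key, then return the k closest."""
    matches = [
        record
        for record in records
        if all(record["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    # Lower distance means a closer match, so sort in ascending order.
    return sorted(matches, key=lambda record: record["distance"])[:k]


# Made-up distances to a query; only the metadata layout mirrors the example above.
records = [
    {"id": "DOC_1", "metadata": {"source": "Introduction to AI"}, "distance": 0.41},
    {"id": "DOC_2", "metadata": {"source": "Introduction to AI"}, "distance": 0.35},
    {"id": "DOC_3", "metadata": {"source": "Introduction to Deep Learning"}, "distance": 0.30},
    {"id": "DOC_4", "metadata": {"source": "Introduction to AI"}, "distance": 0.77},
]

hits = filtered_top_k(records, {"source": "Introduction to AI"}, k=2)
print([record["id"] for record in hits])  # ['DOC_2', 'DOC_1']
```

DOC_3 has the smallest distance in this toy data but is excluded because its source does not match the filter.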

  • Search with similarity scores: similarity_search_with_score
    • Returns a similarity score along with each document
    • A lower score means a closer match, because the score is computed as a distance
query = "In which fields is deep learning used?"
results = chroma_db.similarity_search_with_score(
    query,
    k=2,
    filter={"source": "Introduction to AI"}
)

print("Similarity search results with scores:\n")
for doc, score in results:
    print(f"- Score: {score:.4f}")
    print(f"  Content: {doc.page_content}")
    print(f"  [Source: {doc.metadata['source']}]")
    print()

- Output

Similarity search results with scores:

- Score: 0.7292
  Content: Artificial intelligence is one of the core fields of computer science and includes machine learning and deep learning.
  [Source: Introduction to AI]

- Score: 0.8394
  Content: Machine learning is a subfield of artificial intelligence that learns from data to make predictions and decisions.
  [Source: Introduction to AI]

  • Search with relevance scores: similarity_search_with_relevance_scores
    • Returns a relevance score between 0 and 1 along with each document
    • 0 means least relevant and 1 means most relevant
query = "In which fields is deep learning used?"
results = chroma_db.similarity_search_with_relevance_scores(
    query,
    k=2,
    filter={"source": "Introduction to AI"}
)

print(f"Query: {query}")
print("\nSearch results (with relevance scores):")
for doc, score in results:
    print(f"- Relevance score: {score:.4f}")
    print(f"  Content: {doc.page_content}")
    print(f"  [Source: {doc.metadata['source']}]")
    print()

- Output

Query: In which fields is deep learning used?

Search results (with relevance scores):
- Relevance score: 0.4844
  Content: Artificial intelligence is one of the core fields of computer science and includes machine learning and deep learning.
  [Source: Introduction to AI]

- Relevance score: 0.4065
  Content: Machine learning is a subfield of artificial intelligence that learns from data to make predictions and decisions.
  [Source: Introduction to AI]
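
The distance scores and relevance scores reported for this query appear to be related by a simple formula. Assuming the collection uses Chroma's default L2 distance and that LangChain converts it as relevance = 1 - distance / sqrt(2), the distances reported earlier (0.7292 and 0.8394) reproduce the relevance scores shown here. The `distance_to_relevance` helper below is a hypothetical name for that assumed conversion, not a LangChain API.

```python
import math


def distance_to_relevance(distance):
    """Assumed conversion from an L2-style distance to a relevance score."""
    return 1.0 - distance / math.sqrt(2)


# Distances taken from the similarity_search_with_score output.
for distance in (0.7292, 0.8394):
    relevance = distance_to_relevance(distance)
    print(f"distance={distance:.4f} -> relevance={relevance:.4f}")
# distance=0.7292 -> relevance=0.4844
# distance=0.8394 -> relevance=0.4065
```

Under this assumption a score drops below 0 whenever the distance exceeds sqrt(2), so the 0-to-1 range may not hold for every embedding model or distance metric.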

4. Loading the vector store

# Reconnect to the persisted collection using the same name, embedding model, and path
chroma_db2 = Chroma(
    collection_name="ai_sample_collection",
    embedding_function=embeddings_model,
    persist_directory="./chroma_db",
)
  • Opening the chroma.sqlite3 file in persist_directory and inspecting the collections table shows each collection's id, name, dimension, and so on.
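
One way to look at that table is with Python's built-in sqlite3 module. The sketch below is a minimal example that assumes the collections table and its id, name, and dimension columns exist as described in the note above. This schema is internal to Chroma and may differ between versions, and `list_collections` is a hypothetical helper name.

```python
import os
import sqlite3
from contextlib import closing


def list_collections(db_path):
    """Return (id, name, dimension) rows from the collections table."""
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute("SELECT id, name, dimension FROM collections").fetchall()


# Path assumed from persist_directory="./chroma_db"; skipped if the file is absent.
db_path = os.path.join("chroma_db", "chroma.sqlite3")
if os.path.exists(db_path):
    for row in list_collections(db_path):
        print(row)
```

Reading the file this way is only for inspection; changes to the store should still go through the Chroma API.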
