[03] Vector DB

heering · January 21, 2024

Index

Q. What role does a pod play in a vector database index?
A. A pod is a pre-configured unit of hardware that runs the index and serves its queries efficiently.

Collection

A static copy of an index
A collection cannot be queried
Useful for backing up an index

Create collection

pinecone.create_collection(name="my-collection", source="test") # source is the name of the index to copy
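Because a collection is a static copy, a backup is typically put to use by creating a new index from it. The following is a minimal sketch, assuming the same pod-based Python client used in this post, where `create_index` accepts a `source_collection` argument. The helper name `restore_index_from_collection` and its parameters are hypothetical and only for illustration.

```python
def restore_index_from_collection(index_name, collection_name, dimension):
    """Create a new index seeded with the vectors saved in a collection."""
    # Imported here so the sketch can be read without the client installed.
    import pinecone

    # Assumes pinecone.init(api_key=..., environment=...) was called already.
    # `dimension` must match the dimension of the collection's source index.
    pinecone.create_index(
        name=index_name,
        dimension=dimension,
        source_collection=collection_name,
    )
```

For example, `restore_index_from_collection("restored-index", "my-collection", dimension=3)` would request a new index from the collection created above ("restored-index" is a made-up name).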

List, Describe, Delete collection

List

pinecone.list_collections() # list all collections in database

Describe

res = pinecone.describe_collection("my-collection")
res.name # result: 'my-collection'
res.size / 10**6 # size converted from bytes to MB; result: 3.112836

Delete

pinecone.delete_collection("my-collection")

Namespace

An index can have one or more namespaces
Every vector must belong to exactly one namespace (Pinecone's default namespace is the empty string "")
Namespace names must be unique within an index

'''
Example: specifying a namespace when upserting
'''
idx = pinecone.Index("my-collection-index") # connect to index

### other code omitted ###

idx.upsert(vectors_subj, namespace='subject')

'''
Example: using a namespace when querying
'''

### other code omitted ###

idx.query(vector=np.random.rand(3).tolist(), # random 3-dimensional query vector
          top_k=3,
          namespace='', # search the default namespace
          include_values=True)
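The namespace rules above can be illustrated with a toy in-memory model. This is only a sketch: `ToyIndex` is a hypothetical class, not the Pinecone client, and it ranks by Euclidean distance purely for simplicity. It shows that each vector lands in exactly one namespace, that the default namespace is "", and that a query only searches the namespace it names.

```python
from collections import defaultdict
import math


class ToyIndex:
    """Toy in-memory stand-in for an index; NOT the Pinecone client."""

    def __init__(self):
        # namespace name -> {vector id -> vector values}
        self._namespaces = defaultdict(dict)

    def upsert(self, vectors, namespace=""):
        # Each vector lands in exactly one namespace; "" is the default.
        for vec_id, values in vectors:
            self._namespaces[namespace][vec_id] = list(values)

    def query(self, vector, top_k, namespace=""):
        # Only vectors stored in the requested namespace are searched.
        candidates = self._namespaces.get(namespace, {})
        scored = sorted(
            candidates.items(),
            key=lambda item: math.dist(vector, item[1]),  # Euclidean distance
        )
        return [vec_id for vec_id, _ in scored[:top_k]]


toy = ToyIndex()
toy.upsert([("a", [0.1, 0.1, 0.1]), ("b", [0.9, 0.9, 0.9])], namespace="subject")
toy.upsert([("c", [0.2, 0.2, 0.2])])  # no namespace given, so it goes to ""

subject_hits = toy.query([0.0, 0.0, 0.0], top_k=3, namespace="subject")
default_hits = toy.query([0.0, 0.0, 0.0], top_k=3, namespace="")
print(subject_hits)  # ['a', 'b']
print(default_hits)  # ['c']
```

Even though `top_k=3`, each query returns only the vectors stored in the namespace it asked for.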

Metadata

Additional information associated with a vector. It can be used when filtering and sorting results.

Upsert

Here, the {"topic": ~~~~, "year": ~~~~} part is the metadata.

# each tuple is (id, values, metadata)
idx.upsert([
    ("1", [0.1, 0.1, 0.1], {"topic": "subject", "year": 2023}),
    ("2", [0.2, 0.2, 0.2], {"topic": "other", "year": 2024}),
    ("3", [0.3, 0.3, 0.3], {"topic": "body", "year": 2023}),
    ("4", [0.4, 0.4, 0.4], {"topic": "body"}),
    ("5", [0.5, 0.5, 0.5], {"topic": "subject"})
])

Query

idx.query(vector=[0, 0, 0],
          top_k=2,
          include_metadata=True,
          include_values=True,
          filter={
              "topic": {"$eq": "subject"},
              "year": 2023 # shorthand for {"$eq": 2023}
          })
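To see which of the upserted vectors this filter would keep, the matching logic can be sketched in plain Python. This is a hypothetical evaluator, not Pinecone's implementation: the operator names follow Pinecone's filter syntax, but the `matches` function and the rule that a missing field never matches are assumptions made for illustration.

```python
import operator

# Comparison operators taken from Pinecone's filter syntax; this evaluator
# itself is a hypothetical sketch, not Pinecone's implementation.
_OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata, flt):
    """Return True when every condition in `flt` holds for `metadata`."""
    for field, condition in flt.items():
        if field not in metadata:
            return False  # assumption: a missing field never matches
        # A bare value such as {"year": 2023} is shorthand for {"$eq": 2023}.
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        for op_name, expected in condition.items():
            if not _OPS[op_name](metadata[field], expected):
                return False
    return True


records = {
    "1": {"topic": "subject", "year": 2023},
    "2": {"topic": "other", "year": 2024},
    "3": {"topic": "body", "year": 2023},
    "4": {"topic": "body"},
    "5": {"topic": "subject"},
}
flt = {"topic": {"$eq": "subject"}, "year": 2023}

matching_ids = [vec_id for vec_id, meta in records.items() if matches(meta, flt)]
print(matching_ids)  # ['1']
```

Under these assumptions, only the vector with ID "1" satisfies both conditions, so the query would have a single candidate even though `top_k=2`.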
