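One of the documents has a section on the importance of embeddings, so I'm summarizing it here!
This was originally translated from English with ChatGPT, so some parts may read awkwardly 😅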
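Embeddings are numerical representations of real-world data such as text, speech, images, and video.
They are expressed as low-dimensional vectors, and the distance between two vectors in the vector space reflects the relationship between the real-world objects they represent. In other words, embeddings not only give a compact representation of many kinds of data, they also let you compare numerically how similar or different two data objects are.
For example, the word 'computer' is close in meaning to a picture of a computer or to the word 'laptop', but not to the word 'car'. Representing real-world data as low-dimensional vectors this way acts as a lossy compression of the original data that keeps its important properties, which makes processing and storing large-scale data far more efficient.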
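As a rough illustration of "distance reflects similarity", here is a minimal sketch with made-up 3-dimensional vectors. The numbers are hypothetical and were not produced by any real embedding model; real embeddings have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings, chosen by hand for illustration only.
embeddings = {
    "computer": [0.9, 0.8, 0.1],
    "laptop": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


d_laptop = euclidean_distance(embeddings["computer"], embeddings["laptop"])
d_car = euclidean_distance(embeddings["computer"], embeddings["car"])

# A smaller distance means the two items are treated as more similar.
print(f"computer-laptop: {d_laptop:.3f}")
print(f"computer-car:    {d_car:.3f}")
```

With these toy values, 'computer' ends up much closer to 'laptop' than to 'car', which is the behavior a trained embedding model is expected to show.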
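One of the main applications of embeddings is search and recommendation systems.
Here, results are drawn from a very large search space. For example, Google Search is a retrieval system whose search space is the entire internet. The success of today's search and recommendation systems depends on the following:
1. Precomputing embeddings for the billions of items in the search space
2. Mapping the query embedding into the same embedding space
3. Efficiently computing and retrieving the nearest neighbors of the query embedding in the search space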
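The three steps above can be sketched as a brute-force nearest-neighbor search. This is only an illustration: the item names and vectors are hypothetical, and large systems typically use approximate nearest-neighbor indexes rather than scanning every item as done here.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Step 1: item embeddings computed ahead of time (hypothetical values).
item_embeddings = {
    "doc_climate": [0.9, 0.1, 0.0],
    "doc_touchscreen": [0.1, 0.9, 0.1],
    "doc_gears": [0.0, 0.2, 0.9],
}


def nearest_neighbors(query_embedding, items, k=1):
    """Step 3: return the k item ids with the highest dot-product score."""
    return heapq.nlargest(
        k, items, key=lambda item_id: dot(query_embedding, items[item_id])
    )


# Step 2: the query is mapped into the same space (hypothetical vector).
query_embedding = [0.2, 0.8, 0.1]

top = nearest_neighbors(query_embedding, item_embeddings, k=2)
print(top)
```

The scan is linear in the number of items, which is why step 3 stresses doing this efficiently once the search space reaches billions of items.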
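Embeddings also shine in the world of multimodality. Most applications handle large amounts of data in various formats such as text, speech, images, and video. Because each entity or object is represented in its own format, projecting these objects into a single vector space that is both information-rich and compact is very hard. Ideally, such a representation should capture the characteristics of the original objects as well as possible.
An embedding is the vector obtained by projecting an object from its input space into a relatively low-dimensional vector space, and each vector is a list of real numbers.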
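(Figure captured from the document)
The pipeline is: text -> tokenizer -> indexing -> embedding
Text is the user's input text
The tokenizer splits the text into pieces
Indexing assigns a number to each piece produced by the tokenizer
Embedding maps the pieces into the low-dimensional vector space described above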
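The text -> tokenizer -> indexing -> embedding flow can be sketched with a toy example. Everything here is an assumption made for illustration: real tokenizers usually split into subword units instead of whitespace-separated words, and real embedding tables are learned during training, not filled with random numbers.

```python
import random


def tokenize(text):
    """Toy tokenizer: lowercase the text and split on whitespace."""
    return text.lower().split()


def build_vocab(tokens):
    """Indexing: give each distinct token an integer id in first-seen order."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def make_embedding_table(vocab_size, dim, seed=0):
    """Stand-in for a learned embedding table: one random vector per token id."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]


text = "the cat sat on the mat"
tokens = tokenize(text)                      # text -> pieces
vocab = build_vocab(tokens)                  # pieces -> ids
token_ids = [vocab[t] for t in tokens]
table = make_embedding_table(len(vocab), dim=4)
vectors = [table[i] for i in token_ids]      # ids -> vectors

print(tokens)
print(token_ids)
print(len(vectors), len(vectors[0]))
```

Both occurrences of "the" share the same id and therefore the same vector, which is the point of the indexing step.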
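RAG is short for Retrieval Augmented Generation: it retrieves relevant content from external documents and uses it when generating an answer
It is widely known as a way to reduce hallucination in LLMs
LLMs have two major drawbacks:
1. They only 'know' the information they were trained on
2. Their input context window is limited
RAG is used to address these, and it proceeds in the order below
However, before this process can run, there must be an external DB that stores the relevant data
Once the DB is built, the query is searched against the DB and the results are added to the LLM prompt to generate an answer
There are many kinds of external DBs; this time I use Chroma DB, which is a vector DB
Chroma DB is one kind of vector DB
It can be installed with the command below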
%pip install -U -q "google-generativeai>=0.8.3" chromadb
Embedding requires an embedding model
List the embedding models available in Gemini
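import google.generativeai as genai

# Assumes genai.configure(api_key=...) has already been called.
for m in genai.list_models():
    if "embedContent" in m.supported_generation_methods:
        print(m.name)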
This prints a model list like the one below. text-embedding-004 is the newest, so I use that one
models/embedding-001
models/text-embedding-004
Create some temporary data
DOCUMENT1 = "Operating the Climate Control System Your Googlecar has a climate control system that allows you to adjust the temperature and airflow in the car. To operate the climate control system, use the buttons and knobs located on the center console. Temperature: The temperature knob controls the temperature inside the car. Turn the knob clockwise to increase the temperature or counterclockwise to decrease the temperature. Airflow: The airflow knob controls the amount of airflow inside the car. Turn the knob clockwise to increase the airflow or counterclockwise to decrease the airflow. Fan speed: The fan speed knob controls the speed of the fan. Turn the knob clockwise to increase the fan speed or counterclockwise to decrease the fan speed. Mode: The mode button allows you to select the desired mode. The available modes are: Auto: The car will automatically adjust the temperature and airflow to maintain a comfortable level. Cool: The car will blow cool air into the car. Heat: The car will blow warm air into the car. Defrost: The car will blow warm air onto the windshield to defrost it."

DOCUMENT2 = 'Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon. For example, you can touch the "Navigation" icon to get directions to your destination or touch the "Music" icon to play your favorite songs.'

DOCUMENT3 = "Shifting Gears Your Googlecar has an automatic transmission. To shift gears, simply move the shift lever to the desired position. Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions."

documents = [DOCUMENT1, DOCUMENT2, DOCUMENT3]
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry


class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        retry_policy = {"retry": retry.Retry(predicate=retry.if_transient_error)}

        response = genai.embed_content(
            model="models/text-embedding-004",
            content=input,
            task_type=embedding_task,
            request_options=retry_policy,
        )
        return response["embedding"]
It is defined as a class and uses genai's embed_content
Depending on document_mode, the task is either retrieval_document or retrieval_query
Use retrieval_document when embedding documents
Use retrieval_query when embedding the user's question
import chromadb

DB_NAME = "googlecardb"
embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

db.add(documents=documents, ids=[str(i) for i in range(len(documents))])
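Set the DB name to googlecardb
Create the embedding function object as embed_fn
Set document_mode on embed_fn to True
Create a Client and store it in chroma_client
Store the result of the client's get_or_create_collection in db
chromadb stores data in units of collections
Store the documents with db.add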
db.peek(1)
You can also inspect the stored data, as shown below
{'ids': ['0'],
'embeddings': array([[ 1.89996641e-02, 7.50530604e-03, -2.69203484e-02,
-9.78849642e-03, -8.69853329e-03, -1.27060898e-02,
2.98559777e-02, 6.81265350e-03, -5.14574721e-03,
3.52885611e-02, -8.06997046e-02, 7.53531009e-02,
8.58849138e-02, 1.30070690e-02, -2.77891336e-03,
-1.05936602e-01, -3.98958661e-03, -7.90926255e-03,
-6.78334385e-02, 9.37942939e-04, -3.31661627e-02,
2.91897804e-02, -4.37331311e-02, -2.03247666e-02,
-4.03934792e-02, -4.03673872e-02, 4.37083244e-02,
4.35296260e-02, -5.30568957e-02, -7.60380086e-03,
1.10596254e-01, 2.93870177e-02, -2.69055367e-03,
-2.77709514e-02, 3.56521569e-02, 8.06248095e-03,
-6.98892726e-03, -4.19588238e-02, -1.22357225e-02,
-7.43979216e-02, -8.68393779e-02, 1.45487059e-02,
1.64316818e-02, 4.95274737e-02, 5.96513413e-03,
-3.29070538e-02, -4.59746048e-02, 6.05700687e-02,
3.69714126e-02, -3.67843956e-02, 3.23379971e-02,
5.57416072e-03, -2.60148998e-02, 3.72896157e-02,
1.09929522e-03, 2.22460013e-02, -3.41857560e-02,
-1.33831445e-02, 3.84481438e-02, 1.01510370e-02,
-4.13358212e-02, 1.50308516e-02, -4.16338220e-02,
2.77164355e-02, 1.77828148e-02, -5.60168698e-02,
-1.28984069e-02, -4.63326536e-02, -4.68938565e-03,
-1.61614176e-02, 4.64589000e-02, 4.97067757e-02,
-1.76815037e-02, 2.75055915e-02, -4.77083847e-02,
-3.02048232e-02, -3.58368531e-02, -1.49208391e-02,
1.64871980e-02, 5.08249328e-02, -2.80048344e-02,
6.99736923e-02, 5.95696792e-02, 3.21656168e-02,
-9.67071112e-03, -2.01593228e-02, 6.69856966e-02,
1.51496977e-02, -7.96246305e-02, -4.13303310e-03,
7.46189281e-02, 6.72254059e-03, -1.84513796e-02,
2.87920292e-02, 7.01305121e-02, 1.55893695e-02,
-1.16371922e-01, -7.99548775e-02, 5.50818779e-02,
7.43377358e-02, 2.68951990e-02, 5.65322861e-02,
-1.09314676e-02, -4.97194529e-02, 2.15398204e-02,
2.49629170e-02, 9.70174186e-03, -2.00729957e-03,
-5.23507595e-02, 5.45086190e-02, 2.20851172e-02,
-5.07799238e-02, 1.51937539e-02, 3.40860412e-02,
2.94148624e-02, -2.27203965e-02, -4.44913059e-02,
4.19270061e-02, 1.33883394e-02, -8.03010166e-03,
-3.02138049e-02, -1.52768716e-02, -1.79337766e-02,
4.12679724e-02, 4.28794511e-02, 1.86120179e-02,
6.62900433e-02, 1.68055836e-02, -7.44678685e-03,
-7.88552612e-02, -1.01826247e-03, -5.30414023e-02,
-1.77297127e-02, 4.33610976e-02, -2.87686586e-02,
4.37279977e-02, 6.51901439e-02, 1.42448116e-02,
3.04472726e-02, -1.51222004e-02, 9.45043378e-03,
-1.39130075e-02, -7.50348866e-02, 7.15902681e-03,
-2.02401206e-02, 1.58404314e-03, 2.50295680e-02,
-2.19853669e-02, -5.25463633e-02, 2.95491908e-02,
2.80410163e-02, 1.59156360e-02, 4.83962148e-03,
3.42256613e-02, 2.61947587e-02, -1.79085378e-02,
-9.59200598e-03, -1.30515965e-02, 5.45655526e-02,
4.18885499e-02, 1.15598358e-01, -2.15165876e-02,
1.55694981e-03, 4.04374786e-02, -4.02314290e-02,
7.93419555e-02, -6.48735513e-05, -4.36600596e-02,
-1.67760327e-02, -5.54723442e-02, -1.18653001e-02,
-5.03605045e-02, 2.53493730e-02, -7.12577999e-02,
9.61738545e-03, -8.39261059e-03, -2.53669359e-02,
-3.75324562e-02, -2.25138683e-02, 2.07773894e-02,
9.74554345e-02, -3.56078707e-02, -9.07747075e-03,
-3.51976268e-02, 5.00649167e-03, 8.92841443e-03,
-2.23056767e-02, 1.60871912e-02, 4.47834916e-02,
1.94993690e-02, 4.22408767e-02, 4.36633378e-02,
4.17801514e-02, 6.61993539e-03, 3.28581221e-03,
1.34587772e-02, 4.60674651e-02, -5.68813039e-03,
2.91601587e-02, -3.89970653e-02, 2.51733679e-02,
2.41864398e-02, -2.06118114e-02, -8.49481300e-03,
-7.96157196e-02, 3.14108934e-03, -3.03156907e-04,
-7.82509372e-02, -5.01308031e-03, 5.65630989e-03,
1.24495151e-02, -1.11163808e-02, -2.71381363e-02,
-2.70434972e-02, -6.62122518e-02, 1.71623491e-02,
1.83970556e-02, -3.10235117e-02, 2.75832936e-02,
-3.41743082e-02, 7.28364475e-03, -5.92893362e-02,
1.31034762e-01, 2.93862680e-03, 3.27316076e-02,
2.23102737e-02, -4.07483578e-02, 1.35542676e-02,
-4.49327976e-02, -6.06717588e-03, -3.24808666e-03,
2.87276991e-02, -2.08841208e-02, 3.97731829e-03,
-7.79294968e-03, 4.93008830e-02, 4.95140441e-02,
-3.70474011e-02, 3.40877101e-02, 7.44339498e-03,
2.74706036e-02, 4.58842888e-02, 4.23083864e-02,
2.34148689e-02, 1.48451021e-02, -2.37723533e-02,
6.47631735e-02, 1.57277603e-02, -4.26457748e-02,
-8.36548060e-02, -3.28711458e-02, -3.68208289e-02,
-2.07035076e-02, 4.69575636e-03, -3.04315165e-02,
-2.19081286e-02, -3.13257449e-03, 3.33950296e-02,
3.21173221e-02, -1.72844790e-02, 3.07742245e-02,
3.91557366e-02, -3.57035585e-02, 5.66983642e-03,
1.98001713e-02, -1.09939367e-01, -3.80607173e-02,
9.85682476e-03, -5.44885881e-02, -2.41438188e-02,
2.44294759e-02, 1.77268439e-03, 2.96395691e-03,
-2.12335661e-02, -1.25241298e-02, 1.70172658e-02,
-6.21883534e-02, -1.07621728e-02, -5.09412214e-03,
-1.35658048e-02, 3.85766551e-02, 5.11570498e-02,
2.18521468e-02, -4.14995439e-02, 3.34418379e-02,
-3.93004157e-02, 6.15507318e-03, 3.12885921e-03,
-6.00040518e-02, 1.04700611e-03, 3.84333506e-02,
-1.19297365e-02, -3.65912318e-02, -4.38246727e-02,
5.30170575e-02, -9.58478730e-03, 4.97351848e-02,
1.51532618e-02, 1.89138837e-02, -4.18482982e-02,
1.52532328e-02, 4.77849133e-02, 6.88652974e-03,
2.96370517e-02, 4.28295061e-02, -4.20004539e-02,
5.03814965e-03, -2.89589725e-02, -9.47450101e-03,
-7.19804608e-04, -2.04819646e-02, 1.71097741e-02,
-1.69946812e-02, -1.51602775e-02, 1.40648400e-02,
4.33577523e-02, -1.28116727e-01, 1.96245406e-02,
-9.79382778e-04, -1.49972932e-02, 3.11941318e-02,
-3.98269072e-02, -3.17095779e-02, -1.03539908e-02,
2.90332790e-02, 1.71464738e-02, -2.18943600e-02,
-1.20792107e-03, 2.02216711e-02, -5.82920201e-02,
8.70636450e-06, -2.70486567e-02, -8.37526023e-02,
-3.73691204e-03, -7.00314045e-02, 3.47714312e-02,
-1.65883601e-02, 3.71062458e-02, 8.21944419e-03,
3.54773812e-02, 1.61994863e-02, 7.05077574e-02,
8.83539952e-03, 2.18494236e-02, -5.50299250e-02,
1.19913522e-04, 3.50320153e-02, 4.99191508e-02,
2.08908767e-02, -1.21140964e-02, 3.34130600e-02,
3.32010910e-02, 4.11024615e-02, 1.52024766e-02,
-2.07796833e-03, -4.94461246e-02, 5.46362549e-02,
-1.91538725e-02, 4.89460416e-02, -3.06148790e-02,
-1.46947559e-02, 2.59972401e-02, 1.89346366e-03,
-8.85396544e-03, 1.43837566e-02, -2.83453707e-02,
2.60757376e-02, -7.98581075e-03, 1.98935997e-02,
1.55275883e-02, -8.87474231e-03, 2.31279954e-02,
3.03729586e-02, 2.14788988e-02, 3.26980092e-03,
3.87281664e-02, -1.32220760e-02, 1.12496624e-02,
-5.03361458e-03, -4.75973040e-02, -1.78676229e-02,
-5.71023077e-02, 3.11246980e-02, -5.31051680e-03,
1.57550387e-02, 5.20151109e-02, -5.72069474e-02,
5.15301060e-03, 3.29970457e-02, 2.26406157e-02,
-3.07078045e-02, 1.87925640e-02, -1.86993480e-02,
2.33180430e-02, -1.82738602e-02, -2.46863026e-04,
-8.52621868e-02, 2.53148209e-02, 2.06176797e-03,
-3.89578417e-02, -2.23115413e-03, -4.10256907e-02,
4.55508120e-02, -7.24424720e-02, 5.08268876e-03,
4.42190096e-02, 1.73661846e-03, 1.63513031e-02,
9.09441058e-03, 1.04114031e-02, -4.86889202e-03,
-2.63646524e-02, 7.87991844e-03, 8.30337685e-03,
-1.01753809e-02, -2.47611739e-02, 6.67762756e-02,
4.89272997e-02, 2.38461569e-02, -6.42249808e-02,
8.06081109e-03, 4.79382835e-02, 6.00473769e-02,
2.56416928e-02, -1.06949676e-02, -3.17716524e-02,
-3.00298259e-02, 3.23185213e-02, -1.56965274e-02,
3.98179255e-02, 1.11237010e-02, 3.77057530e-02,
-5.90020716e-02, 6.65883161e-03, 1.55508593e-02,
-3.98168378e-02, -2.19614618e-03, -3.17364447e-02,
9.28087812e-03, -1.57921314e-02, -3.60821970e-02,
2.23480631e-02, 7.26326481e-02, 8.54933541e-03,
-1.96508467e-02, 4.01913226e-02, -3.06365900e-02,
-1.96762756e-02, -3.83199602e-02, 1.48206614e-02,
-2.02410016e-02, -1.89087112e-02, 3.50414440e-02,
3.49851511e-02, -1.53734237e-02, -8.05087294e-03,
-7.98325636e-04, 5.38129210e-02, 4.28347513e-02,
-2.33207271e-02, 1.76745448e-02, -3.91262732e-02,
2.53158361e-02, -5.43349935e-03, 3.27506177e-02,
1.16547355e-02, 2.72344295e-02, -4.21163514e-02,
1.98197179e-02, -3.02518159e-02, 6.00851811e-02,
-3.92581001e-02, 5.69727384e-02, 4.16435599e-02,
-5.45447841e-02, -8.62797443e-03, 5.73354736e-02,
-8.94314330e-03, -3.72394882e-02, 4.12784889e-03,
-1.29805785e-02, 5.85994422e-02, 4.10515368e-02,
-7.88140856e-03, 6.91415817e-02, 1.82892084e-02,
-7.59132132e-02, 3.91482785e-02, 1.00310231e-02,
2.27377769e-02, 3.80710373e-03, 2.31498890e-02,
2.42068712e-02, -1.37068238e-02, -5.82525041e-03,
1.76540092e-02, 5.13952859e-02, -4.77824695e-02,
-5.54873943e-02, 1.18202306e-02, 6.27043992e-02,
1.87041499e-02, -6.80633560e-02, -4.02098186e-02,
-1.18157398e-02, 3.17377560e-02, -4.04558517e-02,
-2.29886267e-02, 1.09081238e-05, 7.27420002e-02,
-1.16472011e-02, -2.37533189e-02, -3.38588320e-02,
-2.19844095e-02, -6.56050304e-03, -1.48762893e-02,
-4.41998839e-02, 5.02602272e-02, 3.84675451e-02,
1.98185090e-02, -6.06090948e-02, 2.10781377e-02,
4.61731991e-03, 4.19402868e-02, -5.42518981e-02,
2.91183051e-02, 4.73365746e-02, 1.81997810e-02,
1.52857509e-02, -2.08034776e-02, -5.36849052e-02,
6.35842606e-02, 2.73762550e-02, 4.96339686e-02,
3.09747737e-02, 1.01277744e-02, -3.44962217e-02,
4.85864840e-02, 1.51261436e-02, 1.49459867e-02,
4.13929597e-02, -3.76431942e-02, 1.56074986e-02,
1.15082236e-02, 2.90358942e-02, 2.45141797e-02,
2.80922279e-02, -1.28367553e-02, 3.14410753e-03,
6.41218573e-02, -2.22793669e-02, -5.03757261e-02,
4.78862412e-02, 3.36291753e-02, -5.38712293e-02,
-8.94516893e-03, 2.78146621e-02, -1.16451690e-02,
-5.78200072e-03, 1.86787676e-02, 9.93121322e-03,
-2.65693059e-03, -1.70143209e-02, -1.55063514e-02,
2.41588745e-02, -1.78989361e-03, 6.64451048e-02,
-1.14727626e-03, -3.28906067e-02, -1.49553595e-03,
-2.82162777e-03, -7.60558620e-02, -1.55637471e-03,
-4.91528492e-03, -2.19545607e-02, 2.75564697e-02,
-1.25497794e-02, -1.68638136e-02, -2.07204279e-02,
-2.36812923e-02, -3.07043232e-02, 6.31876988e-03,
3.43244486e-02, 1.70262139e-02, 4.92697842e-02,
2.93082576e-02, 4.10187989e-03, 2.09337920e-02,
7.19857663e-02, 1.05966022e-03, -1.47689087e-02,
-3.47245783e-02, -2.16487832e-02, -9.95141827e-03,
1.29853487e-02, -5.48050972e-03, -7.70190209e-02,
1.12167513e-02, -1.51839443e-02, -1.83910392e-02,
-1.97323561e-02, -1.48972301e-02, 8.46249908e-02,
6.35140855e-03, 2.03219000e-02, -1.94084048e-02,
-1.08773485e-02, -4.57313657e-02, -4.48526070e-02,
2.56617926e-02, 1.93536896e-02, 1.00775696e-02,
7.18023628e-03, -3.55099514e-02, -4.55002636e-02,
-7.88334943e-03, -4.93443757e-02, 2.36826204e-02,
5.81833301e-03, -7.79057108e-03, -1.31063825e-02,
2.89645623e-02, 9.53624845e-02, 4.62271878e-03,
-2.28673667e-02, -3.00150272e-02, 8.50995071e-03,
2.95229964e-02, -2.00281832e-02, -3.06392740e-02,
3.70190083e-03, 5.17040044e-02, 3.78289409e-02,
-4.52865958e-02, -3.04144323e-02, -6.33255765e-02,
-1.21121779e-02, -2.26585940e-03, 4.00145315e-02,
-7.68749863e-02, -3.96612063e-02, -2.40458176e-02,
-1.22921094e-02, -3.65987513e-03, -5.63489683e-02,
-2.04586294e-02, 2.61900318e-03, -4.19872394e-03,
2.29546800e-02, -3.81762721e-02, -4.06912453e-02,
5.58966659e-02, -2.28307825e-02, -7.99906347e-03,
-6.71147089e-03, 2.39457283e-02, -2.71236263e-02,
3.32449190e-02, -3.31203314e-03, -1.45904701e-02,
-1.48993256e-02, -5.38930371e-02, 8.52978975e-03,
2.41371319e-02, 2.83823945e-02, 2.75515486e-02,
-9.44455899e-03, 2.91343611e-02, 7.87780806e-03,
-3.50229479e-02, -1.73738748e-02, 6.11178856e-03,
-5.17762974e-02, 4.22114553e-03, -7.85780549e-02,
-1.67391486e-02, 2.43430994e-02, -1.91075541e-02,
6.80573098e-03, -8.85819551e-03, 1.04752835e-02,
3.03049795e-02, 2.26538684e-02, -8.99459701e-03,
3.97454314e-02, -1.19835865e-02, 5.36519326e-02,
-1.73529275e-02, 1.60130672e-02, 5.74031807e-02,
1.70067605e-02, -4.71418053e-02, -1.67144754e-03,
5.65098748e-02, 2.57398672e-02, 4.74409908e-02,
-6.82199048e-03, 2.76359431e-02, -3.98197398e-02,
-7.07716541e-03, 5.33328727e-02, -2.15630159e-02,
-6.25305101e-02, -6.47531971e-02, 9.41584911e-03,
3.55411954e-02, 5.78559935e-02, 1.71727333e-02,
-1.82200577e-02, 9.01891442e-04, -3.21503207e-02,
3.91423702e-03, 3.67967710e-02, 2.71635167e-02,
1.97215788e-02, 2.11602226e-02, 4.15352061e-02,
-6.92324042e-02, -2.75356527e-02, 9.73042194e-03,
-7.08991960e-02, 9.29580908e-03, -1.97941475e-02,
2.92060953e-02, 4.80757952e-02, 1.80260055e-02,
-6.51646480e-02, -2.98081860e-02, -2.91780792e-02,
7.60703087e-02, 4.87935953e-02, -3.31642926e-02,
1.61458123e-02, -1.31451205e-04, -2.67255958e-02,
-6.42088661e-03, -1.48142735e-02, 2.20720638e-02,
2.81758467e-03, -2.31219199e-03, -1.48174623e-02,
4.40655202e-02, -9.12187621e-02, 6.04377761e-02,
-1.04248282e-02, 1.06799621e-02, 7.33398050e-02,
-1.46769052e-02, -1.51104974e-02, -1.87099688e-02,
-2.51277480e-02, -1.45520177e-02, 2.16571912e-02,
2.54967492e-02, 2.15571374e-02, -5.59970737e-03,
-7.36506842e-03, 1.23904524e-02, -1.86853316e-02,
8.60909838e-03, -2.25182343e-02, -9.64536238e-03,
-1.17721073e-02, -4.08401042e-02, -2.52085626e-02,
6.12639822e-03, 2.72095632e-02, 1.04894964e-02]]),
'documents': ['Operating the Climate Control System Your Googlecar has a climate control system that allows you to adjust the temperature and airflow in the car. To operate the climate control system, use the buttons and knobs located on the center console. Temperature: The temperature knob controls the temperature inside the car. Turn the knob clockwise to increase the temperature or counterclockwise to decrease the temperature. Airflow: The airflow knob controls the amount of airflow inside the car. Turn the knob clockwise to increase the airflow or counterclockwise to decrease the airflow. Fan speed: The fan speed knob controls the speed of the fan. Turn the knob clockwise to increase the fan speed or counterclockwise to decrease the fan speed. Mode: The mode button allows you to select the desired mode. The available modes are: Auto: The car will automatically adjust the temperature and airflow to maintain a comfortable level. Cool: The car will blow cool air into the car. Heat: The car will blow warm air into the car. Defrost: The car will blow warm air onto the windshield to defrost it.'],
'uris': None,
'data': None,
'metadatas': [None],
'included': [<IncludeEnum.embeddings: 'embeddings'>,
<IncludeEnum.documents: 'documents'>,
<IncludeEnum.metadatas: 'metadatas'>]}
from IPython.display import Markdown

# Switch to query mode when embedding the user's question.
embed_fn.document_mode = False

query = "How do you use the touchscreen to play music?"
result = db.query(query_texts=[query], n_results=1)
[[passage]] = result["documents"]

Markdown(passage)
Set embed_fn's document_mode to False to switch to query mode
Store the question in query
Store the result of db.query in result
n_results=1 limits the output to a single result
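The `[[passage]] = result["documents"]` line relies on the documents field being a list of lists: one inner list per query text, holding one entry per returned result. Here is a small sketch of that unpacking using a hand-written stand-in dict. The dict below is hypothetical and only mimics the nested shape; it is not real output from db.query.

```python
# Hypothetical stand-in for the dict returned by db.query(...); values are made up.
result = {
    "ids": [["1"]],
    "documents": [["Your Googlecar has a large touchscreen display."]],
}

# Outer list: one entry per query text. Inner list: one entry per returned result.
[[passage]] = result["documents"]
print(passage)

# The same value via explicit indexing.
same_passage = result["documents"][0][0]

# The nested unpacking only works for exactly one query and one result.
# With two results in the inner list, Python raises ValueError.
try:
    [[only]] = [["first", "second"]]
    unpack_failed = False
except ValueError:
    unpack_failed = True
print(unpack_failed)
```

If n_results is raised above 1, indexing or looping over `result["documents"][0]` is the safer choice.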
As mentioned above, RAG retrieves content similar to the question and adds it to the prompt
So a step that assembles the prompt is needed
Below is an example of the assembly
# Collapse line breaks so the question and passage each fit on one line.
query_oneline = query.replace("\n", " ")
passage_oneline = passage.replace("\n", " ")

prompt = f"""You are a helpful and informative bot that answers questions using text from the reference passage included below.
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and
strike a friendly and converstional tone. If the passage is irrelevant to the answer, you may ignore it.

QUESTION: {query_oneline}
PASSAGE: {passage_oneline}
"""
print(prompt)
You are a helpful and informative bot that answers questions using text from the reference passage included below.
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and
strike a friendly and converstional tone. If the passage is irrelevant to the answer, you may ignore it.
QUESTION: How do you use the touchscreen to play music?
PASSAGE: Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon. For example, you can touch the "Navigation" icon to get directions to your destination or touch the "Music" icon to play your favorite songs.
First comes the instruction prompt,
then the user's question (QUESTION),
and finally the retrieved similar content (PASSAGE)
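The instruction -> QUESTION -> PASSAGE ordering can be wrapped in a small helper. This is a sketch only: `build_prompt` and its parameter names are hypothetical, and it collapses all runs of whitespace, which is slightly more aggressive than replacing newlines alone.

```python
def build_prompt(instructions: str, query: str, passage: str) -> str:
    """Assemble a RAG prompt: instructions first, then QUESTION, then PASSAGE."""
    # Collapse newlines and repeated spaces so each field stays on one line.
    query_oneline = " ".join(query.split())
    passage_oneline = " ".join(passage.split())
    return (
        f"{instructions}\n\n"
        f"QUESTION: {query_oneline}\n"
        f"PASSAGE: {passage_oneline}\n"
    )


prompt = build_prompt(
    "Answer the question using the reference passage below.",
    "How do you use the touchscreen\nto play music?",
    'Touch the "Music" icon\nto play your favorite songs.',
)
print(prompt)
```

Keeping the assembly in one function makes it easy to swap the instruction text without touching the retrieval code.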
model = genai.GenerativeModel("gemini-1.5-flash-latest")
answer = model.generate_content(prompt)
Markdown(answer.text)
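Store the Gemini model in model
Generate an answer with model.generate_content
The argument to generate_content() would normally be the question,
but in RAG the input is replaced with the prompt assembled above
The generated result is stored in answer
As shown below, the answer is generated using both the question and the retrieved passage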
To play music, you can simply touch the "Music" icon on your Googlecar's touchscreen display.
Since this part compares texts by similarity, set task_type to semantic_similarity
Cosine similarity search
texts = [
    'The quick brown fox jumps over the lazy dog.',
    'The quick rbown fox jumps over the lazy dog.',
    'teh fast fox jumps over the slow woofer.',
    'a quick brown fox jmps over lazy dog.',
    'brown fox jumping over dog',
    'fox > dog',
    # Alternative pangram for comparison:
    'The five boxing wizards jump quickly.',
    # Unrelated text, also for comparison:
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus et hendrerit massa. Sed pulvinar, nisi a lobortis sagittis, neque risus gravida dolor, in porta dui odio vel purus.',
]

response = genai.embed_content(model='models/text-embedding-004',
                               content=texts,
                               task_type='semantic_similarity')

def truncate(t: str, limit: int = 50) -> str:
    """Truncate labels to fit on the chart."""
    if len(t) > limit:
        return t[:limit-3] + '...'
    else:
        return t

truncated_texts = [truncate(t) for t in texts]

import pandas as pd
import seaborn as sns

df = pd.DataFrame(response['embedding'], index=truncated_texts)
sim = df @ df.T
sns.heatmap(sim, vmin=0, vmax=1);
sim['The quick brown fox jumps over the lazy dog.'].sort_values(ascending=False)
The result looks like this
The quick brown fox jumps over the lazy dog. 0.999999
The quick rbown fox jumps over the lazy dog. 0.975623
a quick brown fox jmps over lazy dog. 0.939730
brown fox jumping over dog 0.894507
teh fast fox jumps over the slow woofer. 0.842152
fox > dog 0.776455
The five boxing wizards jump quickly. 0.635346
Lorem ipsum dolor sit amet, consectetur adipisc... 0.472174
Name: The quick brown fox jumps over the lazy dog., dtype: float64
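The scores above come from a plain matrix product (`df @ df.T`), which equals cosine similarity only when every embedding has unit length. The 0.999999 self-similarity suggests that is approximately the case here, but that is an inference from the output, not something the post states. The sketch below, using made-up 2-dimensional vectors, normalizes explicitly so the dot product is a cosine similarity regardless of the input lengths.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def similarity_matrix(vectors):
    """Pairwise cosine similarity: dot products of unit-normalized vectors."""
    unit = [normalize(v) for v in vectors]
    return [[dot(a, b) for b in unit] for a in unit]


# Hypothetical vectors: the second points the same way as the first,
# the third is perpendicular to it.
vectors = [[3.0, 4.0], [6.0, 8.0], [-4.0, 3.0]]
sim = similarity_matrix(vectors)

for row in sim:
    print([round(value, 3) for value in row])
```

Vectors pointing in the same direction score 1 whatever their length, and perpendicular vectors score 0. Note that `normalize` divides by zero for an all-zero vector, so this sketch assumes non-zero inputs.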
Skipping mission 3 (the classification model)!
The end