Optimizing vector search using Cohere compressed embeddings

Cloud_ Ghost · September 8, 2025

https://docs.opensearch.org/latest/tutorials/vector-search/vector-operations/optimize-compression/

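This tutorial shows how to optimize vector search using Cohere compressed embeddings. These embeddings allow more efficient storage and faster retrieval of vector representations, which makes them well suited to large-scale search applications.

This tutorial is compatible with OpenSearch 2.17 and later. The "Using a template query and a search pipeline" part of Step 4 requires version 2.19 or later.

This tutorial uses the Cohere Embed Multilingual v3 model on Amazon Bedrock. For more information about using Cohere compressed embeddings on Amazon Bedrock, see the blog post linked from the original tutorial.

This tutorial uses the following OpenSearch components:

ML inference ingest processor
ML inference search request processor
Search template query
Vector index and byte vectors

Replace the placeholders that begin with your_ with your own values.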

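Step 1: Configure an embedding model

Follow these steps to create a connector to Amazon Bedrock for accessing the Cohere Embed model.

Step 1.1: Create a connector

Create a connector for the embedding model based on the blueprint referenced in the original tutorial. For more information about creating connectors, see the connector documentation.

Because this tutorial uses ML inference processors, you don't need to specify a pre-processing or post-processing function in the connector.

To create the connector, send the following request. The "embedding_types": ["int8"] parameter requests 8-bit integer quantized embeddings from the Cohere model. This compresses each value from a 32-bit float to an 8-bit integer, which cuts raw vector storage to a quarter and speeds up distance computations. Some precision is lost, but the effect on search quality is usually negligible. These quantized embeddings are compatible with OpenSearch's knn_index, which supports byte vectors: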

POST _plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Connector: Cohere embed-multilingual-v3",
  "description": "Test connector for Amazon Bedrock Cohere embed-multilingual-v3",
  "version": 1,
  "protocol": "aws_sigv4",
  "credential": {
    "access_key": "your_aws_access_key",
    "secret_key": "your_aws_secret_key",
    "session_token": "your_aws_session_token"
  },
  "parameters": {
    "region": "your_aws_region",
    "service_name": "bedrock",
    "truncate": "END",
    "input_type": "search_document",
    "model": "cohere.embed-multilingual-v3",
    "embedding_types": ["int8"]
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "headers": {
        "x-amz-content-sha256": "required",
        "content-type": "application/json"
      },
      "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
      "request_body": "{ \"texts\": ${parameters.texts}, \"truncate\": \"${parameters.truncate}\", \"input_type\": \"${parameters.input_type}\", \"embedding_types\":  ${parameters.embedding_types} }"
    }
  ]
}

For more information about the model parameters, see the Cohere documentation and the Amazon Bedrock documentation.
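The storage saving from int8 quantization can be estimated with simple arithmetic. The following Python sketch counts raw vector bytes only, assuming the 1024-dimensional embeddings used in this tutorial's index mapping. The corpus size and the helper names are hypothetical, and index overhead such as the HNSW graph is not included.

```python
# Back-of-the-envelope estimate of raw vector storage.
# Graph structures and other index overhead are NOT included.
DIMENSION = 1024  # dimension used in this tutorial's index mapping

BYTES_PER_VALUE = {
    "float": 4,  # 32-bit floating point
    "int8": 1,   # 8-bit integer
}


def vector_bytes(embedding_type: str, dimension: int = DIMENSION) -> int:
    """Return the raw size in bytes of one embedding of the given type."""
    return BYTES_PER_VALUE[embedding_type] * dimension


def corpus_bytes(embedding_type: str, num_vectors: int) -> int:
    """Return the raw size in bytes of num_vectors embeddings."""
    return vector_bytes(embedding_type) * num_vectors


if __name__ == "__main__":
    num_docs = 1_000_000  # hypothetical corpus size
    for kind in BYTES_PER_VALUE:
        mib = corpus_bytes(kind, num_docs) / 2**20
        print(f"{kind}: {vector_bytes(kind)} bytes per vector, "
              f"{mib:.0f} MiB for {num_docs} vectors")
```

Each int8 vector takes a quarter of the space of its float counterpart, so the saving scales linearly with the number of indexed vectors.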

The response contains the connector ID:

{
  "connector_id": "AOP0OZUB3JwAtE25PST0"
}

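Note the connector ID; you'll use it in the next step.

Step 1.2: Register the model

Next, register the model using the connector you created in the previous step. The interface parameter is optional. If the model doesn't require a specific interface configuration, set this parameter to an empty object: "interface": {}: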

POST _plugins/_ml/models/_register?deploy=true
{
  "name": "Bedrock Cohere embed-multilingual-v3",
  "version": "1.0",
  "function_name": "remote",
  "description": "Bedrock Cohere embed-multilingual-v3",
  "connector_id": "AOP0OZUB3JwAtE25PST0",
  "interface": {
    "input": "{\n    \"type\": \"object\",\n    \"properties\": {\n        \"parameters\": {\n            \"type\": \"object\",\n            \"properties\": {\n                \"texts\": {\n                    \"type\": \"array\",\n                    \"items\": {\n                        \"type\": \"string\"\n                    }\n                },\n                \"embedding_types\": {\n                    \"type\": \"array\",\n                    \"items\": {\n                        \"type\": \"string\",\n                        \"enum\": [\"float\", \"int8\", \"uint8\", \"binary\", \"ubinary\"]\n                    }\n                },\n                \"truncate\": {\n                    \"type\": \"array\",\n                    \"items\": {\n                        \"type\": \"string\",\n                        \"enum\": [\"NONE\", \"START\", \"END\"]\n                    }\n                },\n                \"input_type\": {\n                    \"type\": \"string\",\n                    \"enum\": [\"search_document\", \"search_query\", \"classification\", \"clustering\"]\n                }\n            },\n            \"required\": [\"texts\"]\n        }\n    },\n    \"required\": [\"parameters\"]\n}",
    "output": "{\n    \"type\": \"object\",\n    \"properties\": {\n        \"inference_results\": {\n            \"type\": \"array\",\n            \"items\": {\n                \"type\": \"object\",\n                \"properties\": {\n                    \"output\": {\n                        \"type\": \"array\",\n                        \"items\": {\n                            \"type\": \"object\",\n                            \"properties\": {\n                                \"name\": {\n                                    \"type\": \"string\"\n                                },\n                                \"dataAsMap\": {\n                                    \"type\": \"object\",\n                                    \"properties\": {\n                                        \"id\": {\n                                            \"type\": \"string\",\n                                            \"format\": \"uuid\"\n                                        },\n                                        \"texts\": {\n                                            \"type\": \"array\",\n                                            \"items\": {\n                                                \"type\": \"string\"\n                                            }\n                                        },\n                                        \"embeddings\": {\n                                            \"type\": \"object\",\n                                            \"properties\": {\n                                                \"binary\": {\n                                                    \"type\": \"array\",\n                                                    \"items\": {\n                                                        \"type\": \"array\",\n                                                        \"items\": {\n                                                            \"type\": \"number\"\n                                                       
 }\n                                                    }\n                                                },\n                                                \"float\": {\n                                                    \"type\": \"array\",\n                                                    \"items\": {\n                                                        \"type\": \"array\",\n                                                        \"items\": {\n                                                            \"type\": \"number\"\n                                                        }\n                                                    }\n                                                },\n                                                \"int8\": {\n                                                    \"type\": \"array\",\n                                                    \"items\": {\n                                                        \"type\": \"array\",\n                                                        \"items\": {\n                                                            \"type\": \"number\"\n                                                        }\n                                                    }\n                                                },\n                                                \"ubinary\": {\n                                                    \"type\": \"array\",\n                                                    \"items\": {\n                                                        \"type\": \"array\",\n                                                        \"items\": {\n                                                            \"type\": \"number\"\n                                                        }\n                                                    }\n                                                },\n                                                \"uint8\": {\n                                
                    \"type\": \"array\",\n                                                    \"items\": {\n                                                        \"type\": \"array\",\n                                                        \"items\": {\n                                                            \"type\": \"number\"\n                                                        }\n                                                    }\n                                                }\n                                            }\n                                        },\n                                        \"response_type\": {\n                                            \"type\": \"string\"\n                                        }\n                                    },\n                                    \"required\": [\"embeddings\"]\n                                }\n                            },\n                            \"required\": [\"name\", \"dataAsMap\"]\n                        }\n                    },\n                    \"status_code\": {\n                        \"type\": \"integer\"\n                    }\n                },\n                \"required\": [\"output\", \"status_code\"]\n            }\n        }\n    },\n    \"required\": [\"inference_results\"]\n}"
  }
}

For more information, see the model interface documentation.

The response contains the model ID:

{
  "task_id": "COP0OZUB3JwAtE25yiQr",
  "status": "CREATED",
  "model_id": "t64OPpUBX2k07okSZc2n"
}

To test the model, send the following request:

POST _plugins/_ml/models/t64OPpUBX2k07okSZc2n/_predict
{
  "parameters": {
    "texts": ["Say this is a test"],
    "embedding_types": ["int8"]
  }
}

The response contains the generated embeddings:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "id": "db07a08c-283d-4da5-b0c5-a9a54ef35d01",
            "texts": [
              "Say this is a test"
            ],
            "embeddings": {
              "int8": [
                [
                  -26.0,
                  31.0,
                  ...
                ]
              ]
            },
            "response_type": "embeddings_by_type"
          }
        }
      ],
      "status_code": 200
    }
  ]
}
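When working with this response in client code, it can help to pull the embedding out and confirm that every value fits in a signed byte before indexing it as a byte vector. The following Python sketch assumes the response shape shown above. The helper name is hypothetical, and the sample response is shortened to two values for illustration.

```python
from typing import Any


def extract_int8_embedding(response: dict[str, Any], index: int = 0) -> list[int]:
    """Pull one int8 embedding out of a Predict API response as a list of ints."""
    data = response["inference_results"][0]["output"][0]["dataAsMap"]
    raw = data["embeddings"]["int8"][index]
    values = [int(v) for v in raw]  # the response carries numbers such as -26.0
    for v in values:
        if not -128 <= v <= 127:
            raise ValueError(f"value {v} does not fit in a signed byte")
    return values


# Shortened, illustrative response: a real embedding has many more values.
sample_response = {
    "inference_results": [
        {
            "output": [
                {
                    "name": "response",
                    "dataAsMap": {
                        "texts": ["Say this is a test"],
                        "embeddings": {"int8": [[-26.0, 31.0]]},
                        "response_type": "embeddings_by_type",
                    },
                }
            ],
            "status_code": 200,
        }
    ]
}

if __name__ == "__main__":
    print(extract_int8_embedding(sample_response))
```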

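Step 2: Create an ingest pipeline

An ingest pipeline lets you process documents before they are indexed. Here it is used to generate embeddings for the title and description fields of your data.

There are two ways to set up the pipeline:

Invoke the model separately for title and description: This option sends a separate request for each field and generates independent embeddings.
Combine title and description and invoke the model once: This option passes both fields to the model in a single request as a two-element texts array, then maps the two returned embeddings back to title_embedding and description_embedding.

Option 1: Invoke the model separately for title and description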

PUT _ingest/pipeline/ml_inference_pipeline_cohere
{
  "processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run ml inference during ingest request",
        "model_id": "t64OPpUBX2k07okSZc2n",
        "input_map": [
          {
            "texts": "$..title"
          },
          {
            "texts": "$..description"
          }
        ],
        "output_map": [
          {
            "title_embedding": "embeddings.int8[0]"
          },
          {
            "description_embedding": "embeddings.int8[0]"
          }
        ],
        "model_config": {
          "embedding_types": ["int8"]
        },
        "ignore_failure": false
      }
    }
  ]
}

Option 2: Combine title and description and invoke the model once

PUT _ingest/pipeline/ml_inference_pipeline_cohere
{
  "description": "Concatenate title and description fields",
  "processors": [
    {
      "set": {
        "field": "title_desc_tmp",
        "value": [
          "{{title}}",
          "{{description}}"
        ]
      }
    },
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run ml inference during ingest request",
        "model_id": "t64OPpUBX2k07okSZc2n",
        "input_map": [
          {
            "texts": "title_desc_tmp"
          }
        ],
        "output_map": [
          {
            "title_embedding": "embeddings.int8[0]",
            "description_embedding": "embeddings.int8[1]"
          }
        ],
        "model_config": {
          "embedding_types": ["int8"]
        },
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "field": "title_desc_tmp"
      }
    }
  ]
}
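The difference between the two options comes down to how many model calls are made and how results are mapped back by position. The following Python sketch imitates that behavior with a stand-in model. fake_embed, option_1, and option_2 are hypothetical names and the tiny two-value vectors are not real embeddings. It assumes that in option 2 the temporary field holds the title and the description, in that order.

```python
def fake_embed(texts: list[str]) -> list[list[int]]:
    """Stand-in for the remote model: one tiny 'embedding' per input text."""
    return [[len(text) % 128, position] for position, text in enumerate(texts)]


def option_1(doc: dict) -> tuple[dict, int]:
    """Two model calls, one per field; each result is read from position 0."""
    title_vec = fake_embed([doc["title"]])[0]
    description_vec = fake_embed([doc["description"]])[0]
    enriched = {**doc, "title_embedding": title_vec,
                "description_embedding": description_vec}
    return enriched, 2  # number of model calls


def option_2(doc: dict) -> tuple[dict, int]:
    """One model call with both texts; results are read from positions 0 and 1."""
    vectors = fake_embed([doc["title"], doc["description"]])
    enriched = {**doc, "title_embedding": vectors[0],
                "description_embedding": vectors[1]}
    return enriched, 1  # number of model calls


if __name__ == "__main__":
    book = {"title": "The Great Gatsby", "description": "A novel of decadence."}
    for option in (option_1, option_2):
        enriched, calls = option(book)
        print(option.__name__, "calls:", calls, enriched["title_embedding"])
```

Option 2 halves the number of requests to the model, at the cost of relying on the order of the input array to tell the two embeddings apart.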

Send the following simulate request to test the pipeline:

POST _ingest/pipeline/ml_inference_pipeline_cohere/_simulate
{
  "docs": [
    {
      "_index": "books",
      "_id": "1",
      "_source": {
        "title": "The Great Gatsby",
        "author": "F. Scott Fitzgerald",
        "description": "A novel of decadence and excess in the Jazz Age, exploring themes of wealth, love, and the American Dream.",
        "publication_year": 1925,
        "genre": "Classic Fiction"
      }
    }
  ]
}

The response contains the generated embeddings:

{
  "docs": [
    {
      "doc": {
        "_index": "books",
        "_id": "1",
        "_source": {
          "publication_year": 1925,
          "author": "F. Scott Fitzgerald",
          "genre": "Classic Fiction",
          "description": "A novel of decadence and excess in the Jazz Age, exploring themes of wealth, love, and the American Dream.",
          "title": "The Great Gatsby",
          "title_embedding": [
            18,
            33,
            ...
          ],
          "description_embedding": [
            -21,
            -14,
            ...
          ]
        },
        "_ingest": {
          "timestamp": "2025-02-25T09:11:32.192125042Z"
        }
      }
    }
  ]
}

Step 3: Create a vector index and ingest data

Next, create a vector index:

PUT books
{
  "settings": {
    "index": {
      "default_pipeline": "ml_inference_pipeline_cohere",
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "title_embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "data_type": "byte",
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 100,
            "m": 16
          }
        }
      },
      "description_embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "data_type": "byte",
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 100,
            "m": 16
          }
        }
      }
    }
  }
}

Ingest test data into the index:

POST _bulk
{"index":{"_index":"books"}}
{"title":"The Great Gatsby","author":"F. Scott Fitzgerald","description":"A novel of decadence and excess in the Jazz Age, exploring themes of wealth, love, and the American Dream.","publication_year":1925,"genre":"Classic Fiction"}
{"index":{"_index":"books"}}
{"title":"To Kill a Mockingbird","author":"Harper Lee","description":"A powerful story of racial injustice and loss of innocence in the American South during the Great Depression.","publication_year":1960,"genre":"Literary Fiction"}
{"index":{"_index":"books"}}
{"title":"Pride and Prejudice","author":"Jane Austen","description":"A romantic novel of manners that follows the character development of Elizabeth Bennet as she learns about the repercussions of hasty judgments and comes to appreciate the difference between superficial goodness and actual goodness.","publication_year":1813,"genre":"Romance"}
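The bulk body above is newline-delimited JSON: an action line followed by a document line for each record, with a newline after the last line. A minimal Python sketch for generating such a body from a list of documents might look like this. build_bulk_body is a hypothetical helper, and the sample documents are shortened.

```python
import json


def build_bulk_body(index: str, docs: list[dict]) -> str:
    """Build a newline-delimited _bulk body that indexes docs into index."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document line
    return "\n".join(lines) + "\n"  # the body must end with a newline


if __name__ == "__main__":
    books = [
        {"title": "The Great Gatsby", "author": "F. Scott Fitzgerald"},
        {"title": "To Kill a Mockingbird", "author": "Harper Lee"},
    ]
    print(build_bulk_body("books", books), end="")
```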

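Step 4: Search the index

You can run a vector search on the index in the following ways:

Using a template query and a search pipeline
Rewriting the query in the search pipeline

Using a template query and a search pipeline

First, create a search pipeline: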

PUT _search/pipeline/ml_inference_pipeline_cohere_search
{
  "request_processors": [
    {
      "ml_inference": {
        "model_id": "t64OPpUBX2k07okSZc2n",
        "input_map": [
          {
            "texts": "$..ext.ml_inference.text"
          }
        ],
        "output_map": [
          {
            "ext.ml_inference.vector": "embeddings.int8[0]"
          }
        ],
        "model_config": {
          "input_type": "search_query",
          "embedding_types": ["int8"]
        }
      }
    }
  ]
}

Next, run a search using a template query:

GET books/_search?search_pipeline=ml_inference_pipeline_cohere_search&verbose_pipeline=false
{
  "query": {
    "template": {
      "knn": {
        "description_embedding": {
          "vector": "${ext.ml_inference.vector}",
          "k": 10
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "text": "American Dream"
    }
  },
  "_source": {
    "excludes": [
      "title_embedding", "description_embedding"
    ]
  },
  "size": 2
}

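To see the input and output of each search processor, add &verbose_pipeline=true to the request. This is useful for debugging and for understanding how the search pipeline modifies the query. For more information, see the documentation on debugging a search pipeline.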
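If you issue this search from application code, the request body can be assembled from the query text alone, because the vector is filled in by the search pipeline. The following Python sketch builds the same body as the request shown earlier in this step. build_template_search is a hypothetical helper, and its defaults mirror the values used in this tutorial.

```python
import json


def build_template_search(text: str,
                          vector_field: str = "description_embedding",
                          k: int = 10,
                          size: int = 2) -> dict:
    """Build a template-query search body; the pipeline supplies the vector."""
    return {
        "query": {
            "template": {
                "knn": {
                    vector_field: {
                        # Placeholder resolved by the search pipeline's output_map
                        "vector": "${ext.ml_inference.vector}",
                        "k": k,
                    }
                }
            }
        },
        # The ml_inference request processor reads the query text from here
        "ext": {"ml_inference": {"text": text}},
        "_source": {"excludes": ["title_embedding", "description_embedding"]},
        "size": size,
    }


if __name__ == "__main__":
    print(json.dumps(build_template_search("American Dream"), indent=2))
```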

Rewriting the query in the search pipeline

Create another search pipeline that rewrites the query:

PUT _search/pipeline/ml_inference_pipeline_cohere_search2
{
  "request_processors": [
    {
      "ml_inference": {
        "model_id": "t64OPpUBX2k07okSZc2n",
        "input_map": [
          {
            "texts": "$..match.description.query"
          }
        ],
        "output_map": [
          {
            "query_vector": "embeddings.int8[0]"
          }
        ],
        "model_config": {
          "input_type": "search_query",
          "embedding_types": ["int8"]
        },
        "query_template": """
          {
            "query": {
              "knn": {
                "description_embedding": {
                  "vector": ${query_vector},
                  "k": 10
                }
              }
            },
            "_source": {
              "excludes": [
                "title_embedding",
                "description_embedding"
              ]
            },
            "size": 2
          }
        """
      }
    }
  ]
}
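Conceptually, the query_template in this pipeline is a string in which ${query_vector} is replaced by the embedding produced for the incoming query text. The following Python sketch approximates that substitution with plain string replacement. It illustrates the idea only and is not how the processor is implemented. render_query is a hypothetical helper and the three-value vector is made up.

```python
import json

# Shortened copy of the template; ${query_vector} is the placeholder to fill.
QUERY_TEMPLATE = """
{
  "query": {
    "knn": {
      "description_embedding": {
        "vector": ${query_vector},
        "k": 10
      }
    }
  },
  "size": 2
}
"""


def render_query(template: str, variables: dict) -> dict:
    """Replace each ${name} with its JSON-encoded value and parse the result."""
    rendered = template
    for name, value in variables.items():
        rendered = rendered.replace("${" + name + "}", json.dumps(value))
    return json.loads(rendered)


if __name__ == "__main__":
    query = render_query(QUERY_TEMPLATE, {"query_vector": [12, -7, 33]})
    print(json.dumps(query, indent=2))
```

Note that the placeholder is unquoted in the template, so the substituted value becomes a JSON array rather than a string.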

Now run a vector search using this pipeline:

GET books/_search?search_pipeline=ml_inference_pipeline_cohere_search2
{
  "query": {
    "match": {
      "description": "American Dream"
    }
  }
}

The response contains the matching documents:

{
  "took": 96,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 7.271585e-7,
    "hits": [
      {
        "_index": "books",
        "_id": "U640PJUBX2k07okSEMwy",
        "_score": 7.271585e-7,
        "_source": {
          "publication_year": 1925,
          "author": "F. Scott Fitzgerald",
          "genre": "Classic Fiction",
          "description": "A novel of decadence and excess in the Jazz Age, exploring themes of wealth, love, and the American Dream.",
          "title": "The Great Gatsby"
        }
      },
      {
        "_index": "books",
        "_id": "VK40PJUBX2k07okSEMwy",
        "_score": 6.773544e-7,
        "_source": {
          "publication_year": 1960,
          "author": "Harper Lee",
          "genre": "Literary Fiction",
          "description": "A powerful story of racial injustice and loss of innocence in the American South during the Great Depression.",
          "title": "To Kill a Mockingbird"
        }
      }
    ]
  }
}
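The very small _score values follow from the l2 space type configured on the index. To my understanding of the OpenSearch k-NN documentation, the score for l2 is 1 / (1 + d), where d is the squared Euclidean distance, and int8 values across 1024 dimensions easily produce squared distances in the millions. The following Python sketch inverts that formula for the two scores in the response. Treat the formula as an assumption to check against the documentation for your version.

```python
import math


def l2_score_to_squared_distance(score: float) -> float:
    """Invert score = 1 / (1 + d), where d is the squared L2 distance."""
    return 1.0 / score - 1.0


if __name__ == "__main__":
    for score in (7.271585e-7, 6.773544e-7):  # scores from the response above
        squared = l2_score_to_squared_distance(score)
        print(f"score={score:.6e}  squared distance={squared:,.0f}  "
              f"distance={math.sqrt(squared):,.1f}")
```

A higher score therefore still means a closer match; only the absolute magnitude is small.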

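Step 5 (Optional): Use binary embeddings

This section extends the setup to support binary embeddings, which reduce storage requirements further and speed up retrieval, making them well suited to large-scale applications.

You don't need to modify the connector or the model. You only need to update the vector index, the ingest pipeline, and the search pipeline.

Step 5.1: Create an ingest pipeline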

Create a new ingest pipeline named ml_inference_pipeline_cohere_binary using the same configuration as in Step 2, replacing every occurrence of int8 with binary.
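One way to derive the binary pipeline without retyping it is to serialize the int8 configuration, swap the type name, and parse it again. The following Python sketch does this for the option 1 pipeline from Step 2. to_binary_pipeline is a hypothetical helper, and the blunt textual replacement is only safe because int8 appears nowhere else in this configuration.

```python
import json

# Option 1 ingest pipeline from Step 2 (description and tag omitted for brevity).
int8_pipeline = {
    "processors": [
        {
            "ml_inference": {
                "model_id": "t64OPpUBX2k07okSZc2n",
                "input_map": [{"texts": "$..title"}, {"texts": "$..description"}],
                "output_map": [
                    {"title_embedding": "embeddings.int8[0]"},
                    {"description_embedding": "embeddings.int8[0]"},
                ],
                "model_config": {"embedding_types": ["int8"]},
                "ignore_failure": False,
            }
        }
    ]
}


def to_binary_pipeline(pipeline: dict) -> dict:
    """Return a copy of the pipeline with every 'int8' replaced by 'binary'."""
    return json.loads(json.dumps(pipeline).replace("int8", "binary"))


if __name__ == "__main__":
    # Send the printed body with PUT _ingest/pipeline/ml_inference_pipeline_cohere_binary
    print(json.dumps(to_binary_pipeline(int8_pipeline), indent=2))
```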

Step 5.2: Create a vector index and ingest data

Create a new vector index containing binary vector fields:

PUT books_binary_embedding
{
  "settings": {
    "index": {
      "default_pipeline": "ml_inference_pipeline_cohere_binary",
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "title_embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "data_type": "binary",
        "space_type": "hamming",
        "method": {
          "name": "hnsw",
          "engine": "faiss"
        }
      },
      "description_embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "data_type": "binary",
        "space_type": "hamming",
        "method": {
          "name": "hnsw",
          "engine": "faiss"
        }
      }
    }
  }
}
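For binary vectors, the dimension in the mapping counts bits, so a 1024-dimensional binary vector is supplied as 128 packed byte values, and the hamming space type compares vectors by counting differing bits. The following Python sketch shows both ideas on small made-up vectors. It assumes the packed values are signed bytes in the range -128 to 127, and hamming_distance is a hypothetical helper rather than part of OpenSearch.

```python
def packed_length(dimension_bits: int) -> int:
    """Number of byte values needed to hold a binary vector of the given dimension."""
    if dimension_bits % 8 != 0:
        raise ValueError("binary vector dimension must be a multiple of 8")
    return dimension_bits // 8


def hamming_distance(a: list[int], b: list[int]) -> int:
    """Count differing bits between two vectors of packed signed bytes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR marks differing bits; masking with 0xFF keeps the low 8 bits
    # so that negative (two's complement) byte values are handled correctly.
    return sum(bin((x ^ y) & 0xFF).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    print(packed_length(1024))                    # byte values per vector
    print(hamming_distance([5, -1], [3, 0]))      # made-up two-byte vectors
```

At 128 bytes per vector, the raw storage is one eighth of the int8 representation and one thirty-second of the float representation.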

Step 5.3: Create a search pipeline

Create a new search pipeline named ml_inference_pipeline_cohere_search_binary using the same configuration as the search pipeline in Step 4, replacing every occurrence of int8 with binary:

Change embeddings.int8[0] to embeddings.binary[0]
Change "embedding_types": ["int8"] to "embedding_types": ["binary"]

You can then run a vector search using the search pipeline as described in Step 4.
