Milvus GPU Index

vernolog · December 8, 2024

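Milvus supports several GPU index types that improve search performance and efficiency, which is especially useful in high-throughput and high-recall scenarios. This post gives an overview of the GPU index types Milvus supports, the use cases each one suits, and their performance characteristics. For more on building indexes with a GPU, see Index with GPU.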

One important caveat: using a GPU index does not necessarily give lower latency than a CPU index. To fully maximize throughput, you need very high request pressure or a large number of query vectors.
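
One way to supply "a large number of query vectors" is to group pending queries and send each group as a single search request. The helper below is a minimal sketch of that grouping only; `batch_queries` and the batch size are hypothetical names and values chosen for illustration, and the actual search call is left out.

```python
from typing import Iterator, List, Sequence

Vector = Sequence[float]


def batch_queries(queries: Sequence[Vector], batch_size: int) -> Iterator[List[Vector]]:
    """Yield consecutive groups of at most `batch_size` query vectors."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(queries), batch_size):
        yield list(queries[start:start + batch_size])


# Hypothetical workload: 10 query vectors of dimension 4, grouped 4 at a time.
queries = [[float(i)] * 4 for i in range(10)]
batches = list(batch_queries(queries, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Each batch would then go out as one multi-vector search, so the GPU has more work per request.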

GPU_CAGRA

GPU_CAGRA is a graph-based index optimized for GPUs.

Index building parameters

Parameter	Description	Default Value
`intermediate_graph_degree`	Affects recall and build time by determining the graph’s degree before pruning. Recommended values are `32` or `64`.	`128`
`graph_degree`	Affects search performance and recall by setting the graph’s degree after pruning. A larger difference between these two degrees results in a longer build time. Its value must be smaller than the value of `intermediate_graph_degree`.	`64`
`build_algo`	Selects the graph generation algorithm before pruning. Possible values: `IVF_PQ`: Offers higher quality but slower build time. `NN_DESCENT`: Provides a quicker build with potentially lower recall.	`IVF_PQ`
`cache_dataset_on_device`	Decides whether to cache the original dataset in GPU memory. Possible values: `"true"`: Caches the original dataset to enhance recall by refining search results. `"false"`: Does not cache the original dataset to save GPU memory.	`"false"`
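
The constraints in the table above can be checked before an index is requested. The following is a minimal sketch: `gpu_cagra_index_params` is a hypothetical helper, the `{"index_type", "metric_type", "params"}` dict shape follows the style pymilvus accepts for index parameters, and the `"L2"` metric is an assumption not taken from this post.

```python
def gpu_cagra_index_params(
    metric_type: str = "L2",  # assumption: metric is not specified in the post
    intermediate_graph_degree: int = 128,
    graph_degree: int = 64,
    build_algo: str = "IVF_PQ",
    cache_dataset_on_device: str = "false",
) -> dict:
    """Build a GPU_CAGRA index-parameter dict, enforcing the table's constraints."""
    # graph_degree must be smaller than intermediate_graph_degree.
    if graph_degree >= intermediate_graph_degree:
        raise ValueError("graph_degree must be smaller than intermediate_graph_degree")
    if build_algo not in ("IVF_PQ", "NN_DESCENT"):
        raise ValueError("build_algo must be 'IVF_PQ' or 'NN_DESCENT'")
    # The table lists the values as the strings "true" and "false".
    if cache_dataset_on_device not in ("true", "false"):
        raise ValueError("cache_dataset_on_device must be 'true' or 'false'")
    return {
        "index_type": "GPU_CAGRA",
        "metric_type": metric_type,
        "params": {
            "intermediate_graph_degree": intermediate_graph_degree,
            "graph_degree": graph_degree,
            "build_algo": build_algo,
            "cache_dataset_on_device": cache_dataset_on_device,
        },
    }


cagra_params = gpu_cagra_index_params(intermediate_graph_degree=64, graph_degree=32)
print(cagra_params["params"]["graph_degree"])  # 32
```

The resulting dict is meant to be handed to whatever index-creation call your client version provides.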

GPU_IVF_FLAT

GPU_IVF_FLAT uses the same algorithm as IVF_FLAT but runs it on the GPU.
When searching a GPU_IVF_FLAT-indexed collection, top-K can be set to at most 256.

Index building parameters

Parameter	Description	Range	Default Value
`nlist`	Number of cluster units	[1, 65536]	`128`
`cache_dataset_on_device`	Decides whether to cache the original dataset in GPU memory. Possible values: `"true"`: Caches the original dataset to enhance recall by refining search results. `"false"`: Does not cache the original dataset to save GPU memory.	`"true"` `"false"`	`"false"`
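
As an illustration of the build parameters above, here is a minimal sketch that range-checks `nlist` and assembles a parameter dict. `gpu_ivf_flat_index_params` is a hypothetical helper, and the `"L2"` metric is an assumption not taken from this post.

```python
def gpu_ivf_flat_index_params(
    nlist: int = 128,
    cache_dataset_on_device: str = "false",
    metric_type: str = "L2",  # assumption: metric is not specified in the post
) -> dict:
    """Build a GPU_IVF_FLAT index-parameter dict using the documented ranges."""
    # nlist (number of cluster units) must lie in [1, 65536].
    if not 1 <= nlist <= 65536:
        raise ValueError("nlist must be in [1, 65536]")
    if cache_dataset_on_device not in ("true", "false"):
        raise ValueError("cache_dataset_on_device must be 'true' or 'false'")
    return {
        "index_type": "GPU_IVF_FLAT",
        "metric_type": metric_type,
        "params": {
            "nlist": nlist,
            "cache_dataset_on_device": cache_dataset_on_device,
        },
    }


ivf_flat_params = gpu_ivf_flat_index_params(nlist=1024)
print(ivf_flat_params["params"]["nlist"])  # 1024
```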

Search parameters

- Common search

Parameter	Description	Range	Default Value
`nprobe`	Number of units to query	[1, nlist]	`8`

- Limits on search

Parameter	Range
`limit` (top-K)	<= `2048`
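
The search-side ranges can be validated the same way. This sketch checks `nprobe` against the index's `nlist` and `limit` against the `<= 2048` bound from the table. `gpu_ivf_flat_search_args` and `MAX_LIMIT` are hypothetical names, and the `"L2"` metric is an assumption.

```python
MAX_LIMIT = 2048  # top-K bound taken from the "Limits on search" table


def gpu_ivf_flat_search_args(nlist: int, nprobe: int = 8, limit: int = 10,
                             metric_type: str = "L2") -> dict:
    """Return search arguments after checking nprobe and limit against their ranges."""
    # nprobe (number of units to query) must lie in [1, nlist].
    if not 1 <= nprobe <= nlist:
        raise ValueError("nprobe must be in [1, nlist]")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be in [1, {MAX_LIMIT}]")
    return {
        "limit": limit,
        "search_params": {"metric_type": metric_type, "params": {"nprobe": nprobe}},
    }


search_args = gpu_ivf_flat_search_args(nlist=128, nprobe=16, limit=100)
print(search_args["search_params"]["params"])  # {'nprobe': 16}
```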

GPU_IVF_PQ

GPU_IVF_PQ uses the same algorithm as IVF_PQ but runs it on the GPU.
The maximum top-K can be set to 8192.

Index building parameters

Parameter	Description	Range	Default Value
`nlist`	Number of cluster units	[1, 65536]	`128`
`m`	Number of factors of product quantization. `dim` must be divisible by `m`.	`dim mod m == 0`	`0`
`nbits`	[Optional] Number of bits in which each low-dimensional vector is stored.	[1, 16]	`8`
`cache_dataset_on_device`	Decides whether to cache the original dataset in GPU memory. Possible values: `"true"`: Caches the original dataset to enhance recall by refining search results. `"false"`: Does not cache the original dataset to save GPU memory.	`"true"` `"false"`	`"false"`
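
The `dim mod m == 0` rule is the easiest of these to get wrong, so here is a minimal sketch that checks it along with the other ranges. `gpu_ivf_pq_index_params` is a hypothetical helper and `"L2"` is an assumed metric. The sketch requires an explicit `m >= 1` and does not model the table's default of `0`.

```python
def gpu_ivf_pq_index_params(
    dim: int,
    m: int,
    nlist: int = 128,
    nbits: int = 8,
    cache_dataset_on_device: str = "false",
    metric_type: str = "L2",  # assumption: metric is not specified in the post
) -> dict:
    """Build a GPU_IVF_PQ index-parameter dict using the documented ranges."""
    # The vector dimension must split evenly into m sub-vectors.
    if m < 1 or dim % m != 0:
        raise ValueError("m must be >= 1 and divide dim evenly (dim mod m == 0)")
    if not 1 <= nlist <= 65536:
        raise ValueError("nlist must be in [1, 65536]")
    if not 1 <= nbits <= 16:
        raise ValueError("nbits must be in [1, 16]")
    if cache_dataset_on_device not in ("true", "false"):
        raise ValueError("cache_dataset_on_device must be 'true' or 'false'")
    return {
        "index_type": "GPU_IVF_PQ",
        "metric_type": metric_type,
        "params": {
            "nlist": nlist,
            "m": m,
            "nbits": nbits,
            "cache_dataset_on_device": cache_dataset_on_device,
        },
    }


# Example: 128-dimensional vectors split into 16 sub-vectors of 8 dimensions each.
ivf_pq_params = gpu_ivf_pq_index_params(dim=128, m=16)
print(128 // ivf_pq_params["params"]["m"])  # 8
```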

Search parameters

- Common search

Parameter	Description	Range	Default Value
`nprobe`	Number of units to query	[1, nlist]	`8`

- Limits on search

Parameter	Range
`limit` (top-K)	<= `1024`