Pinning Search Results to the Top of the Ranking

서아로 · November 15, 2023

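I am building a search engine as an internal company project.
While working out how to control the order of search results, I decided to use the Python Elasticsearch client to pin specific documents to the top of the ranking.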

Method

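When the search term equals the _id of a document in boost_index (the index that stores which documents to pin to the top), the documents listed as boost_id in that document are given extra weight for that search term, so they appear at the top of the results.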

Test search index: doc_index_test
Index holding the information about documents to boost: boost_index
Each step below shows the Query DSL together with the matching Python code.

As an example, take the search term "기능" ("function"). Searching doc_index_test for "기능" returns 5 results.
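Before the real queries, the idea can be illustrated with a toy model that does not touch Elasticsearch at all. This is only a sketch: the document ids, the scores, and the names `BOOST_TABLE` and `rerank` are made up for illustration, and real scoring is done by Elasticsearch. What it shows is the intent of the approach: documents registered for a search term receive a large bonus on top of their normal relevance score, so they sort first.

```python
# Toy model only: Elasticsearch computes the real scores, not this code.
# BOOST_TABLE stands in for boost_index (hypothetical sample data).
BOOST_TABLE = {
    # search term -> ids of the documents to pin for that term
    "기능": ["doc-3", "doc-5"],
}


def rerank(query, hits, boost=1000.0):
    """Add `boost` to the score of pinned documents, then sort by score."""
    pinned = set(BOOST_TABLE.get(query, []))
    rescored = [
        (doc_id, score + boost if doc_id in pinned else score)
        for doc_id, score in hits
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Five made-up hits for "기능", already ordered by relevance score.
hits = [("doc-1", 2.4), ("doc-2", 1.9), ("doc-3", 1.1), ("doc-4", 0.8), ("doc-5", 0.3)]
ranked = rerank("기능", hits)
print([doc_id for doc_id, _ in ranked])
```

A search term with no entry in the table leaves the order unchanged, which mirrors the case where boost_index has no document for the query.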

1. Check whether the search term matches a document _id in boost_index

QueryDSL

GET boost_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_id": "기능"
          }
        }
      ]
    }
  },
  "size":100
}
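The Python code in this post imports a `get_search_id` helper from `engine.elasticsearch_query`, but its body is not shown. A minimal sketch of what such a helper might look like, assuming it simply builds the Query DSL above as a dictionary (the signature is inferred from the call `get_search_id(query, size=100)`; the actual implementation may differ):

```python
from typing import Any, Dict


def get_search_id(query: str, size: int = 100) -> Dict[str, Any]:
    """Build a search body matching the document whose _id equals `query`.

    Hypothetical reconstruction of the helper used in this post.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"_id": query}}
                ]
            }
        },
        "size": size,
    }


body = get_search_id("기능", size=100)
print(body)
```

Since the lookup is by `_id`, an `ids` query would probably work here as well; the sketch keeps the `match` form to stay close to the Query DSL shown above.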

python

from fastapi import APIRouter, HTTPException, Request
from engine.elasticsearch_client import es
from typing import Optional, List, Any
from engine.elasticsearch_query import get_search_id, get_search_should_overhead
from engine.elasticsearch_result import handle_search_results
import re

router = APIRouter()

# Index mapping per category
CATEGORY_INDEX_MAP = {
    "doc": "doc_index_test"
}

BOOST_INDEX_MAP2 = {
    "doc": "boost_index"
}

# Check whether boost_index has a document whose _id equals the search term
def search_boostkey_id(
       request: Request, 
       query: str,
       category: Optional[str] = "total"
):

    body = get_search_id(query, size=100)

    if category == "total":
        # Search every boost index; with a single entry this is just boost_index
        for cat, index_pattern in BOOST_INDEX_MAP2.items():
            response = es.search(index=index_pattern, body=body)
            results = handle_search_results(response)
            print("::results::", results)
            print("::response::", response)

        return results

    else:
        # Look up the boost index for the given category (falls back to boost_index)
        index_to_search = BOOST_INDEX_MAP2.get(category, "boost_index")
        response = es.search(index=index_to_search, body=body)
        results = handle_search_results(response)

        return results
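`handle_search_results` is imported from `engine.elasticsearch_result` and is not shown in this post either. Judging only from how its return value is used (`results['meta']`, `results['documents']`, `document.get('keyword')`, `document.get('boost')`), it might look roughly like the sketch below. The `total` field in `meta`, the use of `_id` as `keyword`, and the sample response are all assumptions made for illustration.

```python
from typing import Any, Dict


def handle_search_results(response: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten an Elasticsearch search response into meta + documents.

    Hypothetical reconstruction: each document carries its _id under
    'keyword' and the fields of _source alongside it.
    """
    hits = response["hits"]
    documents = [
        {"keyword": hit["_id"], **hit.get("_source", {})}
        for hit in hits["hits"]
    ]
    return {
        "meta": {"total": hits["total"]["value"]},
        "documents": documents,
    }


# Sample response shaped like a boost_index hit for "기능" (made-up _source layout).
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_id": "기능",
                "_score": 1.0,
                "_source": {
                    "boost": [
                        {"boost_id": "kIZt-YoB6MtPcT2qgxgS"},
                        {"boost_id": "k4Zt-YoB6MtPcT2qiBh7"},
                    ]
                },
            }
        ],
    }
}

results = handle_search_results(sample_response)
print(results["documents"][0]["keyword"])
```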

2. Boost the documents listed as boost_id in that doc so they appear at the top for that search term

QueryDSL

GET doc_index_test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "기능"
          }
        }
      ],
      "should": [
        {
          "term": {
            "_id": {
              "value": "kIZt-YoB6MtPcT2qgxgS",
              "boost": 1000
            }
          }
        },
        {
          "term": {
            "_id": {
              "value": "k4Zt-YoB6MtPcT2qiBh7",
              "boost": 1000
            }
          }
        }
      ]
    }
  }
}
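The `get_search_should_overhead` helper that builds this query is imported but not shown in the post. A minimal sketch under the assumption that it turns each boosted id into one `term` clause inside `should` (the `boost` parameter and its default of 1000 are taken from the Query DSL above; the real helper may differ):

```python
from typing import Any, Dict, List


def get_search_should_overhead(
    query: str,
    boost_ids: List[str],
    size: int = 10,
    boost: float = 1000,
) -> Dict[str, Any]:
    """Build a bool query: `query` must match, boosted ids score higher.

    Hypothetical reconstruction of the helper used in this post.
    """
    bool_query: Dict[str, Any] = {
        "must": [{"query_string": {"query": query}}],
    }
    # Add one should clause per pinned document; skip the key when there are none.
    if boost_ids:
        bool_query["should"] = [
            {"term": {"_id": {"value": doc_id, "boost": boost}}}
            for doc_id in boost_ids
        ]
    return {"query": {"bool": bool_query}, "size": size}


body = get_search_should_overhead(
    "기능", ["kIZt-YoB6MtPcT2qgxgS", "k4Zt-YoB6MtPcT2qiBh7"], size=10
)
print(body)
```

Because the `query_string` clause sits in `must`, a pinned document is only returned if it also matches the search term; the `should` clauses raise its score but do not pull in documents that fail the `must` clause.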

python

@router.get('/search/doc/overhead')
async def search_overhead(request: Request, query: str, page: int = 1, size: int = 10):

    # Look up the boost_index entry for this search term
    data = search_boostkey_id(request, query)
    print("::data::", data)

    # Collect the boost_id values from documents whose keyword equals the search term
    boost_ids = []
    # Each document's keyword is its _id in boost_index
    for document in data['documents']:
        keyword = document.get('keyword')

        # Only use entries whose keyword matches the search term exactly
        if keyword == query:
            boost_list = document.get('boost', [])
            print("boost list on keyword match:", boost_list)

            # If boost_list is not empty, append every boost_id to the list
            if boost_list:
                boost_ids.extend(boost['boost_id'] for boost in boost_list)

    body = get_search_should_overhead(query, boost_ids, size=size)
    print("::body::", body)
    all_results = {}

    # Run the boosted query against every category index
    for cat, index_pattern in CATEGORY_INDEX_MAP.items():
        response = es.search(index=index_pattern, body=body)
        results = handle_search_results(response)
        all_results[cat] = {
            **results['meta'],
            'page': page,
            'size': size,
            'documents': results['documents']
        }
    return all_results
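One thing to note about the endpoint above: `page` is accepted and echoed back, but as far as the code shown here goes, only `size` is passed to the query builder, so `page` does not appear to change which hits are returned. If real paging is wanted, one option is to set Elasticsearch's `from` offset on the body before searching. The helper name `with_pagination` below is made up for this sketch.

```python
import copy
from typing import Any, Dict


def with_pagination(body: Dict[str, Any], page: int = 1, size: int = 10) -> Dict[str, Any]:
    """Return a copy of `body` with `from` and `size` set for a 1-based page."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    paged = copy.deepcopy(body)  # leave the caller's body untouched
    paged["from"] = (page - 1) * size
    paged["size"] = size
    return paged


base_body = {"query": {"query_string": {"query": "기능"}}}
paged_body = with_pagination(base_body, page=3, size=10)
print(paged_body["from"], paged_body["size"])
```

With this approach the pinned documents show up on the first page and later pages continue with the regular ranking, since the boost only affects scores and the offset is applied after sorting.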
