OpenSearch/Elasticsearch - Managing Indices with Rollover

brillog · August 14, 2023

In this post, let's take a look at the Rollover feature in AWS OpenSearch/Elasticsearch.

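What is Rollover?

Rollover is a feature for managing the 'size' and 'age' (retention period) of each index.
When an index's size or age reaches a preconfigured threshold (e.g. 50GB or 90 days), the existing index is rolled over and writes start going to a new index.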
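
To make the "size or age" rule concrete, here is a minimal Python sketch of the decision. It is only an illustration, not OpenSearch code: the names RolloverConditions and should_rollover are hypothetical, and the inclusive comparison (>=) is an assumption.

```python
from dataclasses import dataclass

GB = 1024 ** 3  # bytes in one gigabyte


@dataclass
class RolloverConditions:
    """Hypothetical container for the two thresholds used in this post."""
    min_size_bytes: int       # e.g. 50GB
    min_index_age_days: int   # e.g. 90 days


def should_rollover(size_bytes: int, age_days: float, cond: RolloverConditions) -> bool:
    """Return True as soon as either threshold is reached (assumed inclusive)."""
    size_reached = size_bytes >= cond.min_size_bytes
    age_reached = age_days >= cond.min_index_age_days
    return size_reached or age_reached


conditions = RolloverConditions(min_size_bytes=50 * GB, min_index_age_days=90)

print(should_rollover(49 * GB, 10, conditions))  # small and young
print(should_rollover(53 * GB, 10, conditions))  # size threshold reached
print(should_rollover(1 * GB, 90, conditions))   # age threshold reached
```

Only the first call reports that no rollover is needed; either threshold on its own is enough to trigger one.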

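First, consider a typical log analytics setup without Rollover: when the Fluentd agent on a server ships logs to OpenSearch, it sends them to an index named project-a-2023.06. (Once July comes around, it will send them to project-a-2023.07, right?)

For Rollover to happen in this setup, once project-a-2023.06 exceeds a certain size, e.g. 50GB, it would have to be swapped for another index, and logs would have to accumulate in that new index. The new index cannot share the name of the existing project-a-2023.06, so it has to be created under a new name (e.g. project-a-2023.06-2).

However, the Fluentd agent would still be sending logs to the original index, project-a-2023.06, so Rollover is not possible in a typical setup like this one.

So, to make Rollover possible, we first need to understand aliases.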

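What is an Alias?

Just as the word suggests, it means a 'nickname' or 'alternate name'. You can think of it as a feature that gives an index a nickname.

(To make up my own analogy) just as traffic reaches an EC2 instance through a load balancer, you can think of logs reaching a specific index through the alias that sits in front of it :)

User → Load balancer → EC2
Fluentd agent → Alias → index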
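
The analogy can be sketched as a toy Python model. This is not the OpenSearch implementation: the Alias class and its methods are hypothetical names. The point is that the writer only ever knows the alias name, while the alias decides which index receives the write.

```python
from typing import Dict, List


class Alias:
    """Toy model of an alias: one name in front of one or more indices."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.indices: Dict[str, bool] = {}  # index name -> is_write_index

    def add_index(self, index_name: str, is_write_index: bool = False) -> None:
        # Adds the index, or updates its write flag if it is already present.
        self.indices[index_name] = is_write_index

    def write_target(self) -> str:
        # Writes sent to the alias go to the single index flagged as write index.
        writers = [name for name, is_write in self.indices.items() if is_write]
        if len(writers) != 1:
            raise ValueError(f"alias {self.name!r} needs exactly one write index")
        return writers[0]

    def read_targets(self) -> List[str]:
        # Reads through the alias cover every index behind it.
        return sorted(self.indices)


alias = Alias("project-a")
alias.add_index("project-a-000001", is_write_index=True)
print(alias.write_target())

# Model a rollover: the old index loses the write flag, the new one gets it.
alias.add_index("project-a-000001", is_write_index=False)
alias.add_index("project-a-000002", is_write_index=True)
print(alias.write_target())
print(alias.read_targets())
```

The Fluentd agent keeps sending to project-a the whole time, which is why swapping the index behind the alias needs no change on the agent side.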

Setting Up Rollover

This post is written for OpenSearch; if you use Elasticsearch, see the link below. https://www.elastic.co/kr/blog/implementing-hot-warm-cold-in-elasticsearch-with-index-lifecycle-management

1. Create the project-a alias and map an index to it. (The index name must follow the '*-[0-9]' form, i.e. end with a hyphen followed by a number, such as project-a-000001.)

PUT project-a-000001
{
  "aliases": {
    "project-a": {
      "is_write_index": true
    }
  }
}
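
The naming rule exists because Rollover derives the next index name by incrementing the trailing number. Below is a rough Python sketch of that derivation; next_rollover_name is a hypothetical helper, and the six-digit zero padding reflects my understanding of the default behavior rather than something verified in this post.

```python
import re


def next_rollover_name(index_name: str) -> str:
    """Increment the numeric suffix of an index name such as project-a-000001."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if match is None:
        # Names like project-a-2023.06 have no "-<number>" suffix to increment.
        raise ValueError(f"{index_name!r} does not end with '-<number>'")
    prefix, counter = match.groups()
    # Assumption: the new counter is zero-padded to six digits.
    return f"{prefix}{int(counter) + 1:06d}"


print(next_rollover_name("project-a-000001"))

try:
    next_rollover_name("project-a-2023.06")
except ValueError as err:
    print(err)
```

The second call fails, which mirrors why the date-based name from the earlier example cannot be rolled over as-is.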

2. Create an ISM policy for Rollover.

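# min_size: no Rollover while the index is still under 50GB (e.g. 49GB). The condition is checked periodically, so the index may already be larger (e.g. 53GB) when Rollover actually runs.
# index_patterns: matched against index names (project-a-000001, ...), not against the alias name.
# priority: a higher number means a higher priority (100 > 99).
PUT _plugins/_ism/policies/my_rollover_policy_name
{
  "policy": {
    "description": "Example rollover policy.",
    "default_state": "rollover",
    "states": [
      {
        "name": "rollover",
        "actions": [
          {
            "rollover": {
              "min_size": "50gb",
              "min_index_age": "90d"
            }
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["project-a-*"],
      "priority": 100
    }
  }
}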

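3. Set rollover_alias in an index template. (Despite the name used below, this is a regular index template, not an ISM template.) Templates only apply to indices created after the template exists, so an index created earlier, such as project-a-000001 from step 1, needs this setting applied to it directly and the policy from step 2 attached to it manually.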

PUT _index_template/my_ism_template_for_rollover
{
  "index_patterns": ["project-a-*"],
  "template": {
    "settings": {
      "plugins.index_state_management.rollover_alias": "project-a"
    }
  }
}

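Before and After Rollover

(Figure: Rollover comparison)

Existing indices that have been rolled over go from read_write to read_only: the alias stops routing writes to them (their is_write_index flag becomes false), so in practice they are only read from (no writes).

So what advantages does a read_only index have that make Rollover switch indices over to read_only?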

Advantages of a `read_only` index

Because a read_only index no longer receives writes, it can safely be merged down to a single segment with the force merge API. In other words, its segment count can be reduced.

What is a segment?
A shard in Elasticsearch is a Lucene index, and a Lucene index is divided into segments.
A segment is the internal storage element where index data is stored, and it is immutable.
To keep the index size in check, smaller segments are periodically merged into larger ones.
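
As a rough mental model only (this is not how Lucene is implemented), the Python sketch below treats each segment as an immutable tuple of documents. The names force_merge and search are hypothetical; the idea is that a search has to visit every segment, so merging down to one segment leaves fewer places to look.

```python
from typing import List, Tuple

Segment = Tuple[str, ...]  # immutable, like a real segment


def force_merge(segments: List[Segment]) -> List[Segment]:
    """Build one new segment holding every document; old segments are not modified."""
    merged: Segment = tuple(doc for segment in segments for doc in segment)
    return [merged]


def search(segments: List[Segment], term: str) -> Tuple[List[str], int]:
    """Return matching documents and the number of segments that were visited."""
    hits = [doc for segment in segments for doc in segment if term in doc]
    return hits, len(segments)


segments = [("error: disk full",), ("info: started", "error: timeout"), ("info: stopped",)]

hits_before, visited_before = search(segments, "error")
merged = force_merge(segments)
hits_after, visited_after = search(merged, "error")

print(visited_before, visited_after)
print(hits_before == hits_after)
```

The search results are the same before and after the merge; only the number of segments visited changes.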

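That wraps up this look at Rollover.

Even after reading the OpenSearch and Elasticsearch documentation, it didn't fully click until I tested it myself..! So I wrote up this summary of Rollover based on what I tested hands-on.

It seems like a good feature to adopt if you want to keep index sizes consistent so that each index has an appropriate number of primary shards, or if you want to reduce the segment count to improve search performance.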

This post was written as personal study notes and may contain errors.
