Routing and Doc Versioning

이상민·2021년 4월 28일
0
post-thumbnail

1. Routing

how does Elasticsearch know where to store docs?

how are documents found once indexed?

=> through routing

  • Routing : process of resolving a shard for a doc

  • elasticsearch uses hashing to select shard

    • changing routing equation is possible
  • shard_num = hash(_routing) % num_primary_shards


2. How Elasticsearch Reads Data

  1. coordinating node receives request
  2. find which replication goup to look for using routing
  3. select shard using Adaptive replica selection
  4. node returns response


3. How Elasticsearch Writes Data

  1. coordinating node receives request
  2. find shard to write using routing
  3. data is sent to primary shard
  4. primary shard validates data and save
  5. data is locally shared among replication group

  • when replication fails, elasticsearch goes through recovery process

    • ex) when pri shard fail while replication
      • new pri is elected, but data between shards are not the same
      • primary terms and sequence numbers are used to resolve
  • Primary terms : way to distinguish between old and new primary shards

    • counter for how many times the pri shard changed
    • primary terms is appended to write operation
    • can check if pri shard changed since request
  • Sequence number : way to track operation

    • counter incremented for each operation
    • appended to write operation
    • pri shard increases sequence number
  • for large indexes, Elasticsearch uses checkpoints

    • replica has local check point
    • replication group has global checkpoint

4. Document Versioning

  • only stores recent document

  • store version number of document

  • internal vs external versioning

    • internal : default
    • external : when maintaining versions outside of elasticsearch ex. RDBMS
  • above versioning is replaced by a better way

4-1. Optimistic Concurrency Control

prevent old doc overwriting new doc

  • update should fail if data changed, then try again
  • used to be handled by doc version, but now handled using Primary terms and Sequence number

POST /{index name}/_update/{doc name}?if_primary_term={X}&if_seq_no={Y}
  • if primary_term and seq_no are not the same, error is thrown
  • error should be handled in application level and retried
profile
편하게 읽기 좋은 단위의 포스트를 추구하는 개발자입니다

0개의 댓글