Elasticsearch Data Types

이상민 · May 4, 2021

1. Data Types of Elasticsearch

Elasticsearch supports many data types. Some of them also exist in common programming languages, e.g. long, int, boolean.

  • There are also data types specific to Elasticsearch

1-1. object data type

Used for any JSON object

  • Objects may be nested

  • In the mapping, an object field is defined with a "properties" key instead of a "type" key

  • Apache Lucene has no object data type, so Elasticsearch transforms objects to ensure that any valid JSON can be indexed

  • Objects are flattened before being stored; the hierarchy is kept in dot-separated field names (ex. a name field inside a manufacturer object becomes manufacturer.name)

  • In an array of objects, the values are grouped into one array per field
    • A query against such a field checks the values from all elements together
    • This loses the relationship between fields of the same object (ex. a search for a review by Jon Doe that is rated 3.5 can match even when the author and the rating come from different reviews)

  • To prevent this, the "nested" data type is used
    • Its purpose is to keep the relationship between the fields of each object
    • Each object is stored independently
    • Each object is stored as a hidden document, since Lucene has no object type
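
The difference between flattened objects and nested objects can be sketched in plain Python. This is only an illustration of the idea, not how Elasticsearch or Lucene implement it; the function names and the sample reviews are made up for the example.

```python
from collections import defaultdict

def flatten_object_array(field, objects):
    """Group the values of each key across all objects into one array per
    dotted field name, e.g. reviews.author and reviews.rating."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)

def matches_flattened(field, objects, conditions):
    """True if every wanted value appears somewhere in its field array,
    no matter which object it came from."""
    flat = flatten_object_array(field, objects)
    return all(
        value in flat.get(f"{field}.{key}", [])
        for key, value in conditions.items()
    )

def matches_nested(objects, conditions):
    """True only if one single object satisfies every condition."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )

# Hypothetical sample data: Jon Doe rated 5.0, Jane Doe rated 3.5.
reviews = [
    {"author": "Jon Doe", "rating": 5.0},
    {"author": "Jane Doe", "rating": 3.5},
]
wanted = {"author": "Jon Doe", "rating": 3.5}

flattened = flatten_object_array("reviews", reviews)
flattened_hit = matches_flattened("reviews", reviews, wanted)  # relationship lost
nested_hit = matches_nested(reviews, wanted)                   # relationship kept
```

Here `flattened_hit` is true although no single review matches both conditions, while `nested_hit` is false because each object is checked on its own.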

1-2. keyword

Used for exact matching of values

  • Typically used for filtering, aggregations, and sorting
  • ex) searching for articles with a status of "published"
  • For full-text searches, use the text data type instead

How the keyword data type works

  • keyword fields are analyzed with the keyword analyzer
  • The keyword analyzer is a no-op analyzer
    • It outputs the unmodified string as a single token
  • The inverted index of a keyword field therefore holds each complete value as one term, so only an exact match on the whole value finds the document
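
To make the no-op behaviour concrete, here is a small Python sketch that builds an inverted index twice: once with a keyword-style analyzer and once with a rough stand-in for a full-text analyzer. The function names, the simplified text analyzer (lowercase, split on non-word characters), and the sample values are assumptions for illustration only.

```python
import re
from collections import defaultdict

def keyword_analyze(value):
    # No-op: the whole string is emitted unchanged as a single token.
    return [value]

def simple_text_analyze(value):
    # Rough stand-in for a full-text analyzer: lowercase the input and
    # split it on runs of non-word characters.
    return [token for token in re.split(r"\W+", value.lower()) if token]

def build_inverted_index(docs, analyze):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyze(value):
            index[token].add(doc_id)
    return dict(index)

# Hypothetical status values for two documents.
docs = {1: "published", 2: "Pending Review"}

keyword_index = build_inverted_index(docs, keyword_analyze)
text_index = build_inverted_index(docs, simple_text_analyze)
```

In `keyword_index` the term "Pending Review" exists only as one whole value, so a lookup for "pending" finds nothing; in `text_index` the same document is reachable through "pending" or "review".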


3. Type Coercion

  • Data types are inspected when documents are indexed
    • Through this inspection, invalid values are rejected
    • But sometimes a value of the wrong data type is accepted, because it can be converted (coerced) to the mapped type
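# price is a JSON number, so the field is dynamically mapped as float
PUT /coercion_test/_doc/1
{
    "price": 7.4
}

# price is a string that can be coerced to a float
PUT /coercion_test/_doc/2
{
    "price": "7.4"
}

# price is a string that cannot be coerced, so the request fails
PUT /coercion_test/_doc/3
{
    "price": "7.4m"
}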
  1. When the first request is sent, the price field is automatically mapped to the float data type
  2. When the second request is sent, the data type is checked and the string "7.4" is coerced to a float
  3. When the third request is sent, an error occurs because "7.4m" cannot be coerced to a number
  • In the _source field of the 2nd doc, the value is still stored as the original string
    • _source contains the values supplied at index time, not the values that were actually indexed
    • Within Lucene (the index), the value is stored as a floating-point number
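  • Coercion is not used when the field mapping is created dynamically: if the first value supplied for a new field is the string "7.4", the field is mapped as text, not float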
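
The three outcomes above can be mimicked with a short Python sketch. It is a simplification that uses Python's `float()` as a stand-in for the coercion rules, which differ from Elasticsearch in some edge cases; `index_price` is a hypothetical helper, not an Elasticsearch API.

```python
def index_price(raw_value):
    """Simulate indexing into a field that is already mapped as float.

    Returns (source_value, indexed_value): the source keeps whatever was
    supplied, while the indexed value is the coerced float.
    """
    try:
        indexed = float(raw_value)
    except (TypeError, ValueError):
        # Corresponds to the third request: the value cannot be coerced.
        raise ValueError(f"cannot coerce {raw_value!r} to float") from None
    return raw_value, indexed

doc1 = index_price(7.4)    # already a float
doc2 = index_price("7.4")  # string coerced to float, source unchanged

try:
    index_price("7.4m")
    doc3_error = None
except ValueError as err:
    doc3_error = str(err)
```

For the second document the source value stays the string "7.4" while the indexed value is the number 7.4, which mirrors the split between _source and the Lucene index.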

4. Array

  • There is no such thing as an array data type

  • Any field may contain zero or more values

    • An array of text values is stored as if the values were concatenated with a space in between and then analyzed
    • Arrays of other data types are not analyzed; each value is stored with the appropriate data type
  • All values in an array should be of the same data type

  • Different data types may be mixed in the same array only if every value can be coerced to the mapped type

  • Nested arrays are flattened upon indexing (ex. [1, [2, 3]] becomes [1, 2, 3])
  • To query the objects in an array independently of each other, use the nested data type
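
As a rough Python illustration of the array rules above, the sketch below flattens nested arrays and then coerces every value to a float, assuming the field is already mapped as float. Both helper names are hypothetical, and `float()` only approximates the real coercion rules.

```python
def flatten_values(values):
    """Flatten arbitrarily nested lists into one flat list of values."""
    flat = []
    for value in values:
        if isinstance(value, list):
            flat.extend(flatten_values(value))
        else:
            flat.append(value)
    return flat

def index_float_array(values):
    """Flatten the array, then coerce each value to float.

    Raises ValueError if any value cannot be coerced, which stands in
    for the whole document being rejected.
    """
    return [float(value) for value in flatten_values(values)]

flat = flatten_values([1, [2, 3]])
mixed_ok = index_float_array([7.4, "7.4", [1]])

try:
    index_float_array([7.4, "7.4m"])
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```

The mixed array of a number, a numeric string, and a nested integer is accepted because every value is coercible, while the array containing "7.4m" is rejected as a whole.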