1. Data Types of Elasticsearch
There are many data types, some of which also exist in common programming languages, e.g. long, integer, boolean, etc.
- there are also data types specific to Elasticsearch
1-1. object data type
used for any JSON object
- Apache Lucene has no object data type, so Elasticsearch transforms objects to ensure that any valid JSON can be indexed
- objects are flattened when stored; the hierarchy is kept by joining field names with dots
- for an array of objects, the values of each field are grouped into one array per field
- a query against the flattened fields is matched against all elements of these arrays
- this loses the relationship between fields of the same object (ex. a query for a review by John Doe that is rated 3.5 can match even when the author and the rating come from two different reviews)
- to prevent this, the data type called "nested" is used
- its purpose is to maintain the relationship between the fields of each object
- each nested object is stored independently
- the objects are stored as hidden docs, as Lucene doesn't have an object type
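
The flattening described above can be sketched in plain Python. This is only an illustration of the idea, not how Elasticsearch or Lucene implement it; the sample document and the helper names (flatten, matches_flat, matches_nested) are made up for this example.

```python
from collections import defaultdict

def flatten(doc, prefix=""):
    """Flatten a JSON-like dict into dotted field names, each holding a list of values."""
    fields = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into sub-objects and merge their values per dotted path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields[sub_path].extend(sub_values)
            else:
                fields[path].append(item)
    return dict(fields)

# Hypothetical sample document with an array of review objects.
doc = {
    "name": "Coffee Maker",
    "reviews": [
        {"author": "John Doe", "rating": 5.0},
        {"author": "Jane Doe", "rating": 3.5},
    ],
}
flat = flatten(doc)

def matches_flat(flat_doc, author, rating):
    # Each condition is checked against the whole array of values for its field.
    return author in flat_doc["reviews.author"] and rating in flat_doc["reviews.rating"]

def matches_nested(original_doc, author, rating):
    # Nested-style matching: both conditions must hold within the same object.
    return any(r["author"] == author and r["rating"] == rating
               for r in original_doc["reviews"])

print(flat)
print(matches_flat(flat, "John Doe", 3.5))    # True: a false positive
print(matches_nested(doc, "John Doe", 3.5))   # False: John Doe rated 5.0
```

The flattened form cannot tell which rating belongs to which author, which is the problem the nested data type avoids by keeping each object separate.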
1-2. keyword
used for exact matching of values
- typically used for filtering, aggregations, and sorting
- ex) searching for articles with a status of "published"
- for full-text searches, use text data type instead
how the keyword data type works
- keyword fields are analyzed with the keyword analyzer
- the keyword analyzer is a no-op analyzer
- it outputs the unmodified string as a single token
- that single token is what goes into the inverted index
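
A rough Python sketch of the difference this makes in the inverted index. The text analyzer here is a simplified stand-in (lowercase, split on whitespace), not Elasticsearch's standard analyzer, and the status values and function names are assumptions made for the example.

```python
from collections import defaultdict

def keyword_analyzer(value):
    # No-op: the whole string is emitted as a single, unmodified token.
    return [value]

def simple_text_analyzer(value):
    # Simplified stand-in for a text analyzer: lowercase, then split on whitespace.
    return value.lower().split()

def build_inverted_index(docs, analyzer):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyzer(value):
            index[token].add(doc_id)
    return dict(index)

# Hypothetical status values for three documents.
statuses = {1: "PUBLISHED", 2: "Pending Review", 3: "PUBLISHED"}

keyword_index = build_inverted_index(statuses, keyword_analyzer)
text_index = build_inverted_index(statuses, simple_text_analyzer)

print(keyword_index)
print(text_index)
```

With the keyword-style index, only the exact value "Pending Review" finds document 2, which is why keyword suits filtering, aggregations, and sorting, while the split tokens suit full-text search.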
3. Type Coercion
- Data types are inspected when indexing docs
- through inspection, invalid values are rejected
- but sometimes supplying the wrong data type is ok, if the value can be converted (coerced) to the mapped type
PUT /coercion_test/_doc/1
{
"price": 7.4
}
PUT /coercion_test/_doc/2
{
"price": "7.4"
}
PUT /coercion_test/_doc/3
{
"price": "7.4m"
}
- when the first request is sent, the price field is automatically mapped as the float data type
- when the second request is sent, coercion checks the data type and the string "7.4" is converted to a float
- when the third request is sent, an error occurs because "7.4m" cannot be coerced
- in the _source field of the 2nd doc, the value is still stored as the original string
- _source contains the values supplied at index time; it is not what is actually indexed
- within Lucene (the index), the value is stored as a floating-point number
- coercion is not used when the mapping is created: by default, a new field that first receives the string "7.4" is mapped as text, not float
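
The behaviour of the three requests can be imitated with a small Python sketch. It assumes the price field is already mapped as float, and it uses Python's float() as a rough approximation of coercion (Python accepts a few inputs that Elasticsearch would not). The function name index_price is hypothetical.

```python
import json

def index_price(raw_body):
    """Return (_source, indexed_value) for a field 'price' assumed to be mapped as float."""
    source = json.loads(raw_body)      # _source keeps the values exactly as supplied
    value = source["price"]
    try:
        indexed = float(value)         # rough approximation of coercion to float
    except (TypeError, ValueError):
        raise ValueError(f"cannot coerce {value!r} to float") from None
    return source, indexed

source_1, indexed_1 = index_price('{"price": 7.4}')
source_2, indexed_2 = index_price('{"price": "7.4"}')

try:
    index_price('{"price": "7.4m"}')
    third_rejected = False
except ValueError:
    third_rejected = True

print(source_2, indexed_2)   # the source still holds the string, the indexed value is a float
print(third_rejected)
```

Because _source keeps the original string, code that reads search results should be prepared to receive either a number or a string for such a field.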
4. Array
- there is no such thing as an array data type
- any field may contain zero or more values
- for text fields, the array values are analyzed as if concatenated into one string with a space in between
- for other data types, the values are not analyzed; each is stored with the field's data type
- all values in an array should be of the same type
- a mix of data types in the same array works only if the values can be coerced to the mapped type
- nested arrays are flattened upon indexing (ex. [1, [2, 3]] becomes [1, 2, 3])
- the nested data type should be used for arrays of objects that need to be queried independently
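
A minimal Python sketch of the array rules above, under the assumption that a float field coerces each value and a text field sees its values joined by a space. The helper names are invented for illustration and do not correspond to any Elasticsearch API.

```python
def flatten_array(values):
    """Flatten arbitrarily nested lists into a single flat list."""
    flat = []
    for value in values:
        if isinstance(value, list):
            flat.extend(flatten_array(value))
        else:
            flat.append(value)
    return flat

def as_float_field(values):
    # Mixed types are fine as long as every value can be coerced to float.
    return [float(v) for v in flatten_array(values)]

def as_text_field(values):
    # Text values are treated as one string with a space between the elements.
    return " ".join(flatten_array(values))

print(flatten_array([1, [2, 3]]))
print(as_float_field([1, ["2", 3.5]]))
print(as_text_field(["Smartphone", "Electronics"]))
```

A value such as "2m" in the float example would raise an error, mirroring the rule that a mixed array is only accepted when every element is coercible.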