POST /_analyze
{
"text": "2 guys walk into a bar, but the third... DUCKS! :-)",
"analyzer": "standard"
}
POST /_analyze
{
"text": "2 guys walk into a bar, but the third... DUCKS! :-)"
}
POST /_analyze
{
"text": "2 guys walk into a bar, but the third... DUCKS! :-)",
"char_filter": [],
"tokenizer": "standard",
"filter": ["lowercase"]
}
#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.13/security-minimal-setup.html to enable security.
{
"tokens" : [
{
"token" : "2",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<NUM>",
"position" : 0
},
{
"token" : "guys",
"start_offset" : 2,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "walk",
"start_offset" : 7,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "into",
"start_offset" : 12,
"end_offset" : 16,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "a",
"start_offset" : 19,
"end_offset" : 20,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "bar",
"start_offset" : 21,
"end_offset" : 24,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "but",
"start_offset" : 26,
"end_offset" : 29,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "the",
"start_offset" : 30,
"end_offset" : 33,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "third",
"start_offset" : 34,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 8
},
{
"token" : "ducks",
"start_offset" : 43,
"end_offset" : 48,
"type" : "<ALPHANUM>",
"position" : 9
}
]
}
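The token list above can be roughly reproduced outside Elasticsearch. The following is a minimal sketch, assuming that "split on non-word characters, then lowercase" is a close enough stand-in for the standard tokenizer plus the lowercase filter. The real tokenizer follows Unicode text segmentation rules, so results can differ on other inputs, and the function name is hypothetical.

```python
import re


def approximate_standard_analyzer(text):
    """Rough stand-in for the standard analyzer: word split + lowercase."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),   # lowercase token filter
            "start_offset": match.start(),    # character offsets in the input
            "end_offset": match.end(),
            "position": position,             # ordinal position of the token
        })
    return tokens


tokens = approximate_standard_analyzer(
    "2 guys walk into a bar, but the third... DUCKS! :-)"
)
print([t["token"] for t in tokens])
```

Punctuation and the smiley disappear because they contain no word characters, which matches the response shown above.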
Similar to the 'object' data type, but maintains object relationships
Enables us to query objects independently
Apache Lucene has no concept of objects, so how are objects stored?
Suppose we index a single product that has 1,000 reviews stored as nested objects.
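For the 1,000-review example, Elasticsearch indexes each nested object as its own hidden Lucene document next to the root document. The sketch below only does the arithmetic; the function name and the document fields are made up for illustration.

```python
def lucene_document_count(document, nested_fields):
    """One root document plus one hidden document per nested object."""
    hidden = sum(len(document.get(field, [])) for field in nested_fields)
    return 1 + hidden


# Hypothetical product with 1,000 nested reviews.
product = {
    "name": "Coffee Maker",
    "reviews": [{"rating": 5.0, "author": f"user{i}"} for i in range(1000)],
}

print(lucene_document_count(product, nested_fields=["reviews"]))  # 1001
```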
POST /_analyze
{
"text": "2 guys walk into a bar, but the third... DUCKS! :-)",
"analyzer": "keyword"
}
{
"tokens" : [
{
"token" : "2 guys walk into a bar, but the third... DUCKS! :-)",
"start_offset" : 0,
"end_offset" : 53,
"type" : "word",
"position" : 0
}
]
}
PUT /coercion_test/_doc/1
{
"price": 7.4
}
PUT /coercion_test/_doc/2
{
"price": "7.4"
}
PUT /coercion_test/_doc/1
{
"price": "7.4m"
}
GET /coercion_test/_doc/1
GET /coercion_test/_doc/2
"_source" : {
"price" : 7.4
}
"_source" : {
"price" : "7.4"
}
}
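The coercion_test requests above can be modelled in a few lines. This is a sketch under the assumption that coercion to a float field behaves like "parse numeric strings, reject anything else"; Python's float() is more lenient than Elasticsearch in some edge cases (for example it accepts "inf"), and the function name is hypothetical.

```python
def coerce_to_float(value):
    """Return the value that would be indexed for a float field."""
    if isinstance(value, bool):
        raise ValueError("booleans are not numeric input")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        return float(value)  # raises ValueError for input such as "7.4m"
    raise ValueError(f"cannot coerce {value!r}")


for raw in (7.4, "7.4", "7.4m"):
    try:
        print(repr(raw), "->", coerce_to_float(raw))
    except ValueError:
        print(repr(raw), "-> rejected")
```

Coercion only affects the indexed value. The _source responses above still return 7.4 and "7.4" exactly as they were sent.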
POST /_analyze
{
"text": ["sting number one", "string number two"],
"analyzer": "standard"
}
{
"tokens" : [
{
"token" : "sting",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "number",
"start_offset" : 6,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "one",
"start_offset" : 13,
"end_offset" : 16,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "string",
"start_offset" : 17,
"end_offset" : 23,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "number",
"start_offset" : 24,
"end_offset" : 30,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "two",
"start_offset" : 31,
"end_offset" : 34,
"type" : "<ALPHANUM>",
"position" : 5
}
]
}
Remember to use the nested data type for arrays of objects if you need to query the objects independently.
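To see why this matters, the sketch below imitates how an array of plain objects is flattened into one value list per field, which loses the link between values that belonged to the same object. The field names and review data are invented for illustration, and the matching logic is a simplification of a real bool query.

```python
def flatten_objects(field, objects):
    """Flatten an array of objects into one value list per dotted field name."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


reviews = [
    {"author": "John", "rating": 5.0},
    {"author": "Jane", "rating": 2.0},
]
flat = flatten_objects("reviews", reviews)

# "object" type: each condition is checked against the whole value list,
# so John's name and Jane's rating combine into a false positive.
object_match = "John" in flat["reviews.author"] and 2.0 in flat["reviews.rating"]

# "nested" type: both conditions must hold within the same object.
nested_match = any(r["author"] == "John" and r["rating"] == 2.0 for r in reviews)

print(object_match, nested_match)  # True False
```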
PUT /reviews
{
"mappings": {
"properties": {
"rating": {"type": "float"},
"content": {"type": "text"},
"product_id": {"type": "integer"},
"author": {
"properties": {
"first_name": {"type": "text"},
"last_name": {"type": "text"},
"email": {"type": "keyword"}
}
}
}
}
}
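There are four top-level fields: rating, content, product_id, and author. Because author is an object, the data types of its sub-fields are defined under its own properties key.
Choose carefully between text and keyword for each field.
keyword fields are used for filtering, aggregations, and exact matches.
We usually look up an exact email address, and the email could even serve as a primary key, so keyword is an appropriate type for the email field.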
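The difference between the two types can be sketched as follows. This assumes a simplified analyzer (word split plus lowercase) for text fields; the real standard analyzer may tokenize an email address differently, and both function names are hypothetical.

```python
import re


def index_as_text(value):
    """text: the value is analyzed into several lowercase terms."""
    return {term.lower() for term in re.findall(r"\w+", value)}


def index_as_keyword(value):
    """keyword: the whole value is kept as one unmodified term."""
    return {value}


email = "Joe@example.com"
text_terms = index_as_text(email)
keyword_terms = index_as_keyword(email)

# An exact lookup only succeeds against the keyword terms.
print(email in keyword_terms)  # True
print(email in text_terms)     # False
```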
response
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "reviews"
}
GET /reviews/_mapping
: retrieve mappings for the entire index
GET /reviews/_mapping/field/content
: retrieve mapping for a field
GET /reviews/_mapping/field/author.email
: retrieve mapping for a field(object)
PUT /reviews_dot_notation
{
"mappings": {
"properties": {
"rating": {"type": "float"},
"content": {"type": "text"},
"product_id": {"type": "integer"},
"author.first_name": {"type": "text"},
"author.last_name": {"type": "text"},
"author.email": {"type": "keyword"}
}
}
}
}
PUT /reviews/_mapping
{
"properties": {
"created_at": {
"type": "date"
}
}
}
It's really important that you don't specify a UNIX timestamp here, i.e. the number of seconds since the epoch. If you do, ES won't give you an error, because it will simply treat the number as the number of milliseconds since the epoch.
You might then think that everything is okay.
However, when you search for documents within a given date range, you won't get any matches, because the dates are actually way in the past.
If you do have a UNIX timestamp, be sure to multiply that number by 1,000.
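The effect is easy to check with plain Python. The sketch below interprets a number as milliseconds since the epoch, the way the date field does, and uses the seconds equivalent of the value indexed for document 5 below; the function name is hypothetical.

```python
from datetime import datetime, timezone


def as_epoch_millis_date(value):
    """Interpret a number as milliseconds since the epoch (UTC)."""
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


unix_seconds = 1436011284

wrong = as_epoch_millis_date(unix_seconds)         # seconds passed as millis
right = as_epoch_millis_date(unix_seconds * 1000)  # multiplied by 1,000 first

print(wrong.date())  # 1970-01-17
print(right.date())  # 2015-07-04
```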
PUT /reviews/_doc/2
{
"rating": 4.5,
"content": "Not bad",
"product_id": 123,
"created_at": "2015-03-27",
"author": {
"first_name": "A",
"last_name": "Joe",
"email": "Joe@example.com"
}
}
PUT /reviews/_doc/3
{
"rating": 4.5,
"content": "Not bad",
"product_id": 123,
"created_at": "2015-03-27T13:07:41Z",
"author": {
"first_name": "B",
"last_name": "Boe",
"email": "Boe@example.com"
}
}
PUT /reviews/_doc/4
{
"rating": 4.5,
"content": "Not bad",
"product_id": 123,
"created_at": "2015-03-27T13:07:41+01:00",
"author": {
"first_name": "B",
"last_name": "Boe",
"email": "Boe@example.com"
}
}
PUT /reviews/_doc/5
{
"rating": 4.5,
"content": "Not bad",
"product_id": 123,
"created_at": 1436011284000,
"author": {
"first_name": "B",
"last_name": "Boe",
"email": "Boe@example.com"
}
}
GET /reviews/_search
{
"query": {
"match_all": {}
}
}
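The format parameter is used to customize the format of date fields.
Using the default format is recommended whenever possible.
The default is ISO 8601.
If that format doesn't work, for example because of a legacy system, you can customize it with the format parameter.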
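Internally, Elasticsearch stores dates as milliseconds since the epoch in UTC. As a rough sketch of that normalization, the code below converts the four created_at styles used in the requests above (date only, UTC time, time with offset, epoch milliseconds) to a single number. It relies on Python's ISO 8601 parser rather than Elasticsearch's, so treat it as an approximation; the function name is hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_millis(value):
    """Normalize an ISO 8601 string or epoch-millis integer to epoch millis."""
    if isinstance(value, int):
        return value  # already milliseconds since the epoch
    # Older Python versions do not accept a trailing "Z", so rewrite it.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # no zone given: assume UTC
    return round(parsed.timestamp() * 1000)


for created_at in ("2015-03-27", "2015-03-27T13:07:41Z",
                   "2015-03-27T13:07:41+01:00", 1436011284000):
    print(created_at, "->", to_epoch_millis(created_at))
```

The value with the +01:00 offset ends up one hour earlier than the same wall-clock time in UTC.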
If doc_values is set to false, this data structure is not built or stored on disk.
Storing data in multiple data structures effectively duplicates data with the purpose of fast retrieval, so disk space is traded for speed.
A side benefit of that would be increased indexing speed, because there is naturally
a small overhead of building this data structure when indexing documents.
So when would you want to disable doc values?
If you know that you won’t need to use a field for sorting, aggregations, and scripting,
you can disable doc values and save the disk space required to store this data structure.
For small indices, it might not matter much, but if you are storing hundreds of millions
- How to disable doc values:
PUT /sales
{
"mappings": {
"properties": {
"buyer_email": {
"type": "keyword",
"doc_values": false
}
}
}
}
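As a mental model for why doc values exist at all, the sketch below contrasts the two access patterns. It is a deliberately simplified illustration, not how Lucene lays data out on disk: an inverted index answers "which documents contain this term?", while doc values answer "what is this document's value?", which is what sorting and aggregations need. The sample emails are made up.

```python
from collections import Counter, defaultdict

# Hypothetical buyer_email values keyed by document id.
docs = {1: "b@example.com", 2: "a@example.com", 3: "b@example.com"}

# Inverted index: term -> ids of documents containing it (used for searching).
inverted_index = defaultdict(set)
for doc_id, email in docs.items():
    inverted_index[email].add(doc_id)

# Doc values: document id -> value (used for sorting and aggregations).
doc_values = dict(docs)

matching = sorted(inverted_index["b@example.com"])
sorted_ids = sorted(doc_values, key=doc_values.get)
counts = Counter(doc_values.values())

print(matching)    # [1, 3]
print(sorted_ids)  # [2, 1, 3]
print(counts["b@example.com"])  # 2
```

If a field is never sorted, aggregated, or scripted on, only the first structure is needed, which is the case where disabling doc values pays off.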
Normalization factors used for relevance scoring
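Relevance scoring measures how well a document matches a query. When you search on Google, page 1 contains more of what you are looking for than page 5, because the results are sorted by relevance.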
Often we don't just want to filter results, but also rank them.
Norms can be disabled to save disk space.
If a field will only be used for filtering and aggregations, you can disable norms to save disk space.
PUT /products
{
"mappings": {
"properties": {
"tags": {
"type": "text",
"norms": false
}
}
}
}
PUT /server-metrics
{
"mappings": {
"properties": {
"tags": {
"type": "integer",
"index": false
}
}
}
}
PUT /sales
{
"mappings": {
"properties": {
"partner_id": {
"type": "keyword",
"null_value": "NULL"
}
}
}
}
PUT /sales
{
"mappings": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
PUT /reviews/_mapping
{
"properties": {
"product_id": {
"type": "keyword"
}
}
}
PUT /reviews/_mapping
{
"properties": {
"author": {
"properties": {
"email": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
PUT /reviews_new
{
"mappings" : {
"properties" : {
"author" : {
"properties" : {
"email" : {
"type" : "keyword"
},
"first_name" : {
"type" : "text"
},
"last_name" : {
"type" : "text"
}
}
},
"content" : {
"type" : "text"
},
"created_at" : {
"type" : "date"
},
"product_id" : {
"type" : "keyword"
},
"rating" : {
"type" : "float"
}
}
}
}
POST /_reindex
{
"source": {
"index": "reviews"
},
"dest": {
"index": "reviews_new"
}
}
POST /reviews_new/_delete_by_query
{
"query": {
"match_all": {}
}
}
POST /_reindex
{
"source": {
"index": "reviews"
},
"dest": {
"index": "reviews_new"
},
"script": {
"source": """
if (ctx._source.product_id != null) {
ctx._source.product_id = ctx._source.product_id.toString();
}
"""
}
}
If you run a GET query, you can see that the data type is now a string rather than an integer.
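The script in the reindex request above runs once per document against its _source. A Python equivalent of that logic, purely as an illustration (the real script is Painless and runs inside Elasticsearch; the function name is hypothetical):

```python
def convert_product_id(source):
    """Mirror of the reindex script: turn product_id into a string if present."""
    if source.get("product_id") is not None:
        source["product_id"] = str(source["product_id"])
    return source


print(convert_product_id({"product_id": 123, "rating": 4.5}))
# {'product_id': '123', 'rating': 4.5}
```

Documents without a product_id pass through unchanged, which is why the null check is there.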
POST /_reindex
{
"source": {
"index": "reviews",
"query": {
"range": {
"rating": {
"gte": 4.0
}
}
}
},
"dest": {
"index": "reviews_new"
}
}
POST /_reindex
{
"source": {
"index": "reviews",
"_source": ["content", "created_at", "rating"]
},
"dest": {
"index": "reviews_new"
}
}
POST /_reindex
{
"source": {
"index": "reviews"
},
"dest": {
"index": "reviews_new"
},
"script": {
"source": """
// Rename "content" field to "comment"
ctx._source.comment = ctx._source.remove("content");
"""
}
}
POST /_reindex
{
"source": {
"index": "reviews"
},
"dest": {
"index": "reviews_new"
},
"script": {
"source": """
if (ctx._source.rating < 4.0) {
ctx.op = "noop"; // Can also be set to "delete"
}
"""
}
}
PUT /reviews/_mapping
{
"properties": {
"comment": {
"type": "alias",
"path": "content"
}
}
}
GET /reviews/_search
{"query": {
"match": {
"content": "Not bad"
}
}}
GET /reviews/_search
{"query": {
"match": {
"comment": "Not bad"
}
}}
PUT /multi_field_test
{
"mappings": {
"properties": {
"description": {
"type": "text"
},
"ingredients": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
POST /multi_field_test/_doc
{
"description": "To make this spaghetti carbonara, you first need to...",
"ingredients": ["Spaghetti", "Bacon", "Eggs"]
}
GET /multi_field_test/_search
{
"query": {
"match": {
"ingredients": "Spaghetti"
}
}
}
GET /multi_field_test/_search
{
"query": {
"term": {
"ingredients.keyword": "Spaghetti"
}
}
}
PUT /people
{
"mappings": {
"dynamic": false,
"properties": {
"first_name": {
"type": "text"
}
}
}
}
GET /people/_mapping
Checking the mapping with this request returns the following (only first_name exists).
{
"people" : {
"mappings" : {
"dynamic" : "false",
"properties" : {
"first_name" : {
"type" : "text"
}
}
}
}
}
POST /people/_doc
{
"first_name": "Bo",
"last_name": "Andersen"
}
GET people/_search
{"query": {
"match": {
"first_name": "Bo"
}
}}
"_source" : {
"first_name" : "Bo",
"last_name" : "Andersen"
}
GET people/_search
{"query": {
"match": {
"last_name": "Andersen"
}
}}
{
"took" : 688,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
The best option (in the instructor's view) is to set "dynamic": "strict".
With this setting, ES rejects unmapped fields.
DELETE /people
Delete the index above with this request, then recreate it with the PUT below.
PUT /people
{
"mappings": {
"dynamic": "strict",
"properties": {
"first_name": {
"type": "text"
}
}
}
}
Now if we POST the same document as before, we get a 400 error!
POST /people/_doc
{
"first_name": "Bo",
"last_name": "Andersen"
}
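The three dynamic settings can be compared side by side with a small model. This is a sketch of the observable behaviour only: the mapping is reduced to a set of field names, the setting is passed as a string, and the exception class and function name are hypothetical.

```python
class StrictMappingError(Exception):
    """Stands in for the 400 error returned when dynamic is "strict"."""


def index_document(mapped_fields, document, dynamic):
    """Return (updated mapped fields, indexed fields, stored _source)."""
    mapped = set(mapped_fields)
    indexed = {}
    for field, value in document.items():
        if field in mapped:
            indexed[field] = value
        elif dynamic == "strict":
            raise StrictMappingError(f"unmapped field [{field}] is not allowed")
        elif dynamic == "true":
            mapped.add(field)        # a new mapping is added automatically
            indexed[field] = value
        # dynamic == "false": the field is neither mapped nor indexed
    return mapped, indexed, dict(document)  # _source always keeps every field


doc = {"first_name": "Bo", "last_name": "Andersen"}

mapped, indexed, source = index_document({"first_name"}, doc, dynamic="false")
print(sorted(mapped), sorted(indexed), sorted(source))

try:
    index_document({"first_name"}, doc, dynamic="strict")
except StrictMappingError as error:
    print("rejected:", error)
```

With "false", last_name survives in _source but cannot be searched, which matches the empty hits in the earlier search response.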
Omitted.
Skipped. Revisit this lecture later.
Skipped. Revisit this lecture later.
Source: Bo Andersen's Complete Guide to Elasticsearch course on Udemy.
https://www.udemy.com/course/elasticsearch-complete-guide/learn/lecture/7585356#overview