Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenfilter.html
Forms an n-gram of a specified length from the beginning of a token.
ㄴ For example, you can use the edge_ngram token filter to change quick to qu.
When not customized, the filter creates 1-character edge n-grams by default.
// example
GET _analyze
{
"tokenizer": "standard",
"filter": [
{ "type": "edge_ngram",
"min_gram": 1,
"max_gram": 2
}
],
"text": "the quick brown fox jumps"
}
// input: the quick brown fox jumps
// output: [ t, th, q, qu, b, br, f, fo, j, ju ]
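To see why the output above looks the way it does, here is a minimal Python sketch of the edge n-gram idea. It is not Elasticsearch code: `edge_ngrams` and `analyze` are hypothetical helper names, and `str.split()` only stands in for the standard tokenizer, which happens to give the same tokens for this plain sentence.

```python
def edge_ngrams(token, min_gram=1, max_gram=1):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    # A token shorter than max_gram only yields the prefixes it can supply.
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text, min_gram=1, max_gram=1):
    """Split on whitespace (stand-in tokenizer), then expand every token."""
    grams = []
    for token in text.split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


print(analyze("the quick brown fox jumps", min_gram=1, max_gram=2))
# ['t', 'th', 'q', 'qu', 'b', 'br', 'f', 'fo', 'j', 'ju']
```

With the default arguments (both set to 1) the sketch keeps only the first character of each token, which mirrors the default described above.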
// Add to an analyzer
PUT edge_ngram_example
{
"settings": {
"analysis": {
"analyzer": {
"standard_edge_ngram": {
"tokenizer": "standard",
"filter": [ "edge_ngram" ]
}
}
}
}
}
// parameters: max_gram, min_gram (both default to 1)
PUT edge_ngram_custom_example
{
"settings": {
"analysis": {
"filter": {
"3_5_edgegrams": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 5
}
}
}
}
}
The edge_ngram filter’s max_gram value
limits the character length of tokens.
When the edge_ngram filter is used with an index analyzer,
this means search terms longer than the max_gram length
may not match any indexed terms.
For example,
if the max_gram is 3,
searches for apple won’t match the indexed term app.
ㄴ This seems to mean that a search for apple does not match the indexed term app.
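The max_gram limitation can be reproduced with a small Python sketch. It assumes the index analyzer applies edge n-grams with min_gram 1 and max_gram 3, that the search term is used unchanged (no edge_ngram on the search side), and that matching is an exact term lookup. `edge_ngrams` and `matches` are hypothetical helpers, not Elasticsearch APIs.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


MAX_GRAM = 3

# Terms stored at index time for the word "apple": a, ap, app
indexed_terms = set(edge_ngrams("apple", 1, MAX_GRAM))


def matches(search_term):
    """Exact lookup of the search term among the indexed terms."""
    return search_term in indexed_terms


print(matches("app"))    # True: "app" was indexed
print(matches("apple"))  # False: nothing longer than 3 characters was indexed

# Cutting the search term down to max_gram characters restores the match.
print(matches("apple"[:MAX_GRAM]))  # True
```

The last line imitates what the truncate token filter would do in a search analyzer, which is the workaround the Elasticsearch documentation suggests. The trade-off is that every search term sharing the first three characters then matches as well.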
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lowercase-tokenfilter.html
Changes token text to lowercase.
For example, you can use the lowercase filter
to change THE Lazy DoG to the lazy dog.
GET _analyze
{
"tokenizer" : "standard",
"filter" : ["lowercase"],
"text" : "THE Quick FoX JUMPs"
}
// input: THE Quick FoX JUMPs
// output: [ the, quick, fox, jumps ]
PUT lowercase_example
{
"settings": {
"analysis": {
"analyzer": {
"whitespace_lowercase": {
"tokenizer": "whitespace",
"filter": [ "lowercase" ]
}
}
}
}
}
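As a rough mental model of the whitespace_lowercase analyzer defined above, the following Python sketch splits on whitespace and lowercases each token. It is only an approximation: `whitespace_lowercase` is a hypothetical function, and Python's `str.lower()` is assumed to behave like the filter for plain ASCII input such as this.

```python
def whitespace_lowercase(text):
    """Split on whitespace, then lowercase every resulting token."""
    return [token.lower() for token in text.split()]


print(whitespace_lowercase("THE Quick FoX JUMPs"))
# ['the', 'quick', 'fox', 'jumps']
```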
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html
Forms n-grams of specified lengths from a token.
For example,
you can use the ngram token filter
to change fox to [ f, fo, o, ox, x ].
GET _analyze
{
"tokenizer": "standard",
"filter": [ "ngram" ],
"text": "Quick fox"
}
// input: Quick fox
// output: [ Q, Qu, u, ui, i, ic, c, ck, k, f, fo, o, ox, x ]
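The ordering of the output above becomes clearer with a small Python sketch: for each start position in a token, emit every substring whose length lies between min_gram and max_gram. `ngrams` is a hypothetical helper written for these notes, and the defaults of 1 and 2 are taken from the example rather than from any Elasticsearch source code.

```python
def ngrams(token, min_gram=1, max_gram=2):
    """Return every substring of `token` with a length in min_gram..max_gram.

    Grams are grouped by start position, shortest first.
    """
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            # Skip grams that would run past the end of the token.
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


print(ngrams("fox"))
# ['f', 'fo', 'o', 'ox', 'x']

# str.split() stands in for the standard tokenizer here.
result = []
for token in "Quick fox".split():
    result.extend(ngrams(token))
print(result)
# ['Q', 'Qu', 'u', 'ui', 'i', 'ic', 'c', 'ck', 'k', 'f', 'fo', 'o', 'ox', 'x']
```

Unlike the edge n-gram case, grams start at every position in the token, not only at its beginning.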
PUT ngram_example
{
"settings": {
"analysis": {
"analyzer": {
"standard_ngram": {
"tokenizer": "standard",
"filter": [ "ngram" ]
}
}
}
}
}
// parameters: max_gram (default 2), min_gram (default 1)
*You can use the index.max_ngram_diff index-level setting
to control the maximum allowed difference
between the max_gram and min_gram values.
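The constraint behind index.max_ngram_diff can be sketched as a simple check. This is an illustration only: `check_ngram_diff` is a hypothetical function, the default limit of 1 is an assumption about the setting's default value, and the error text is invented rather than copied from Elasticsearch.

```python
def check_ngram_diff(min_gram, max_gram, max_ngram_diff=1):
    """Return max_gram - min_gram, or raise if it exceeds the allowed limit."""
    diff = max_gram - min_gram
    if diff > max_ngram_diff:
        # Illustrative message, not the real Elasticsearch error.
        raise ValueError(
            f"max_gram - min_gram is {diff}, "
            f"but at most {max_ngram_diff} is allowed"
        )
    return diff


print(check_ngram_diff(1, 2))  # 1, within the assumed default limit

try:
    check_ngram_diff(3, 5)  # a difference of 2 exceeds the limit of 1
except ValueError as err:
    print("rejected:", err)

print(check_ngram_diff(3, 5, max_ngram_diff=2))  # 2, allowed once the limit is raised
```

In a real index the limit would be raised by setting index.max_ngram_diff in the index settings, not by passing an argument.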