Joining queries
Intro
- Using ES as a primary data store is not recommended.
- Data is usually not normalized either, although simple joins are supported.
- ES optimizes search performance by denormalizing data.
- Performance > disk space
- ES only supports simple joins
- Joins are expensive
Querying nested objects
Creating the index with mapping
PUT /department
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"employees": {
"type": "nested"
}
}
}
}
Adding test documents
PUT /department/_doc/1
{
"name": "Development",
"employees": [
{
"name": "Eric Green",
"age": 39,
"gender": "M",
"position": "Big Data Specialist"
},
{
"name": "James Taylor",
"age": 27,
"gender": "M",
"position": "Software Developer"
},
{
"name": "Gary Jenkins",
"age": 21,
"gender": "M",
"position": "Intern"
},
{
"name": "Julie Powell",
"age": 26,
"gender": "F",
"position": "Intern"
},
{
"name": "Benjamin Smith",
"age": 46,
"gender": "M",
"position": "Senior Software Engineer"
}
]
}
PUT /department/_doc/2
{
"name": "HR & Marketing",
"employees": [
{
"name": "Patricia Lewis",
"age": 42,
"gender": "F",
"position": "Senior Marketing Manager"
},
{
"name": "Maria Anderson",
"age": 56,
"gender": "F",
"position": "Head of HR"
},
{
"name": "Margaret Harris",
"age": 19,
"gender": "F",
"position": "Intern"
},
{
"name": "Ryan Nelson",
"age": 31,
"gender": "M",
"position": "Marketing Manager"
},
{
"name": "Kathy Williams",
"age": 49,
"gender": "F",
"position": "Senior Marketing Manager"
},
{
"name": "Jacqueline Hill",
"age": 28,
"gender": "F",
"position": "Junior Marketing Manager"
},
{
"name": "Donald Morris",
"age": 39,
"gender": "M",
"position": "SEO Specialist"
},
{
"name": "Evelyn Henderson",
"age": 24,
"gender": "F",
"position": "Intern"
},
{
"name": "Earl Moore",
"age": 21,
"gender": "M",
"position": "Junior SEO Specialist"
},
{
"name": "Phillip Sanchez",
"age": 35,
"gender": "M",
"position": "SEM Specialist"
}
]
}
Querying nested fields
- Let's find the employees who are both interns and female.
GET /department/_search
{
"query": {
"nested": {
"path": "employees",
"query": {
"bool": {
"must": [
{
"match": {
"employees.position": "intern"
}
},
{
"term": {
"employees.gender.keyword": {
"value": "F"
}
}
}
]
}
}
}
}
}
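- The nested type must be declared explicitly because, without it, an array of objects is flattened when stored: the values of each field are merged into one list per field, so the link between values that belonged to the same object is lost.
- Wouldn't it be easier to manage employees and departments if they were stored as separate documents?
- That is possible with a join field, which works much like a foreign key in a relational database.
- First look at inner hits, then learn about the join field.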
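To make the flattening concrete, here is a conceptual sketch of how document 1 would roughly be stored if employees were a plain object field instead of nested. It is simplified: only two of the employees are shown, and the values are shown as written rather than as analyzed tokens. It is an illustration of the internal layout, not a request to send.

```
# Conceptual view only, not a request
{
  "name": "Development",
  "employees.name": ["James Taylor", "Julie Powell"],
  "employees.gender": ["M", "F"],
  "employees.position": ["Software Developer", "Intern"]
}
```

With this layout, a bool query for position "developer" and gender "F" would match the document, even though no single employee is a female developer. The nested type keeps each employee as its own hidden document, which avoids that kind of cross-object match.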
Nested inner hits
- Inner hits are sorted by relevance score by default.
- To customize the order, add a sort option inside the inner_hits object.
GET /department/_search
{
"_source": false,
"query": {
"nested": {
"path": "employees",
"inner_hits": {},
"query": {
"bool": {
"must": [
{
"match": {
"employees.position": "intern"
}
},
{
"term": {
"employees.gender.keyword": {
"value": "F"
}
}
}
]
}
}
}
}
}
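As a sketch of the custom sort mentioned above, the variant below orders the matching employees by age instead of by score. The choice of employees.age and descending order is only an example, and it assumes age was dynamically mapped as a numeric field.

```
GET /department/_search
{
  "_source": false,
  "query": {
    "nested": {
      "path": "employees",
      "inner_hits": {
        "sort": [
          { "employees.age": { "order": "desc" } }
        ]
      },
      "query": {
        "match": { "employees.position": "intern" }
      }
    }
  }
}
```

The matching nested objects should appear under inner_hits.employees in each hit, since the inner hits name defaults to the nested path.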
Mapping document relationships
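- To define relations between documents, the mapping has to be updated first.
- The key in relations does not have to match the index name (department).
- The mapping below creates a parent-child relation between department and employee (parent: department).
- The child side is a single string here; to use more than one child type, change the string to an array.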
PUT /department/_mapping
{
"properties": {
"join_field": {
"type": "join",
"relations": {
"department": "employee"
}
}
}
}
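The relations value can be a single child name or an array of child names. A minimal sketch of the array form follows; the index name department_multi and the second child type intern are hypothetical, made up for illustration, and the mapping is meant for a fresh index rather than the one used in these notes.

```
# Hypothetical index and child type, for illustration only
PUT /department_multi
{
  "mappings": {
    "properties": {
      "join_field": {
        "type": "join",
        "relations": {
          "department": ["employee", "intern"]
        }
      }
    }
  }
}
```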
Adding documents
- A parent and its children must live on the same shard.
- That is why, when adding employees, the parent's ID is given as the routing value.
Adding departments
PUT /department/_doc/1
{
"name": "Development",
"join_field": "department"
}
PUT /department/_doc/2
{
"name": "Marketing",
"join_field": "department"
}
Adding employees for departments
PUT /department/_doc/3?routing=1
{
"name": "Bo Andersen",
"age": 28,
"gender": "M",
"join_field": {
"name": "employee",
"parent": 1
}
}
PUT /department/_doc/4?routing=2
{
"name": "John Doe",
"age": 44,
"gender": "M",
"join_field": {
"name": "employee",
"parent": 2
}
}
PUT /department/_doc/5?routing=1
{
"name": "James Evans",
"age": 32,
"gender": "M",
"join_field": {
"name": "employee",
"parent": 1
}
}
PUT /department/_doc/6?routing=1
{
"name": "Daniel Harris",
"age": 52,
"gender": "M",
"join_field": {
"name": "employee",
"parent": 1
}
}
PUT /department/_doc/7?routing=2
{
"name": "Jane Park",
"age": 23,
"gender": "F",
"join_field": {
"name": "employee",
"parent": 2
}
}
PUT /department/_doc/8?routing=1
{
"name": "Christina Parker",
"age": 29,
"gender": "F",
"join_field": {
"name": "employee",
"parent": 1
}
}
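Because the employees were indexed with the parent's ID as the routing value, the same value is needed when reading a child by its ID. A minimal sketch, using document 3 from above:

```
# Fetch employee 3, which was routed by its parent (department 1)
GET /department/_doc/3?routing=1
```

On an index with a single shard the request would likely succeed without the parameter too, but with several shards the lookup may be sent to the wrong shard and report the document as not found.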
Querying by parent ID
Querying child documents by parent
Querying parent by child documents
Multi-level relations
Parent/child inner hits
Terms lookup mechanism
Join limitations