1. Adding Nodes for Development
1-1. Duplicate Elasticsearch
- copy the Elasticsearch directory to a new location
- in the copy, uncomment node.name and set it to a name different from the default
# elasticsearch.yml
node.name: node-2
- start a second Elasticsearch instance from the new directory
- Elasticsearch automatically adds the new node to the cluster
ex) cluster health with 2 nodes running. The status is now green because replica shards can be allocated to the second node
{
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 7,
"active_shards" : 7,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
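The check described above can be sketched in Python. This is a minimal sketch that only parses a saved health response; the abridged JSON is copied from the output above, and the helper name node_joined is an assumption made for illustration, not part of any Elasticsearch client.

```python
import json

# Abridged copy of the cluster health response shown above.
HEALTH_JSON = """
{
  "cluster_name": "elasticsearch",
  "status": "green",
  "number_of_nodes": 2,
  "number_of_data_nodes": 2,
  "unassigned_shards": 0
}
"""


def node_joined(health, expected_nodes):
    """Return True if the cluster has the expected node count, is green,
    and has no unassigned shards. (Hypothetical helper for these notes.)"""
    return (
        health["number_of_nodes"] == expected_nodes
        and health["status"] == "green"
        and health["unassigned_shards"] == 0
    )


health = json.loads(HEALTH_JSON)
print(node_joined(health, expected_nodes=2))
```

With the sample response this reports that both nodes are in the cluster; asking for 3 nodes would fail the check.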
1-2. Run Elasticsearch with options
- instead of duplicating the Elasticsearch directory, a new node can be started from the same directory with
$ bin/elasticsearch -Enode.name={node name} -Epath.data={data dir} -Epath.logs={log dir}
the -E option overrides the corresponding setting in elasticsearch.yml
path.data and path.logs must differ from those of the node that is already running, otherwise an error occurs
- because it is easier to manage settings in each copy's own config, method 1-1 is preferable
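As a rough illustration of the command above, the following Python sketch builds one start command per node and refuses to reuse a data or log directory. The helper name build_commands and the /tmp/es/... paths are hypothetical; only the -E settings come from the notes.

```python
import shlex
from typing import Dict, List


def build_commands(nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Build one start command per node.

    Raises ValueError if any data or log directory is used more than once,
    since each node needs its own path.data and path.logs.
    """
    seen = set()
    commands = []
    for name, paths in nodes.items():
        for key in ("data", "logs"):
            if paths[key] in seen:
                raise ValueError(f"{paths[key]} is used more than once")
            seen.add(paths[key])
        args = [
            "bin/elasticsearch",
            f"-Enode.name={name}",
            f"-Epath.data={paths['data']}",
            f"-Epath.logs={paths['logs']}",
        ]
        # Quote each argument so the line can be pasted into a shell.
        commands.append(" ".join(shlex.quote(arg) for arg in args))
    return commands


# Hypothetical directories, chosen only for the example.
commands = build_commands(
    {"node-2": {"data": "/tmp/es/node-2/data", "logs": "/tmp/es/node-2/logs"}}
)
print(commands[0])
```

The sketch only prints the command; running it is left to the shell so that nothing is started by accident.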
2. Node roles
- a node can have roles beyond storing shards
GET /_cat/nodes?v
shows the node.role and master columns for each node
- the default node.role is dim (d = data, i = ingest, m = master)
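The node.role abbreviation can be decoded with a small lookup. This is a minimal sketch: the letter mapping covers only the three roles named above, the function name decode_roles is an assumption, and treating "-" as a coordinating-only node follows how the _cat/nodes output displays a node with no roles.

```python
# Letters taken from the default value "dim" described in the notes.
ROLE_LETTERS = {"d": "data", "i": "ingest", "m": "master"}


def decode_roles(node_role):
    """Expand a node.role string such as "dim" into a list of role names."""
    if node_role == "-":
        # No role letters: the node only coordinates requests.
        return []
    # Letters outside the mapping are kept visible instead of being dropped.
    return [ROLE_LETTERS.get(letter, f"unknown({letter})") for letter in node_role]


print(decode_roles("dim"))  # ['data', 'ingest', 'master']
print(decode_roles("m"))    # ['master']
```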
2-1. master node
- performs cluster-wide actions ex) creating/deleting indices, allocating shards to nodes, etc.
- elected through a voting process
- a stable master node is important for a stable cluster
- a dedicated master node is needed when the elected master node is too busy with its other roles
2-2. data node
- stores part of the cluster's data
- executes queries against the data it holds
- the purpose of a dedicated data node is to separate data work from the master node
2-3. ingest node
- enables a node to run ingest pipelines
- ingest pipeline: a series of steps (processors) applied to documents as they are indexed into Elasticsearch
- a simplified version of Logstash inside Elasticsearch
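The idea of an ingest pipeline as a series of steps can be sketched in plain Python. This is a conceptual model only and does not call Elasticsearch; the two processors are loosely modeled on the kind of work a pipeline does (lowercasing a field, setting a field), and every name here is hypothetical.

```python
def lowercase_field(field):
    """Return a processor that lowercases a string field if it is present."""
    def processor(doc):
        updated = dict(doc)  # copy so the input document is not mutated
        if isinstance(updated.get(field), str):
            updated[field] = updated[field].lower()
        return updated
    return processor


def set_field(field, value):
    """Return a processor that sets a field to a fixed value."""
    def processor(doc):
        updated = dict(doc)
        updated[field] = value
        return updated
    return processor


def run_pipeline(doc, processors):
    """Apply each processor in order; the output of one is the input of the next."""
    for processor in processors:
        doc = processor(doc)
    return doc


pipeline = [lowercase_field("level"), set_field("env", "dev")]
result = run_pipeline({"level": "WARN", "message": "disk almost full"}, pipeline)
print(result)  # {'level': 'warn', 'message': 'disk almost full', 'env': 'dev'}
```

The order of the list matters, which is the main property the sketch is meant to show: each step sees the document as the previous step left it.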
2-4. machine learning node
- the node.ml and xpack.ml.enabled settings configure a node for machine learning
2-5. coordinating node
- distributes queries to other nodes and aggregates the results
- a node becomes a coordinating-only node when all its other roles are disabled
- only useful as a load balancer in large clusters
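The distribute-then-aggregate behavior can be illustrated with a toy scatter-gather in Python. This is a sketch under simplifying assumptions: shards are plain dicts, the score is just a word count (not Elasticsearch's relevance scoring), and the function names are hypothetical.

```python
import heapq


def search_shard(shard, term):
    """Score each document in one shard by how often the term occurs."""
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term)
        if score:
            hits.append((float(score), doc_id))
    return hits


def coordinate(shards, term, size):
    """Send the query to every shard, then merge the partial results
    and keep the `size` highest-scoring hits."""
    partial_results = [search_shard(shard, term) for shard in shards]  # scatter
    merged = [hit for hits in partial_results for hit in hits]         # gather
    return heapq.nlargest(size, merged)


shards = [
    {"a": "node roles and node names", "b": "shard allocation"},
    {"c": "master node election"},
]
top_hits = coordinate(shards, "node", size=2)
print(top_hits)  # [(2.0, 'a'), (1.0, 'c')]
```

The merge step is the work a coordinating node takes off the data nodes, which is why it only pays off when the cluster handles many requests.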
2-6. When to change node roles
- typically when optimizing the cluster to handle a larger number of requests
- to make it clearer what each node's hardware resources are used for
- depends on the situation