Shadow Replica Demo
Note that to follow along, the following settings are required in your elasticsearch.yml:

```yaml
# don't append 0/1/2 to the data path for ES instances on the same machine
node.add_id_to_custom_path: false
# enable use of the `index.data_path` setting
node.enable_custom_paths: true
```

Then start a 3-node cluster listening on localhost:9200.
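One way to do that (just a sketch, assuming a single local Elasticsearch install on this machine, where each invocation picks up its own node ordinal and the first node binds HTTP to port 9200) is to launch the same binary three times in daemon mode:

```sh
# start three nodes from the same installation; they discover each other
# locally and form one cluster, with HTTP typically on ports 9200/9201/9202
bin/elasticsearch -d
bin/elasticsearch -d
bin/elasticsearch -d
```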
Don't forget to start with a clean slate: clean up the regular data directory as well as the /tmp/foo directory, since that is where the new index will be stored.
```sh
rm -rf ~/src/elasticsearch/data /tmp/foo
```
Index creation
First, create an index with a custom data path and shadow_replicas: true
```
// Make sure we have 3 nodes
GET /_cluster/health?wait_for_nodes=3
{}

POST /myindex
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2,
    "data_path": "/tmp/foo",
    "shadow_replicas": true
  },
  "mappings": {
    "doc": {
      "properties": {
        "body": {
          "type": "string"
        }
      }
    }
  }
}

GET /_cluster/health?wait_for_status=green
{}
```
{"cluster_name":"elasticsearch","status":"green","timed_out":false,"number_of_nodes":3,"number_of_data_nodes":3,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0} {"acknowledged":true} {"cluster_name":"elasticsearch","status":"green","timed_out":false,"number_of_nodes":3,"number_of_data_nodes":3,"active_primary_shards":1,"active_shards":3,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0}
Shards show up just like regular shards:
```
GET /_cat/shards
{}
```
```
index   shard prirep state   docs store ip            node
myindex 0     p      STARTED 0    85b   192.168.43.91 Lynx
myindex 0     r      STARTED 0    85b   192.168.43.91 Warwolves
myindex 0     r      STARTED 0    85b   192.168.43.91 Mop Man
```
Let's index a few documents and see how they show up on the various primary and replica shards.
Indexing some documents
Index two documents, flush, then index two more documents:
POST /myindex/doc {"body": "This is document 1"} POST /myindex/doc {"body": "I'm document number 2"} // Flush after indexing the first two documents POST /myindex/_flush {} POST /myindex/doc {"body": "Document 3 here"} POST /myindex/doc {"body": "Finally document 4"} // Force a refresh POST /myindex/_refresh {} // show the status of the shards GET /_cat/shards {}
{"_index":"myindex","_type":"doc","_id":"AUudq2kUeqHjI4Syvcj_","_version":1,"_shards":{"total":3,"successful":3,"failed":0},"created":true} {"_index":"myindex","_type":"doc","_id":"AUudq2lyeqHjI4SyvckA","_version":1,"_shards":{"total":3,"successful":3,"failed":0},"created":true} {"_shards":{"total":3,"successful":3,"failed":0}} {"_index":"myindex","_type":"doc","_id":"AUudq2n2eqHjI4SyvckB","_version":1,"_shards":{"total":3,"successful":3,"failed":0},"created":true} {"_index":"myindex","_type":"doc","_id":"AUudq2oZeqHjI4SyvckC","_version":1,"_shards":{"total":3,"successful":3,"failed":0},"created":true} {"_shards":{"total":3,"successful":3,"failed":0}} index shard prirep state docs store ip node myindex 0 p STARTED 4 5.8kb 192.168.43.91 Lynx myindex 0 r STARTED 2 5.8kb 192.168.43.91 Warwolves myindex 0 r STARTED 2 5.8kb 192.168.43.91 Mop Man
Looking at segments
Notice how the replicas only have the first two documents, because the segments created while indexing the 3rd and 4th documents have not been fsynced and committed yet.
We can confirm this by checking the /myindex/_segments info:
```
GET /myindex/_segments?pretty
{}
```
{ "_0": { "generation": 0, "num_docs": 2, "deleted_docs": 0, "size_in_bytes": 2949, "memory_in_bytes": 2388, "committed": true, "search": true, "version": "5.1.0", "compound": true }, "_1": { "generation": 1, "num_docs": 2, "deleted_docs": 0, "size_in_bytes": 2910, "memory_in_bytes": 2388, "committed": false, "search": true, "version": "5.1.0", "compound": true } } { "_0": { "generation": 0, "num_docs": 2, "deleted_docs": 0, "size_in_bytes": 2949, "memory_in_bytes": 2388, "committed": true, "search": true, "version": "5.1.0", "compound": true } } { "_0": { "generation": 0, "num_docs": 2, "deleted_docs": 0, "size_in_bytes": 2949, "memory_in_bytes": 2388, "committed": true, "search": true, "version": "5.1.0", "compound": true } }
The primary shard has the `_0` and `_1` segments, where only `_0` is committed, whereas the two nodes with the replicas show only the `_0` (committed) segment.
Flushing to make documents visible
Now we can flush again to make the other two documents visible. Note that I refresh manually here, because a replica might finish its flush and refresh before the primary does, in which case the new segments wouldn't be visible there yet.
```
POST /myindex/_flush
{}

POST /myindex/_refresh
{}

GET /_cat/shards
{}
```
{"_shards":{"total":3,"successful":3,"failed":0}} {"_shards":{"total":3,"successful":3,"failed":0}} index shard prirep state docs store ip node myindex 0 p STARTED 4 5.9kb 192.168.43.91 Lynx myindex 0 r STARTED 4 5.9kb 192.168.43.91 Warwolves myindex 0 r STARTED 4 5.9kb 192.168.43.91 Mop Man
Now we see all the segments show up on all nodes:
```
GET /myindex/_segments?pretty
{}
```
{ "_0": { "generation": 0, "num_docs": 2, "deleted_docs": 0, "size_in_bytes": 2949, "memory_in_bytes": 2388, "committed": true, "search": true, "version": "5.1.0", "compound": true }, "_1": { "generation": 1, "num_docs": 2, "deleted_docs": 0, "size_in_bytes": 2910, "memory_in_bytes": 2388, "committed": true, "search": true, "version": "5.1.0", "compound": true } } { "_0": { "generation": 0, "num_docs": 2, "deleted_docs": 0, "size_in_bytes": 2949, "memory_in_bytes": 2388, "committed": true, "search": true, "version": "5.1.0", "compound": true }, "_1": { "generation": 1, "num_docs": 2, "deleted_docs": 0, "size_in_bytes": 2910, "memory_in_bytes": 2388, "committed": true, "search": true, "version": "5.1.0", "compound": true } } { "_0": { "generation": 0, "num_docs": 2, "deleted_docs": 0, "size_in_bytes": 2949, "memory_in_bytes": 2388, "committed": true, "search": true, "version": "5.1.0", "compound": true }, "_1": { "generation": 1, "num_docs": 2, "deleted_docs": 0, "size_in_bytes": 2910, "memory_in_bytes": 2388, "committed": true, "search": true, "version": "5.1.0", "compound": true } }
Failover when a node dies
Before stopping any node, let's index a couple more documents that will exist only in the translog:
POST /myindex/doc {"body": "document number 5 in the translog"} POST /myindex/doc {"body": "another document (number 6) in the translog"} POST /myindex/_refresh {} GET /_cat/shards {}
{"_index":"myindex","_type":"doc","_id":"AUudq9PVeqHjI4SyvckD","_version":1,"_shards":{"total":3,"successful":3,"failed":0},"created":true} {"_index":"myindex","_type":"doc","_id":"AUudq9P2eqHjI4SyvckE","_version":1,"_shards":{"total":3,"successful":3,"failed":0},"created":true} {"_shards":{"total":3,"successful":3,"failed":0}} index shard prirep state docs store ip node myindex 0 p STARTED 6 8.8kb 192.168.43.91 Lynx myindex 0 r STARTED 4 8.8kb 192.168.43.91 Warwolves myindex 0 r STARTED 4 8.8kb 192.168.43.91 Mop Man
Note that there is no flush after indexing these documents, so they live only in the translog.
<< kill an ES node >>
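How you stop a node is up to you; one possible sketch (assuming the nodes are plain local processes and that Lynx still holds the primary, as in the `_cat/shards` output above) is to look up its process id via the nodes info API and kill it:

```sh
# hypothetical example: find the PID of the node named "Lynx" and kill it;
# the "id" field under "process" in the response is the operating system PID
curl -s 'localhost:9200/_nodes/Lynx/process?pretty'
kill -9 <pid-from-the-response>
```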
With the node that has the primary gone, one of the shadow replicas has replaced it and replayed the translog that is on the shared filesystem:
```
GET /_cat/shards
{}
```
```
index   shard prirep state      docs store ip            node
myindex 0     r      STARTED    6    8.9kb 192.168.43.91 Warwolves
myindex 0     p      STARTED    6    8.9kb 192.168.43.91 Mop Man
myindex 0     r      UNASSIGNED
```
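If the `_cat/shards` call comes back while the promotion is still in flight, you may briefly see the shard in an INITIALIZING state; one convenient way to wait for the failover to finish (just reusing the same cluster health API from earlier) is:

```
GET /_cluster/health?wait_for_status=yellow
{}
```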
And all of the documents are there:
```
GET /myindex/_search?pretty
{
  "query": {
    "match_all": {}
  }
}
```
"This is document 1" "I'm document number 2" "Document 3 here" "Finally document 4" "document number 5 in the translog" "another document (number 6) in the translog"
Files in the data directories
Finally, we can see where the data for the index is stored:
```sh
tree ~/src/elasticsearch/data
echo "--------------"
tree /tmp/foo
```
```
/Users/hinmanm/src/elasticsearch/data
└── elasticsearch
    └── nodes
        ├── 0
        │   ├── _state
        │   │   └── global-1.st
        │   ├── indices
        │   │   └── myindex
        │   │       ├── 0
        │   │       │   └── _state
        │   │       │       └── state-11.st
        │   │       └── _state
        │   │           └── state-1.st
        │   └── node.lock
        ├── 1
        │   ├── _state
        │   │   └── global-1.st
        │   ├── indices
        │   │   └── myindex
        │   │       ├── 0
        │   │       │   └── _state
        │   │       │       └── state-11.st
        │   │       └── _state
        │   │           └── state-1.st
        │   └── node.lock
        └── 2
            ├── _state
            │   └── global-1.st
            ├── indices
            │   └── myindex
            │       ├── 0
            │       │   └── _state
            │       │       └── state-4.st
            │       └── _state
            │           └── state-1.st
            └── node.lock

23 directories, 12 files
--------------
/tmp/foo
└── myindex
    └── 0
        ├── index
        │   ├── _0.cfe
        │   ├── _0.cfs
        │   ├── _0.si
        │   ├── _1.cfe
        │   ├── _1.cfs
        │   ├── _1.si
        │   ├── _2.cfe
        │   ├── _2.cfs
        │   ├── _2.si
        │   ├── segments_4
        │   └── write.lock
        └── translog

4 directories, 11 files
```