Segment-based replication (aka Shadow Replicas)
Table of Contents
- Overview
- Development Phases
- Development
- DONE Per-index paths
- NEEDSREVIEW Shadow Engine [30/30][100%]- Reproduction for shared filesystem
- Reproduction for regular filesystem
- Tasks
- DONE resolve the rest of the nocommits in shadow-replicas branch
- DONE relocating a replica doesn't work
- DONE relocating a primary doesn't work
- ShadowEngine tests [8/8]- DONE test basic engine things
- DONE test that a flush & refresh makes new segments visible to replica
- DONE test that replica -> primary promotion works
- DONE test that replica relocation works
- DONE test that primary relocation works
- DONE test that deleting an index with shadow replicas cleans up the disk
- DONE figure out the Mock modules snafu
- DONE disable checking index when closed
 
- CANCELLED shadow engine freaks out if there are no segments* files on disk
- DONE make shadow replicas not delete the data directory, the primary will do that
- DONE add a read only block while recovering translog on primary
- DONE make shadow replicas clean up if they are the last deletion in the cluster
- DONE update shadow replica issue with new plan (shadow_replicas: true)
- DONE write.lock missing when closing primary shard?!
- DONE ensure no merges run on shadow replicas
- DONE Tell Luca and Clint about the typo in ShardStateAction
- DONE ensure shadow becomes normal after switching from replica to primary
- DONE handle replaying translog on recovery if replica is shadow
- DONE create the shard routing with the shadow flag in RoutingNodes
- CANCELLED sometimes a regular engine gets created instead of a shadow one
- CANCELLED don't replicate operations if the replica is a shadow replica
- CANCELLED Figure out how to send shard started event after re-creation
- DONE don't send docs to the replica when using shared FS
- CANCELLED ensure mapping is updated on replica when request is not sent
- DONE add tests for failover when indexing concurrently
- DONE make ShadowEngine throw exceptions instead of no-ops
- CANCELLED look into waiting for mapping changes on refresh for SRs
 
 
| Headline | Time | ||
|---|---|---|---|
| Total time | 3d 8:23 | ||
| NEEDSREVIEW [#A] Shadow Engine [30/30] [100%] | 3d 8:23 | ||
| Tasks | 3:07 | ||
| DONE don't send docs to the replica when using shared FS | 2:38 | ||
| DONE make ShadowEngine throw exceptions instead of no-ops | 0:29 | 
Overview
Taken from: https://github.com/elasticsearch/elasticsearch/issues/8976
Today, Elasticsearch replicates changes at the document level. When a document is indexed/updated on the primary shard, the new version of the document is sent to the replica to be indexed there as well. Document-level replication is what provides distributed near real-time search. Not all customers need (or want) this, as the process of indexing every document twice (primary & replica) has a cost.
Instead, we would like to support segment-based replication as an alternative. With segment based replication, indexing happens only on the primary shard. Instead of the replica shards that we have today, we can have shadow_replicas. When the primary shard flushes (or merges), it would push the new segments and commit point out to the shadow replicas. Shadow replicas would then reopen the searcher to expose the new segments to search. The primary would flush more often than we do today to reduce the lag in replication.
Long-term, it would be nice to support both regular replicas (for failover) and shadow replicas (for search-only)
Advantages
- Copying segments across is lower cost than indexing each document twice
- Recovery is faster Because the segments on the primary and replica don't diverge in the way they do with document based replication, only new segments are copied over.
Disadvantages
- Search and document retrieval is no longer real time. Changes are only exposed to shadow_replicas when the primary flushes, instead of on refreshes
- If the primary fails, documents in the transaction log will be lost unless using shared storage
- Any file corruption might be copied over to the shadow_replica (or if we fail the primary because of corruption, there is no alternative source of data)
- Increased network I/O, as we would need to copy the same data over multiple times when small segments are merged into bigger segments. Copy 5gb segments across the network could introduce a significant amount of lag in the replicas
- Potential disadvantage - More frequent flushing on the primary could hurt indexing speed
Development Phases
Phase 1 - Add support for shadow_replicas without normal replicas
(Limited to one node per server/VM)
The first development phase will only add support for indices with either real or shadow_replicas meaning all or none replicas will be shadow_replicas. We likely have to introduce the notion of a shadow_replica on the cluster internal data-structures ie. a flag on the ShardRouting such that we won't have to change the wire-protocol again in the next iteration. On a very high level this first iteration will be implemented as a read-only engine that periodically or on flush refreshes it's local Lucene index.
On the indexing side we "simply" don't forward the the documents to the replicas and return immediately once the primary successfully indexed it.
To catch up with the primary it has to somehow go back to the current primary and sync up it's local Lucene segments. Since we already have this logic implemented and used during recovery large parts of it should be reusable. (See RecoveryTarget / RecoverySource)
The biggest downside of this approach is the potential dataloss if we loose the primary since we don't have any documents replicated since the last flush.
Potential features / pitfalls
- We might need to make this useful is an allocation decider that allows only shadow_replicas to be allocated on a certain node
- Mappings might be slightly behind if we don't index on certain nodes
- Testing might be tricky here - we might be able to randomly swap in stuff like this and trigger flushes instead of refresh transparently. I'd love to implement this somehow that it is entirely transparent. ie. a refresh on a shadow_replica index means full flush + sync even if we have to make this optional with being enabled in tests.
CANCELLED Phase 2 - Add support for shadow_replicas WITH real replicas CANCELLED
- State "CANCELLED"  from ""            
 Not actually planning on doing this
This second phase aims to support full fletched replicas together with shadow_replicas mixed in the same index. This will help users to prevent dataloss if primaries die while still have the ability to have read-only nodes in the cluster which will not be used for indexing.
Development
This is my section to write down all the rabbit-holes I look through trying to figure out where to implement this.
Questions
- Can flush time be configured on a per-index basis? YES, it can be a global setting or per-index
- How to kick off recovery manually?
Might be able to do this with RecoveryTarget.startRecovery()…
- How to inhibit sending indexing requests to the replica shards?
There is an ignoreReplicas method in
TransportShardReplicationOperationAction, while hard-coded to false right now,
it could be changed to be a per-index setting
DONE Per-index paths
Igor had the great idea to continue storing metadata in a regular index
directory (in path.data), and use the settings loaded from that directory to
determine which distributor to use for the index, then the per-path settings
could be implemented as a distributor.
The other option I could think of was custom metadata in the cluster state that told Elasticsearch where to go looking for other indices when doing the initial import of data.
Not sure which one to go with…
We need to be able to pass a new directory for
FsDirectoryService.newFSDirectory
Which is called by FsDirectoryService.build…
Which calls FsIndexStore.shardIndexLocations, which uses the NodeEnvironment
to determine where to put it.
It looks like FsIndexStore has access to the index settings, maybe it can maintain a map there and use that to determine where to look for the data?
If you specify a custom data location, how do you handle the situation when data already exists for that index, and you need to tell Elasticsearch about it?
Maybe you can create the index with the path, close it, then re-open it? (DOESN'T WORK)
Gotta have something in the cluster state that tells ES where to look for custom index metadata?
Also, need to figure out how the translog is being created and use the custom location for that.
Create index
PUT /_cluster/settings { "transient": { "logger.env": "TRACE" } } POST /myindex { "settings": { "number_of_shards": 1, "number_of_replicas": 1, //"data_template": "i_{{index_name}}/{{node_id}}/shard_{{shard_num}}", "data_path": "/tmp/myindex" } } POST /myindex/doc {"body": "foo"} POST /myindex/doc {"body": "bar"} POST /myindex/doc {"body": "baz"} POST /myindex/doc {"body": "eggplant"} POST /myindex/_refresh {} GET /_cat/count/myindex {}
{"acknowledged":true,"persistent":{},"transient":{"logger":{"env":"TRACE"}}}
HTTP/1.1 400 Bad Request
Content-Type: application/json; charset=UTF-8
Content-Length: 213
{"error":"IndexCreationException[[myindex] failed to create index]; nested: ElasticsearchIllegalArgumentException[custom data_path is disabled unless all nodes are at least version 1.5.0-SNAPSHOT]; ","status":400}
{"_index":"myindex","_type":"doc","_id":"AUqf-WSCUlSGRj_zSu7l","_version":1,"created":true}
{"_index":"myindex","_type":"doc","_id":"AUqf-WXiUlSGRj_zSu7m","_version":1,"created":true}
{"_index":"myindex","_type":"doc","_id":"AUqf-WZhUlSGRj_zSu7n","_version":1,"created":true}
{"_index":"myindex","_type":"doc","_id":"AUqf-We-UlSGRj_zSu7o","_version":1,"created":true}
{"_shards":{"total":10,"successful":8,"failed":0}}
1420023130 11:52:10 4
Close index
POST /myindex/_close {}
{"acknowledged":true}
Move data on disk
mv /tmp/myindex /tmp/foo
Update index settings
PUT /myindex/_settings?ignore_unavailable=true { "index": { "data_path": "/tmp/foo" } } GET /myindex/_settings?pretty {}
{"acknowledged":true}
{
  "myindex" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "2",
        "ignore_unavailable" : "true",
        "creation_date" : "1418997720075",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "2000099"
        },
        "uuid" : "5uwUpVXDTVGsfhuZBMMoSg",
        "data_path" : "/tmp/foo"
      }
    }
  }
}
Re-open and check index
POST /myindex/_open {} POST /myindex/_refresh {} GET /_cat/count/myindex {}
{"acknowledged":true}
{"_shards":{"total":4,"successful":2,"failed":0}}
1419931043 10:17:23 4
[file trees]
tree /tmp/foo tree ~/src/elasticsearch/data
/tmp/foo ├── 0 │ └── myindex │ ├── 0 │ │ ├── index │ │ │ ├── _0.cfe │ │ │ ├── _0.cfs │ │ │ ├── _0.si │ │ │ ├── segments_2 │ │ │ └── write.lock │ │ └── translog │ │ └── translog-2 │ └── 1 │ ├── index │ │ ├── _0.cfe │ │ ├── _0.cfs │ │ ├── _0.si │ │ ├── segments_2 │ │ └── write.lock │ └── translog │ └── translog-2 └── 1 └── myindex ├── 0 │ ├── index │ │ ├── _0.cfe │ │ ├── _0.cfs │ │ ├── _0.si │ │ ├── segments_2 │ │ └── write.lock │ └── translog │ └── translog-3 └── 1 ├── index │ ├── _0.cfe │ ├── _0.cfs │ ├── _0.si │ ├── segments_2 │ └── write.lock └── translog └── translog-3 16 directories, 24 files /Users/hinmanm/src/elasticsearch/data └── elasticsearch └── nodes ├── 0 │ ├── _state │ │ └── global-1.st │ ├── indices │ │ └── myindex │ │ ├── 0 │ │ │ └── _state │ │ │ ├── state-3.st │ │ │ └── state-6.st │ │ ├── 1 │ │ │ └── _state │ │ │ ├── state-3.st │ │ │ └── state-6.st │ │ └── _state │ │ └── state-5.st │ └── node.lock └── 1 ├── _state │ └── global-1.st ├── indices │ └── myindex │ ├── 0 │ │ └── _state │ │ └── state-6.st │ ├── 1 │ │ └── _state │ │ └── state-6.st │ └── _state │ └── state-5.st └── node.lock 20 directories, 12 files
:END:
NEEDSREVIEW Shadow Engine [30/30] [100%]
- State "NEEDSREVIEW" from "INPROGRESS"  
 PR is up: https://github.com/elasticsearch/elasticsearch/pull/9727
- State "TODO" from "NEEDSREVIEW"
- State "NEEDSREVIEW" from "INPROGRESS"  
 PR: https://github.com/elasticsearch/elasticsearch/pull/9727
Shadow replica is an instance of an IndexShard, the target always starts the
recovery, we can encapsulate the entire logic for this in a special IndexShard.
Need a dedicated Engine and a dedicated IndexShard for the shadow replicas.
TO START
Index-level setting: shadow_replicas: true, start with this only and work from there
Reproduction for shared filesystem
This is for testing shadow replicas on a shared filesystem
This requires this in elasticsearch.yml
node.add_id_to_custom_path: false node.enable_custom_paths: true
Create index with shadow_replicas: true
trash -a ~/src/elasticsearch/data
PUT /_cluster/settings { "transient": { "logger._root": "DEBUG", "logger.env": "TRACE", "logger.index": "DEBUG", "logger.index.engine.internal": "TRACE", "logger.index.gateway": "TRACE", "logger.indices.cluster": "TRACE", "logger.cluster.action.shard": "TRACE", "logger.action.support.replication": "TRACE" } } POST /myindex { "index": { "number_of_shards": 1, "number_of_replicas": 1, "data_path": "/tmp/foo", "shadow_replicas": true } }
{"acknowledged":true,"persistent":{},"transient":{"logger":{"cluster":{"action":{"shard":"TRACE"}},"_root":"DEBUG","indices":{"cluster":"TRACE"},"engine":{"internal":"TRACE"},"index":"DEBUG","action":{"support":{"replication":"TRACE"}},"env":"TRACE","index.gateway":"TRACE"}}}
{"acknowledged":true}
Adding documents
If implemented correctly, they should only be added to the primary shard and not to any of the replicas.
GET /_cat/shards {}
index shard prirep state docs store ip node myindex 0 p STARTED 4 11.1kb 10.0.0.16 Piranha myindex 0 r STARTED 4 11.1kb 10.0.0.16 Buzz
GET /_cat/nodes?h=name,id&full_id {}
name id Blonde Phantom sQ2_ZdEjQMaPF4GDAmLJog Tempo pl8yj9dFTZmpgDihYe3uJg
POST /myindex/doc {"body": "foo bar"} POST /myindex/doc?refresh {"body": "foo bar"} POST /myindex/_flush {} POST /myindex/doc {"body": "foo bar"} POST /myindex/doc?refresh {"body": "foo bar"} GET /_cat/shards {}
{"_index":"myindex","_type":"doc","_id":"AUtVqJYGgeBguaeEl0KR","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}
{"_index":"myindex","_type":"doc","_id":"AUtVqJhZgeBguaeEl0KS","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}
{"_shards":{"total":2,"successful":2,"failed":0}}
{"_index":"myindex","_type":"doc","_id":"AUtVqJjvgeBguaeEl0KT","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}
{"_index":"myindex","_type":"doc","_id":"AUtVqJkPgeBguaeEl0KU","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}
index   shard prirep state   docs store ip        node    
myindex 0     p      STARTED    4 8.3kb 10.0.0.16 Box     
myindex 0     r      STARTED    2 8.3kb 10.0.0.16 Piranha
Relocate a shard
I am using this to test what will be shard relocation and recovery
POST /_cluster/reroute { "commands" : [ { "move" : { "index" : "myindex", "shard" : 0, "from_node" : "2lk0Ff_WTfeVNOQzlf5yGA", "to_node" : "RRw_HPuETNOs2gDnr4D4Hg" } } ] }
Reproduction for regular filesystem
In this case, we need to move segments around whenever the primary flushes
Requires this in elasticsearch.yml
node.enable_custom_paths: true
Create index with shadow_replicas: true
trash -a ~/src/elasticsearch/data
PUT /_cluster/settings { "transient": { "logger._root": "DEBUG", "logger.env": "TRACE", "logger.index": "DEBUG", "logger.index.engine.internal": "TRACE", "logger.index.gateway": "TRACE", "logger.indices.cluster": "TRACE", "logger.cluster.action.shard": "TRACE", "logger.action.support.replication": "TRACE" } } POST /myindex { "index": { "number_of_shards": 2, "number_of_replicas": 1, "shadow_replicas": true, "shared_filesystem": false } }
{"acknowledged":true,"persistent":{},"transient":{"logger":{"cluster":{"action":{"shard":"TRACE"}},"_root":"DEBUG","indices":{"cluster":"TRACE"},"engine":{"internal":"TRACE"},"index":"DEBUG","action":{"support":{"replication":"TRACE"}},"env":"TRACE","index.gateway":"TRACE"}}}
{"acknowledged":true}
Adding documents
If implemented correctly, they should only be added to the primary shard and not to any of the replicas.
GET /_cat/shards {}
index shard prirep state docs store ip node myindex 0 p STARTED 2 5.7kb 192.168.5.65 Blonde Phantom myindex 0 r STARTED 1 2.8kb 192.168.5.65 Mandrill myindex 1 r STARTED 1 2.8kb 192.168.5.65 Blonde Phantom myindex 1 p STARTED 2 5.5kb 192.168.5.65 Tempo
GET /_cat/nodes?h=name,id&full_id {}
name id Blonde Phantom sQ2_ZdEjQMaPF4GDAmLJog Tempo pl8yj9dFTZmpgDihYe3uJg
POST /myindex/doc {"body": "foo bar"} POST /myindex/doc?refresh {"body": "foo bar"} POST /myindex/_flush {} POST /myindex/doc {"body": "foo bar"} POST /myindex/doc?refresh {"body": "foo bar"} GET /_cat/shards {}
{"_index":"myindex","_type":"doc","_id":"AUuAI-hVJM_bJBgIb2_d","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}
{"_index":"myindex","_type":"doc","_id":"AUuAI-l2JM_bJBgIb2_e","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}
{"_shards":{"total":4,"successful":4,"failed":0}}
{"_index":"myindex","_type":"doc","_id":"AUuAI-qhJM_bJBgIb2_f","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}
{"_index":"myindex","_type":"doc","_id":"AUuAI-rDJM_bJBgIb2_g","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}
index   shard prirep state   docs store ip           node           
myindex 0     p      STARTED    1 2.8kb 192.168.5.65 Blonde Phantom 
myindex 0     r      STARTED    1 2.8kb 192.168.5.65 Tempo          
myindex 1     r      STARTED    1 2.8kb 192.168.5.65 Blonde Phantom 
myindex 1     p      STARTED    2 5.5kb 192.168.5.65 Tempo
Tasks
DONE resolve the rest of the nocommits in shadow-replicas branch
DONE relocating a replica doesn't work
Right now relocating a shard doesn't work because flushing throws a missing file error for some reason :(
I need to track down where the flush is occurring and why it is happening even though both engines should be ignoring the flush command.
Everything looks peachy on the old replica side when relocating:
[DEBUG][discovery.zen.publish    ] [Flambe] received cluster state version 10
[DEBUG][cluster.service          ] [Flambe] processing [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]: execute
[DEBUG][cluster.service          ] [Flambe] cluster state updated, version [10], source [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]
[DEBUG][cluster.service          ] [Flambe] set local cluster state to version 10
[DEBUG][cluster.service          ] [Flambe] processing [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]: done applying updated cluster_state (version: 10)
[DEBUG][discovery.zen.publish    ] [Flambe] received cluster state version 11
[DEBUG][cluster.service          ] [Flambe] processing [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]: execute
[DEBUG][cluster.service          ] [Flambe] cluster state updated, version [11], source [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]
[DEBUG][cluster.service          ] [Flambe] set local cluster state to version 11
[DEBUG][indices.cluster          ] [Flambe] [myindex][0] removing shard (not allocated)
[DEBUG][index                    ] [Flambe] [myindex] [0] closing... (reason: [removing shard (not allocated)])
[DEBUG][index.shard              ] [Flambe] [myindex][0] state: [STARTED]->[CLOSED], reason [removing shard (not allocated)]
[DEBUG][index.engine.internal    ] [Flambe] [myindex][0] close now acquire writeLock
[DEBUG][index.engine.internal    ] [Flambe] [myindex][0] close acquired writeLock
[DEBUG][index.engine.internal    ] [Flambe] [myindex][0] close searcherManager
[TRACE][env                      ] [Flambe] shard lock wait count for [[myindex][0]] is now [0]
[TRACE][env                      ] [Flambe] last shard lock wait decremented, removing lock for [[myindex][0]]
[TRACE][env                      ] [Flambe] released shard lock for [[myindex][0]]
[DEBUG][index.store              ] [Flambe] [myindex][0] store reference count on close: 0
[DEBUG][index                    ] [Flambe] [myindex] [0] closed (reason: [removing shard (not allocated)])
[DEBUG][indices.cluster          ] [Flambe] [myindex] cleaning index (no shards allocated)
[DEBUG][indices                  ] [Flambe] [myindex] closing ... (reason [removing index (no shards allocated)])
[DEBUG][indices                  ] [Flambe] [myindex] closing index service (reason [removing index (no shards allocated)])
[DEBUG][indices                  ] [Flambe] [myindex] closing index cache (reason [removing index (no shards allocated)])
[DEBUG][index.cache.filter.weighted] [Flambe] [myindex] full cache clear, reason [close]
[DEBUG][index.cache.bitset       ] [Flambe] [myindex] clearing all bitsets because [close]
[DEBUG][indices                  ] [Flambe] [myindex] clearing index field data (reason [removing index (no shards allocated)])
[DEBUG][indices                  ] [Flambe] [myindex] closing analysis service (reason [removing index (no shards allocated)])
[DEBUG][indices                  ] [Flambe] [myindex] closing mapper service (reason [removing index (no shards allocated)])
[DEBUG][indices                  ] [Flambe] [myindex] closing index query parser service (reason [removing index (no shards allocated)])
[DEBUG][indices                  ] [Flambe] [myindex] closing index service (reason [removing index (no shards allocated)])
[DEBUG][indices                  ] [Flambe] [myindex] closed... (reason [removing index (no shards allocated)])
[DEBUG][cluster.service          ] [Flambe] processing [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]: done applying updated cluster_state (version: 11)
[DEBUG][cluster.service          ] [Flambe] processing [indices_store]: execute
[DEBUG][indices.store            ] [Flambe] [myindex][0] deleting shard that is no longer used
[TRACE][env                      ] [Flambe] deleting shard [myindex][0] directory, paths: [[/Users/hinmanm/src/elasticsearch/data/elasticsearch/nodes/2/indices/myindex/0]]
[TRACE][env                      ] [Flambe] acquiring node shardlock on [[myindex][0]], timeout [0]
[TRACE][env                      ] [Flambe] successfully acquired shardlock for [[myindex][0]]
[TRACE][env                      ] [Flambe] skipping shard deletion because [myindex][0] uses shadow replicas
[TRACE][env                      ] [Flambe] shard lock wait count for [[myindex][0]] is now [0]
[TRACE][env                      ] [Flambe] last shard lock wait decremented, removing lock for [[myindex][0]]
[TRACE][env                      ] [Flambe] released shard lock for [[myindex][0]]
[DEBUG][cluster.service          ] [Flambe] processing [indices_store]: no change in cluster_state
As well as on the new (receiving) replica side:
[DEBUG][discovery.zen.publish    ] [Jane Kincaid] received cluster state version 10
[DEBUG][cluster.service          ] [Jane Kincaid] processing [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]: execute
[DEBUG][cluster.service          ] [Jane Kincaid] cluster state updated, version [10], source [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]
[DEBUG][cluster.service          ] [Jane Kincaid] set local cluster state to version 10
[DEBUG][indices.cluster          ] [Jane Kincaid] [myindex] creating index
[DEBUG][indices                  ] [Jane Kincaid] creating Index [myindex], shards [1]/[1]
[DEBUG][index.mapper             ] [Jane Kincaid] [myindex] using dynamic[true], default mapping: default_mapping_location[null], loaded_from[file:/Users/hinmanm/src/elasticsearch/target/classes/org/elasticsearch/index/mapper/default-mapping.json], default percolator mapping: location[null], loaded_from[null]
[DEBUG][index.cache.query.parser.resident] [Jane Kincaid] [myindex] using [resident] query cache with max_size [100], expire [null]
[DEBUG][index.store.fs           ] [Jane Kincaid] [myindex] using index.store.throttle.type [none], with index.store.throttle.max_bytes_per_sec [0b]
[DEBUG][indices.cluster          ] [Jane Kincaid] [myindex] adding mapping [doc], source [{"doc":{"properties":{"body":{"type":"string"}}}}]
[DEBUG][indices.cluster          ] [Jane Kincaid] [myindex][0] creating shard
[TRACE][env                      ] [Jane Kincaid] acquiring node shardlock on [[myindex][0]], timeout [5000]
[TRACE][env                      ] [Jane Kincaid] successfully acquired shardlock for [[myindex][0]]
[DEBUG][index                    ] [Jane Kincaid] [myindex] creating shard_id [myindex][0]
[DEBUG][index.store.fs           ] [Jane Kincaid] [myindex] using [/tmp/foo/myindex/0/index] as shard's index location
[DEBUG][index.merge.scheduler    ] [Jane Kincaid] [myindex][0] using [concurrent] merge scheduler with max_thread_count[2], max_merge_count[7], auto_throttle[true]
[DEBUG][index.store.fs           ] [Jane Kincaid] [myindex] using [/tmp/foo/myindex/0/translog] as shard's translog location
[DEBUG][index.deletionpolicy     ] [Jane Kincaid] [myindex][0] Using [keep_only_last] deletion policy
[DEBUG][index.merge.policy       ] [Jane Kincaid] [myindex][0] using [tiered] merge mergePolicy with expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_once[10], max_merge_at_once_explicit[30], max_merged_segment[5gb], segments_per_tier[10.0], reclaim_deletes_weight[2.0]
[DEBUG][index.shard              ] [Jane Kincaid] [myindex][0] state: [CREATED]
[DEBUG][index.translog           ] [Jane Kincaid] [myindex][0] interval [5s], flush_threshold_ops [2147483647], flush_threshold_size [512mb], flush_threshold_period [30m]
[DEBUG][index.shard              ] [Jane Kincaid] [myindex][0] state: [CREATED]->[RECOVERING], reason [from [Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}]
[DEBUG][cluster.service          ] [Jane Kincaid] processing [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]: done applying updated cluster_state (version: 10)
[INFO ][index.engine.internal    ] [Jane Kincaid] [myindex][0] cowardly refusing to CREATE
[INFO ][index.engine.internal    ] [Jane Kincaid] [myindex][0] cowardly refusing to CREATE
[DEBUG][index.shard              ] [Jane Kincaid] [myindex][0] state: [RECOVERING]->[POST_RECOVERY], reason [post recovery]
[DEBUG][index.shard              ] [Jane Kincaid] [myindex][0] scheduling refresher every 1s
[DEBUG][indices.recovery         ] [Jane Kincaid] [myindex][0] recovery completed from [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}], took [168ms]
[DEBUG][cluster.action.shard     ] [Jane Kincaid] sending shard started for [myindex][0], node[KaR0Q_QFSs2Capu6kCBPaw], relocating [MqFW42-ZT8G6AL8zR2krTg], [R], s[INITIALIZING], indexUUID [HjGOiD2dTRiSDZFuWdyf8A], reason [after recovery (replica) from node [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}]]
[DEBUG][discovery.zen.publish    ] [Jane Kincaid] received cluster state version 11
[DEBUG][cluster.service          ] [Jane Kincaid] processing [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]: execute
[DEBUG][cluster.service          ] [Jane Kincaid] cluster state updated, version [11], source [zen-disco-receive(from master [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}])]
[DEBUG][cluster.service          ] [Jane Kincaid] set local cluster state to version 11
[DEBUG][index.shard              ] [Jane Kincaid] [myindex][0] state: [POST_RECOVERY]->[STARTED], reason [global state is [STARTED]]
However, on the primary side:
[DEBUG][cluster.service          ] [Xemu] processing [cluster_reroute (api)]: execute
[DEBUG][cluster.routing.allocation.decider] [Xemu] Node [eJj0uF6YQIGQ0610E9-Rqg] has 30.184784851314088% free disk (75403067392 bytes)
[DEBUG][cluster.service          ] [Xemu] cluster state updated, version [10], source [cluster_reroute (api)]
[DEBUG][cluster.service          ] [Xemu] publishing cluster state version 10
[DEBUG][cluster.service          ] [Xemu] set local cluster state to version 10
[DEBUG][river.cluster            ] [Xemu] processing [reroute_rivers_node_changed]: execute
[DEBUG][river.cluster            ] [Xemu] processing [reroute_rivers_node_changed]: no change in cluster_state
[DEBUG][cluster.service          ] [Xemu] processing [cluster_reroute (api)]: done applying updated cluster_state (version: 10)
[DEBUG][cluster.service          ] [Xemu] processing [recovery_mapping_check]: execute
[DEBUG][cluster.service          ] [Xemu] processing [recovery_mapping_check]: no change in cluster_state
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] IW: commit: start
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] IW: commit: enter lock
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] IW: commit: now prepare
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] IW: prepareCommit: flush
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] IW:   index before flush _0(5.1.0):c2 _1(5.1.0):c2
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] DW: startFullFlush
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] IW: apply all deletes during flush
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] IW: now apply all deletes for all segments maxDoc=4
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] BD: applyDeletes: open segment readers took 0 msec
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] BD: applyDeletes: no segments; skipping
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] BD: prune sis=segments: _0(5.1.0):c2 _1(5.1.0):c2 minGen=3 packetCount=0
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] DW: elasticsearch[Xemu][generic][T#7] finishFullFlush success=true
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] IW: startCommit(): start
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] IW: startCommit index=_0(5.1.0):c2 _1(5.1.0):c2 changeCount=8
[TRACE][index.engine.internal.lucene.iw] [Xemu] [myindex][0] elasticsearch[Xemu][generic][T#7] IW: hit exception committing segments file
[WARN ][index.engine.internal    ] [Xemu] [myindex][0] failed to flush shard post recovery
org.elasticsearch.index.engine.FlushFailedEngineException: [myindex][0] Flush failed
  at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:799)
  at org.elasticsearch.index.engine.internal.InternalEngine$FlushingRecoveryCounter.endRecovery(InternalEngine.java:1435)
  at org.elasticsearch.index.engine.internal.RecoveryCounter.close(RecoveryCounter.java:63)
  at org.elasticsearch.common.lease.Releasables.close(Releasables.java:45)
  at org.elasticsearch.common.lease.Releasables.close(Releasables.java:60)
  at org.elasticsearch.common.lease.Releasables.close(Releasables.java:81)
  at org.elasticsearch.common.lease.Releasables.close(Releasables.java:89)
  at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1048)
  at org.elasticsearch.index.shard.IndexShard.recover(IndexShard.java:673)
  at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:120)
  at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:48)
  at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:141)
  at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:127)
  at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:276)
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: /private/tmp/foo/myindex/0/index/_1.cfs
  at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
  at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
  at java.nio.channels.FileChannel.open(FileChannel.java:287)
  at java.nio.channels.FileChannel.open(FileChannel.java:335)
  at org.apache.lucene.util.IOUtils.fsync(IOUtils.java:392)
  at org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:281)
  at org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:226)
  at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:78)
  at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:78)
  at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:78)
  at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4284)
  at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2721)
  at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2824)
  at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2791)
  at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:785)
  ... 17 more
[DEBUG][cluster.action.shard     ] [Xemu] received shard started for [myindex][0], node[KaR0Q_QFSs2Capu6kCBPaw], relocating [MqFW42-ZT8G6AL8zR2krTg], [R], s[INITIALIZING], indexUUID [HjGOiD2dTRiSDZFuWdyf8A], reason [after recovery (replica) from node [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}]]
[DEBUG][cluster.service          ] [Xemu] processing [shard-started ([myindex][0], node[KaR0Q_QFSs2Capu6kCBPaw], relocating [MqFW42-ZT8G6AL8zR2krTg], [R], s[INITIALIZING]), reason [after recovery (replica) from node [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}]]]: execute
[DEBUG][cluster.action.shard     ] [Xemu] [myindex][0] will apply shard started [myindex][0], node[KaR0Q_QFSs2Capu6kCBPaw], relocating [MqFW42-ZT8G6AL8zR2krTg], [R], s[INITIALIZING], indexUUID [HjGOiD2dTRiSDZFuWdyf8A], reason [after recovery (replica) from node [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}]]
[DEBUG][cluster.routing.allocation.decider] [Xemu] Node [eJj0uF6YQIGQ0610E9-Rqg] has 30.184784851314088% free disk (75403067392 bytes)
[DEBUG][cluster.routing.allocation.decider] [Xemu] Node [KaR0Q_QFSs2Capu6kCBPaw] has 30.184784851314088% free disk (75403067392 bytes)
[DEBUG][cluster.service          ] [Xemu] cluster state updated, version [11], source [shard-started ([myindex][0], node[KaR0Q_QFSs2Capu6kCBPaw], relocating [MqFW42-ZT8G6AL8zR2krTg], [R], s[INITIALIZING]), reason [after recovery (replica) from node [[Xemu][eJj0uF6YQIGQ0610E9-Rqg][Xanadu.local][inet[/10.0.0.16:9300]]{add_id_to_custom_path=false, enable_custom_paths=true}]]]
[DEBUG][cluster.service          ] [Xemu] publishing cluster state version 11
[DEBUG][cluster.service          ] [Xemu] set local cluster state to version 11
Here's the full primary log:
Aha! I know why, it's because when we open the IndexReader on the new replica,
it removes all the files the last commit point doesn't know about, which would
be the _1.cfs file that the primary is complaining about…
Fixed this by not removing anything if shadow replicas are in use for the index, but I'm not sure how happy I am with this solution… are we going to leak files in the directory?
DONE relocating a primary doesn't work
So, because the original primary still contains a lock on the write.lock file,
the soon-to-become primary can't open an IndexWriter on it, so hrm…
Shay: we should just throw an exception if a primary tries to relocate, fail the primary and a new one will be promoted out of the replicas. This will unblock for the time being.
In the future, we might need to close the InternalEngine for the primary before
relocating the data to the new primary. Either that, or we need to keep the
newly created primary from opening an IndexWriter until all the segments are
copied over.
ShadowEngine tests [8/8]
Yep, I need to write lots of them…
DONE test basic engine things
that all the write operations don't actually get executed, basically that it is read-only
DONE test that a flush & refresh makes new segments visible to replica
of course, only on a shared file-system
DONE test that replica -> primary promotion works
this works in manual testing, but need a test in Java that does it
Simon is working on this
DONE test that replica relocation works
This should be the easier of the two, since it doesn't actually have to replay anything, just create a new IndexReader on the replica
DONE test that primary relocation works
This is tough because of the IndexWriter problem
Right now we fail the shard, is this desired?
DONE test that deleting an index with shadow replicas cleans up the disk
Simon may do this, or I may, depending on who gets to it first.
I added this test.
DONE figure out the Mock modules snafu
Right now all the Mock wrappers we have fail the tests like crazy, but I don't know how to disable them from the test itself. Figure this out and then figure out how to re-enable them and not fail the tests
This is the tests.enable_mock_modules setting (in InternalTestCluster),
which defaults to true. With this enabled, I get failures like:
java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_0.cfs=1}
	at __randomizedtesting.SeedInfo.seed([8B82D7C4C56C71B1:DC4DF66AAB62646F]:0)
	at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:762)
	at org.elasticsearch.test.store.MockDirectoryHelper$ElasticsearchMockDirectoryWrapper.close(MockDirectoryHelper.java:135)
	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAllFilesClosed(ElasticsearchAssertions.java:681)
	at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:85)
	at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:1743)
	at org.elasticsearch.test.ElasticsearchIntegrationTest.afterInternal(ElasticsearchIntegrationTest.java:618)
Figure this out with Simon
DONE disable checking index when closed
Trying to check a replica that's been closed doesn't work, because the primary is still using that data so there's still a lock on the directory and checking opens a new lock.
CANCELLED shadow engine freaks out if there are no segments* files on disk CANCELLED
- State "CANCELLED"  from "TODO"        
 Simon said this is as-expected
See:
org.apache.lucene.index.IndexNotFoundException: no segments* file found in store(ElasticsearchMockDirectoryWrapper(least_used[default(mmapfs(/private/var/folders/kb/fplmcbhs4z52p4x59cqgm4kw0000gn/T/tests-20150211010430-146/0000 has-space/test/2/index),niofs(/private/var/folders/kb/fplmcbhs4z52p4x59cqgm4kw0000gn/T/tests-20150211010430-146/0000 has-space/test/2/index))])): files: [recovery.1423613070833.segments_1, write.lock] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:637) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:68) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63) at org.elasticsearch.index.engine.ShadowEngine.<init>(ShadowEngine.java:68) at org.elasticsearch.index.engine.ShadowEngineFactory.newEngine(ShadowEngineFactory.java:28) at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1170)
This happens if the shadow engine is created on a new index before the InternalEngine
DONE make shadow replicas not delete the data directory, the primary will do that
I changed NodeEnvironment.deleteShardDirectorySafe not to delete the shard
directory if shadow replicas are used, deleting the index directory will take
care of that.
DONE add a read only block while recovering translog on primary
Currently we replay the translog "internally" when a shadow replica has been promoted to a primary, however, we don't actually block any indexing operations while this replaying is going on. We need to add a block while this is going on.
Clint, see: https://github.com/elasticsearch/elasticsearch/pull/9203
Maybe Tanguy knows more about how to add this dynamically.
Fixed this by failing the shard if it is promoted to a primary
DONE make shadow replicas clean up if they are the last deletion in the cluster
Currently I prevent NodeEnvironment from deleting anything if shadow
replicas are used, but I need a way to ensure the last deletion in the cluster
actually removes the directory.
Right now, whoever uses this will be responsible for removing the data themselves.
This is a subset of this, but implemented correctly. Simon says that the
deletion should move up to the IndexService instead of living in a lower
level.
Simon fixed this! \o/
DONE update shadow replica issue with new plan (shadow_replicas: true)
DONE write.lock missing when closing primary shard?!
I think NodeEnvironment is doing this
Found it:
[2015-01-16 15:25:24,590][TRACE][index.store.deletes ] [Thane Ector][myindex][0] recovery CleanFilesRequestHandler: delete file FOO [2015-01-16 15:25:24,591][TRACE][index.store.deletes ] [Thane Ector][myindex][0] StoreDirectory.deleteFile: delete file FOO
It's CleanFilesRequestHandler, which invokes Store.cleanupAndVerify, which
in turn deletes every file not contained in the source's metadata.
See:
[2015-01-14 14:51:16,431][INFO ][node ] [Gamora] stopping ... [2015-01-14 14:51:16,462][DEBUG][index.engine.internal ] [Gamora] [myindex][0] close now acquire writeLock [2015-01-14 14:51:16,462][DEBUG][index.engine.internal ] [Gamora] [myindex][0] close acquired writeLock [2015-01-14 14:51:16,462][DEBUG][index.engine.internal ] [Gamora] [myindex][0] close searcherManager [2015-01-14 14:51:16,463][DEBUG][index.engine.internal ] [Gamora] [myindex][0] rollback indexWriter [2015-01-14 14:51:16,469][WARN ][index.engine.internal ] [Gamora] [myindex][0] failed to rollback writer on close java.nio.file.NoSuchFileException: /private/tmp/foo/myindex/0/index/write.lock at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixPath.toRealPath(UnixPath.java:837) at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.clearLockHeld(NativeFSLockFactory.java:182) at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.close(NativeFSLockFactory.java:172) at org.apache.lucene.util.IOUtils.close(IOUtils.java:96) at org.apache.lucene.util.IOUtils.close(IOUtils.java:83) at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2044) at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:1972) at org.elasticsearch.index.engine.internal.InternalEngine.close(InternalEngine.java:1229) at org.apache.lucene.util.IOUtils.close(IOUtils.java:96) at org.apache.lucene.util.IOUtils.close(IOUtils.java:83) at org.elasticsearch.index.shard.IndexShard.close(IndexShard.java:669) at org.elasticsearch.index.IndexService.closeShardInjector(IndexService.java:384) at org.elasticsearch.index.IndexService.removeShard(IndexService.java:363) at org.elasticsearch.index.IndexService.close(IndexService.java:255) at org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:373) at org.elasticsearch.indices.IndicesService.access$000(IndicesService.java:89) at org.elasticsearch.indices.IndicesService$1.run(IndicesService.java:131) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
DONE ensure no merges run on shadow replicas
Need to use a custom merge policy and/or scheduler?
I think this is handled by not registering a merge policy with the
IndexWriter, but I should make sure the IW isn't installing a default policy.
DONE Tell Luca and Clint about the typo in ShardStateAction
We have:
public static final String SHARD_STARTED_ACTION_NAME = "internal:cluster/shard/failure"; public static final String SHARD_FAILED_ACTION_NAME = "internal:cluster/shard/started";
DONE ensure shadow becomes normal after switching from replica to primary
I kind of added this in IndexShard.shardRouting() already
DONE handle replaying translog on recovery if replica is shadow
I think this can be done by creating a new shard and doing local recovery, but will need to check more.
Look in IndexShardGatewayService.recover and InedxShardGateway.recover, this is where we recover from translog. Can I just call it manually?
[DEBUG][cluster.action.shard] sending shard started for [myindex][0], node[4-kS-6aGQUWE7Qi65hxUEg], [P], s[STARTED], indexUUID [FEBqQ6I4Q0asig34M1Ai-w], reason [after recovery from gateway] [DEBUG][cluster.action.shard] received shard started for [myindex][0], node[4-kS-6aGQUWE7Qi65hxUEg], [P], s[STARTED], indexUUID [FEBqQ6I4Q0asig34M1Ai-w], reason [after recovery from gateway] [DEBUG][cluster.action.shard] [myindex][0] ignoring shard started event for [myindex][0], node[4-kS-6aGQUWE7Qi65hxUEg], [P], s[STARTED], indexUUID [FEBqQ6I4Q0asig34M1Ai-w], reason [after recovery from gateway], current state: STARTED
DONE create the shard routing with the shadow flag in RoutingNodes
CANCELLED sometimes a regular engine gets created instead of a shadow one CANCELLED
- State "CANCELLED"  from "TODO"        
 fixed this
CANCELLED don't replicate operations if the replica is a shadow replica CANCELLED
- State "CANCELLED"  from "TODO"        
 not doing this, the operation is transmitted but ignored
CANCELLED Figure out how to send shard started event after re-creation CANCELLED
- State "CANCELLED"  from "TODO"        
 don't need this anymore, we are recovering internally
The shard recreates fine, but whenever we try to send the "hey I'm started now", the master rejects it because it thinks the shard is already started.
DONE don't send docs to the replica when using shared FS
I think we can stick this in TransportShardReplicationOperationAction
CANCELLED ensure mapping is updated on replica when request is not sent CANCELLED
- State "CANCELLED"  from "TODO"        
 Simon said Shay mentioned this is as expected
Related to don't send docs to the replica when using shared FS, there is currently a nocommit for when dynamic mappings are used. It looks like the replica doesn't get the new mappings until much later in this case.
CANCELLED look into waiting for mapping changes on refresh for SRs CANCELLED
- State "CANCELLED"  from "TODO"        
 See: ensure mapping is updated on replica when request is not sent