Provide option to allow writes when master is down #60605
Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)
When indexing with the block in place, we would previously time out the entire shard or bulk request after the provided timeout (defaulting to 1 minute). With the new metadata_write block, the write will go through (which is fine), but in case of a shard failure, the request will instead block indefinitely.
I think this has two potential bad effects:
- We could build up lots of shard failed requests waiting for this.
- When a master comes back, we could have a burst of those sent to master.
I guess the byte-based limiting also bounds the first, and the shard-failed deduplication solves the second, so this is likely not an issue, but I thought I would mention it anyway in case it worries others.
Otherwise looking good to me.
server/src/internalClusterTest/java/org/elasticsearch/cluster/NoMasterNodeIT.java
As you pointed out, the previous behavior was to unconditionally time out these write requests in the reroute stage after a minute. With the new behavior, requests proceed past the reroute phase but can remain "stuck" until a master is back. Since many requests (more than the node has memory for) can already pile up on a node within a minute, I don't think this introduces previously unseen behavior. The byte-based memory limit for indexing helps not only with this new block, but also with the old blocks. With the write block active (i.e. the current default), many requests can start piling up with no bound at all (each one is turned into a ClusterStateObserver, waiting up to a minute for cluster state updates).
LGTM.
Elasticsearch currently blocks writes by default when a master is unavailable. The cluster.no_master_block setting allows a user to change this behavior to also block reads when a master is unavailable. This PR introduces a mode that still allows writes while the master is offline. Writes will continue to work as long as they require neither routing table changes nor dynamic mapping updates, as both need the master for consistency. Eventually we should switch the default of cluster.no_master_block to this new mode.
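To make the three modes concrete, here is a minimal standalone sketch of the decision described above. It is not Elasticsearch code: the class `NoMasterBlockSketch`, the `Operation` enum, and the helper `writeCanProceed` are hypothetical names invented for illustration, and the model is simplified (it ignores metadata reads, for example). It assumes the setting's values are `all`, `write`, and the new `metadata_write`.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative model only; none of these types exist in Elasticsearch.
class NoMasterBlockSketch {

    // Simplified set of operation kinds a no-master block can reject.
    enum Operation { READ, WRITE, METADATA_WRITE }

    // Assumed values of cluster.no_master_block and what each one blocks.
    enum NoMasterBlock {
        ALL(EnumSet.of(Operation.READ, Operation.WRITE, Operation.METADATA_WRITE)),
        WRITE(EnumSet.of(Operation.WRITE, Operation.METADATA_WRITE)),
        METADATA_WRITE(EnumSet.of(Operation.METADATA_WRITE));

        private final Set<Operation> blocked;

        NoMasterBlock(Set<Operation> blocked) {
            this.blocked = blocked;
        }

        boolean blocks(Operation op) {
            return blocked.contains(op);
        }
    }

    // A document write can proceed without a master only if the block does not
    // reject writes and the write needs nothing from the master: no routing
    // table change and no dynamic mapping update.
    static boolean writeCanProceed(NoMasterBlock mode,
                                   boolean needsRoutingChange,
                                   boolean needsDynamicMappingUpdate) {
        if (mode.blocks(Operation.WRITE)) {
            return false;
        }
        return !needsRoutingChange && !needsDynamicMappingUpdate;
    }

    public static void main(String[] args) {
        System.out.println(writeCanProceed(NoMasterBlock.METADATA_WRITE, false, false));
        System.out.println(writeCanProceed(NoMasterBlock.METADATA_WRITE, false, true));
        System.out.println(writeCanProceed(NoMasterBlock.WRITE, false, false));
    }
}
```

In this model a plain write goes through only in `metadata_write` mode, and only when it touches neither the routing table nor the mappings. In the real system such a write would wait for a master rather than return a boolean.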
We can't assert on the specific exception, unfortunately.