Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Zen2] Gather votes from all nodes #34335

Merged
merged 26 commits into from
Oct 6, 2018

Conversation

DaveCTurner
Copy link
Contributor

Today we accept that some nodes may vote for the wrong master in an election.
This is mostly fine because they do end up joining the correct master in the
end, but the lack of a vote from every follower may prevent a future desirable
reconfiguration from taking place.

The solution is to hold another election in a yet-higher term in order to
collect a complete set of votes. Elections are somewhat disruptive so we should
think carefully about when this election should take place. One option is to
wait as late as possible (on the grounds that it might not ever be necessary).
This unfortunately makes it harder to predict how an
apparently-smoothly-running cluster will react to nodes leaving and joining.
Instead we prefer to perform the election as soon as possible in the leader's
term, adding "votes from all followers" to the invariants that we expect to
hold in a stable cluster. The start of a leader's term is already a somewhat
disrupted time for the cluster, so performing another election at this point
does not materially change the cluster's behaviour.

This change implements the logic needed to trigger a new election in order to
satisfy this extra stabilisation condition.

@DaveCTurner DaveCTurner added >enhancement v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Oct 5, 2018
@DaveCTurner DaveCTurner requested a review from ywelsch October 5, 2018 16:56
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@ywelsch ywelsch mentioned this pull request Oct 5, 2018
61 tasks
@DaveCTurner DaveCTurner changed the title Gather votes from all nodes [Zen2] Gather votes from all nodes Oct 5, 2018
@@ -753,6 +766,11 @@ public String toString() {
private final AckListener ackListener;
private final ActionListener<Void> publishListener;

// We may not have accepted our own state before receiving a join from another node, causing its join to be rejected (we cannot
// safely accept a join whose last-accepted term/version is ahead of ours), so store them up and process them at the end.
// TODO this is unpleasant, is there a better way?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's not so bad. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as reconfiguration (which cares about the joins) can only happen in the next cluster state update (and is only triggered at the end of this publication), I think this is ok.

@DaveCTurner
Copy link
Contributor Author

Iterated test runs found some interesting cases here (I thought I was done at 2c98dd7, all the subsequent commits were chasing test failures) but they've now passed 600 successes in a row on 5d85715 so I think this is good to go.

@DaveCTurner
Copy link
Contributor Author

Another failure at ~800 iterations; this specific case is fixed in 71a642d but I think there's a related failure in which the PublishResponse containing the vital join gets dropped. This would mean that the affected node appears to be lagging: it never applies the last-published state, but we don't have lag detection yet.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if (sourceNode.equals(getLocalNode())) {
preVoteCollector.update(getPreVoteResponse(), getLocalNode());
} else {
becomeFollower("handlePublishRequest", sourceNode); // updates preVoteCollector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change comment to "also updates preVoteCollector" to make it clearer that that is not the only purpose (or maybe I just misinterpreted this)

@@ -753,6 +766,11 @@ public String toString() {
private final AckListener ackListener;
private final ActionListener<Void> publishListener;

// We may not have accepted our own state before receiving a join from another node, causing its join to be rejected (we cannot
// safely accept a join whose last-accepted term/version is ahead of ours), so store them up and process them at the end.
// TODO this is unpleasant, is there a better way?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as reconfiguration (which cares about the joins) can only happen in the next cluster state update (and is only triggered at the end of this publication), I think this is ok.

@DaveCTurner DaveCTurner merged commit 03da4f6 into elastic:zen2 Oct 6, 2018
@DaveCTurner DaveCTurner deleted the 2018-10-05-term-bumping branch October 6, 2018 06:22
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants