[Zen2] Gather votes from all nodes #34335

DaveCTurner · 2018-10-05T16:56:27Z

Today we accept that some nodes may vote for the wrong master in an election.
This is mostly fine because they do eventually join the elected master, but a
leader can only move to a new voting configuration if the votes it holds form a
quorum in that configuration, so a missing vote from even one follower may
prevent a desirable future reconfiguration from taking place.
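Here is a minimal sketch of the quorum rule behind this problem. It assumes that each vote is identified by a node name and that the leader needs a strict majority of both its current and its proposed voting configuration. `VotingQuorum` and its methods are hypothetical names for illustration, not the Coordinator's API.

```java
import java.util.Set;

// Hypothetical sketch of the quorum rule for reconfiguration; not Elasticsearch's actual code.
final class VotingQuorum {

    private VotingQuorum() {
    }

    /** True if strictly more than half of the nodes in {@code config} are among {@code votes}. */
    static boolean hasQuorum(Set<String> votes, Set<String> config) {
        long granted = config.stream().filter(votes::contains).count();
        return granted * 2 > config.size();
    }

    /**
     * Assumed rule: the leader may move from {@code currentConfig} to {@code newConfig} only if
     * the votes it holds in its current term form a quorum of both configurations.
     */
    static boolean canReconfigureTo(Set<String> votes, Set<String> currentConfig, Set<String> newConfig) {
        return hasQuorum(votes, currentConfig) && hasQuorum(votes, newConfig);
    }
}
```

For example, a leader in configuration `{A, B, C}` that holds votes from `A` and `B` only cannot move to `{A, C, D}`, even though `C` is healthy and following it. Once `C`'s vote is also held, the same move is allowed.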

The solution is to hold another election in a yet-higher term in order to
collect a complete set of votes. Elections are somewhat disruptive so we should
think carefully about when this election should take place. One option is to
wait as late as possible (on the grounds that it might not ever be necessary).
This unfortunately makes it harder to predict how an
apparently-smoothly-running cluster will react to nodes leaving and joining.
Instead we prefer to perform the election as soon as possible in the leader's
term, adding "votes from all followers" to the invariants that we expect to
hold in a stable cluster. The start of a leader's term is already a somewhat
disrupted time for the cluster, so performing another election at this point
does not materially change the cluster's behaviour.

This change implements the logic needed to trigger a new election in order to
satisfy this extra stabilisation condition.
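The trigger condition can be sketched as follows. The sketch assumes that the leader records every vote (join) it receives in its own term and re-checks the condition whenever the set of followers changes. The names `LeaderVoteTracker` and `termForNewElection` are hypothetical. Choosing one more than the highest term seen is an assumption about how the new term is picked, not a description of the actual implementation.

```java
import java.util.HashSet;
import java.util.OptionalLong;
import java.util.Set;

// Hypothetical sketch of the "votes from all followers" check; names are illustrative only.
final class LeaderVoteTracker {

    private final long currentTerm;
    private long maxTermSeen;
    private final Set<String> votes = new HashSet<>();

    LeaderVoteTracker(String localNode, long term) {
        this.currentTerm = term;
        this.maxTermSeen = term;
        votes.add(localNode); // the leader voted for itself when it won this term
    }

    /** Records a vote; only votes cast in the current term count, but every term is tracked. */
    void onVote(String voter, long term) {
        maxTermSeen = Math.max(maxTermSeen, term);
        if (term == currentTerm) {
            votes.add(voter);
        }
    }

    /**
     * If some follower has not voted for this leader in the current term, returns the term in
     * which to hold a new election (above every term seen so far); otherwise returns empty.
     */
    OptionalLong termForNewElection(Set<String> followers) {
        if (votes.containsAll(followers)) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(maxTermSeen + 1);
    }
}
```

For instance, a leader `A` elected in term 5 with a vote from `B` but not from follower `C` would ask for a new election in term 6, in which `C` can then vote for `A`. A tracker whose votes already cover every follower returns empty, so a stable cluster holds no further elections.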

elasticmachine · 2018-10-05T16:56:29Z

Pinging @elastic/es-distributed

DaveCTurner · 2018-10-05T17:00:28Z

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

@@ -753,6 +766,11 @@ public String toString() {
        private final AckListener ackListener;
        private final ActionListener<Void> publishListener;

+        // We may not have accepted our own state before receiving a join from another node, causing its join to be rejected (we cannot
+        // safely accept a join whose last-accepted term/version is ahead of ours), so store them up and process them at the end.
+        // TODO this is unpleasant, is there a better way?


Maybe it's not so bad. WDYT?


DaveCTurner · 2018-10-05T17:03:47Z

Iterated test runs found some interesting cases here (I thought I was done at 2c98dd7; all the subsequent commits were chasing test failures), but they have now passed 600 iterations in a row on 5d85715, so I think this is good to go.

DaveCTurner · 2018-10-05T18:25:28Z

Another failure at ~800 iterations; this specific case is fixed in 71a642d, but I think there's a related failure in which the PublishResponse containing the vital join gets dropped. This would make the affected node appear to be lagging (it never applies the last-published state), but we don't have lag detection yet.

ywelsch

LGTM

ywelsch · 2018-10-05T21:26:58Z

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

+            if (sourceNode.equals(getLocalNode())) {
+                preVoteCollector.update(getPreVoteResponse(), getLocalNode());
+            } else {
+                becomeFollower("handlePublishRequest", sourceNode); // updates preVoteCollector


Maybe change the comment to "also updates preVoteCollector" to make it clearer that this is not its only purpose (or maybe I just misinterpreted this).
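For context, both branches above keep the pre-vote collector current. Roughly, a node answers another node's pre-vote request using the values most recently passed to `update`: its own pre-vote response (term and last-accepted state) and the leader it currently knows about, if any. The following is a simplified, hypothetical model of that state. It assumes that a known leader causes pre-votes for any other candidate to be refused. It is not the real `PreVoteCollector`.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical, simplified model of the state a pre-vote collector keeps.
final class PreVoteStateSketch {

    /** What this node reports to a node that asks whether it may start an election. */
    static final class PreVoteResponse {
        final long currentTerm;
        final long lastAcceptedTerm;
        final long lastAcceptedVersion;

        PreVoteResponse(long currentTerm, long lastAcceptedTerm, long lastAcceptedVersion) {
            this.currentTerm = currentTerm;
            this.lastAcceptedTerm = lastAcceptedTerm;
            this.lastAcceptedVersion = lastAcceptedVersion;
        }
    }

    private PreVoteResponse response;
    private String leader; // null if no leader is known

    PreVoteStateSketch(PreVoteResponse initial) {
        this.response = Objects.requireNonNull(initial);
    }

    /** Analogous to update(getPreVoteResponse(), leader): both values are replaced together. */
    synchronized void update(PreVoteResponse newResponse, String newLeader) {
        this.response = Objects.requireNonNull(newResponse);
        this.leader = newLeader;
    }

    /** Answers a pre-vote request; empty means it is refused because another node leads. */
    synchronized Optional<PreVoteResponse> handlePreVoteRequest(String candidate) {
        if (leader == null || leader.equals(candidate)) {
            return Optional.of(response);
        }
        return Optional.empty();
    }
}
```

If a follower forgot to call `update` when it learned of a new leader, it would keep answering pre-votes as if there were none; if the leader forgot to refresh its own response after accepting a new state, it would report a stale last-accepted version.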

ywelsch · 2018-10-05T21:34:26Z

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

@@ -753,6 +766,11 @@ public String toString() {
        private final AckListener ackListener;
        private final ActionListener<Void> publishListener;

+        // We may not have accepted our own state before receiving a join from another node, causing its join to be rejected (we cannot
+        // safely accept a join whose last-accepted term/version is ahead of ours), so store them up and process them at the end.
+        // TODO this is unpleasant, is there a better way?


As reconfiguration (which cares about the joins) can only happen in the next cluster state update, and is only triggered at the end of this publication, I think this is ok.
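The deferral discussed in this thread can be sketched as follows. This is a hypothetical model: `DeferredJoinsSketch` and its methods are illustrative names. It assumes that a join is safe only if the joining node's last-accepted term and version are not ahead of ours. It also assumes that joins arriving while our own publication is in flight are buffered and re-checked once we have accepted the published state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of deferring joins until the local node has accepted its own published state.
final class DeferredJoinsSketch {

    /** A join carries the joining node's last-accepted term and version. */
    static final class Join {
        final String source;
        final long lastAcceptedTerm;
        final long lastAcceptedVersion;

        Join(String source, long lastAcceptedTerm, long lastAcceptedVersion) {
            this.source = source;
            this.lastAcceptedTerm = lastAcceptedTerm;
            this.lastAcceptedVersion = lastAcceptedVersion;
        }
    }

    private long lastAcceptedTerm;
    private long lastAcceptedVersion;
    private final List<Join> pendingJoins = new ArrayList<>();

    DeferredJoinsSketch(long lastAcceptedTerm, long lastAcceptedVersion) {
        this.lastAcceptedTerm = lastAcceptedTerm;
        this.lastAcceptedVersion = lastAcceptedVersion;
    }

    /** A join is only safe if the joiner's last-accepted term/version is not ahead of ours. */
    boolean canAccept(Join join) {
        if (join.lastAcceptedTerm != lastAcceptedTerm) {
            return join.lastAcceptedTerm < lastAcceptedTerm;
        }
        return join.lastAcceptedVersion <= lastAcceptedVersion;
    }

    /** Joins that arrive while our own publication is in flight are stored, not judged yet. */
    void onJoinDuringPublication(Join join) {
        pendingJoins.add(join);
    }

    /** At the end of the publication, record our accepted state and re-check the stored joins. */
    List<Join> completePublication(long acceptedTerm, long acceptedVersion) {
        lastAcceptedTerm = acceptedTerm;
        lastAcceptedVersion = acceptedVersion;
        List<Join> accepted = new ArrayList<>();
        for (Join join : pendingJoins) {
            if (canAccept(join)) {
                accepted.add(join);
            }
        }
        pendingJoins.clear();
        return accepted;
    }
}
```

For example, a leader at term 5, version 10 that publishes version 11 may receive a join from a follower that has already accepted version 11. That join would be rejected if judged immediately but is accepted once the leader has itself accepted version 11. Because nothing reads the joins until the next cluster state update, deferring them in this way should not delay reconfiguration.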

DaveCTurner added 20 commits October 5, 2018 09:32

Add assertions about the preVoteCollector's consistency

9129532

Extract variable

33113de

Log all exceptions the same

26be291

Update max term seen

4642e24

Remove term-bump workaround

22130dc

Bump term on discovery of the need to do so

3ba1a6d

Reinstate assertion that every connected node has voted for the leader

4a70f62

Make fields private

9cea60f

Generate DiscoveryNodes deterministically

5acffc9

Private

e11e929

Handle publish requests without attached joins

2c98dd7

Fix missing update to preVoteCollector

7569c4f

TODO is done

9108380

Trace join handling

af8916a

Include join in message

b2675ae

Track all the joins and process them at the end

83d3e31

Wait for local ack, onCompletion might not be late enough

290b960

Extend delay in the unresponsive leader test

e86e4a2

Added TODO

04b0451

Unused imports

5d85715

DaveCTurner added >enhancement v7.0.0 :Distributed Coordination/Cluster Coordination labels Oct 5, 2018

DaveCTurner requested a review from ywelsch October 5, 2018 16:56

ywelsch mentioned this pull request Oct 5, 2018 in A new cluster coordination layer #32006 (Closed, 61 tasks)

DaveCTurner changed the title ~~Gather votes from all nodes~~ [Zen2] Gather votes from all nodes Oct 5, 2018

DaveCTurner commented Oct 5, 2018


Deal with late-arriving joins

71a642d

DaveCTurner added 3 commits October 5, 2018 19:33

Harmonise join filtering logic

8405768

Higher prio log messages

77bd13b

Add lag-fixing hack

5666232

ywelsch approved these changes Oct 5, 2018


DaveCTurner added 2 commits October 5, 2018 23:12

Comment fixes from review

dbdfcc1

Used the wrong branch when simplifying exception logging

8e4b8dd

DaveCTurner merged commit 03da4f6 into elastic:zen2 Oct 6, 2018

DaveCTurner deleted the 2018-10-05-term-bumping branch October 6, 2018 06:22

jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
