[Zen2] Add warning if cluster fails to form fast enough #35993

DaveCTurner

Today if a leader is not discovered or elected then nodes are essentially
silent at INFO and above, and log copiously at DEBUG and below. A short delay
when electing a leader is not unusual, for instance if other nodes have not yet
started, but a persistent failure to elect a leader is a problem worthy of log
messages in the default configuration.

With this change, while there is no leader each node outputs a WARN-level log
message every 10 seconds (by default) indicating as such, describing the
current discovery state and the current quorum(s).
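The behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class `NoLeaderWarner`, its constructor arguments, and the message text are all hypothetical, and the 10-second interval is passed in by the caller rather than read from any real setting.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch: none of these names come from Elasticsearch. */
class NoLeaderWarner {
    private final BooleanSupplier hasLeader;          // true once a leader is known
    private final Supplier<String> stateDescription;  // describes discovery state and quorum(s)
    private final Consumer<String> warnLog;           // stand-in for logger.warn(...)

    NoLeaderWarner(BooleanSupplier hasLeader, Supplier<String> stateDescription, Consumer<String> warnLog) {
        this.hasLeader = hasLeader;
        this.stateDescription = stateDescription;
        this.warnLog = warnLog;
    }

    /** One check: emits a warning only while there is no leader. */
    void check() {
        if (!hasLeader.getAsBoolean()) {
            warnLog.accept("leader not discovered or elected yet: " + stateDescription.get());
        }
    }

    /** Runs check() repeatedly; the first run happens one interval after scheduling. */
    ScheduledFuture<?> schedule(ScheduledExecutorService scheduler, long interval, TimeUnit unit) {
        return scheduler.scheduleWithFixedDelay(this::check, interval, interval, unit);
    }
}
```

Calling `schedule(scheduler, 10, TimeUnit.SECONDS)` would give the default cadence mentioned above. Because the first check is delayed by one interval, a cluster that forms quickly logs nothing.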

@DaveCTurner DaveCTurner added >enhancement v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Nov 28, 2018
@DaveCTurner DaveCTurner requested a review from ywelsch November 28, 2018 12:34
@elasticmachine

Pinging @elastic/es-distributed

@DaveCTurner DaveCTurner changed the title Add warning if cluster fails to form fast enough [Zen2] Add warning if cluster fails to form fast enough Nov 28, 2018
@ywelsch ywelsch mentioned this pull request Nov 29, 2018
61 tasks
@DaveCTurner DaveCTurner changed the base branch from zen2 to master December 6, 2018 08:27

@ywelsch ywelsch left a comment

It feels like the logic here is spread over too many classes. Can we just expose the info from PeerFinder that we need (i.e. lastResolvedAddresses)? Coordinator takes care of scheduling this and can then just assemble the information from the various components into a log output, not requiring a callback.
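The shape suggested in this comment might look something like the sketch below, in which the discovery component only exposes a snapshot getter and the coordinator pulls that state when it needs to log. `PeerFinderSketch`, `CoordinatorSketch`, and the message text are hypothetical stand-ins and do not reflect the real `PeerFinder` or `Coordinator` classes.

```java
import java.util.List;

/** Hypothetical stand-in for a discovery component; it only exposes a snapshot getter. */
class PeerFinderSketch {
    private volatile List<String> lastResolvedAddresses = List.of();

    /** Called whenever address resolution completes; stores an immutable copy. */
    void onAddressesResolved(List<String> addresses) {
        lastResolvedAddresses = List.copyOf(addresses);
    }

    List<String> getLastResolvedAddresses() {
        return lastResolvedAddresses;
    }
}

/** Hypothetical coordinator that reads state on demand instead of receiving callbacks. */
class CoordinatorSketch {
    private final PeerFinderSketch peerFinder;

    CoordinatorSketch(PeerFinderSketch peerFinder) {
        this.peerFinder = peerFinder;
    }

    /** Assembles the log fragment from the component's exposed state. */
    String describeDiscovery() {
        return "last resolved addresses: " + peerFinder.getLastResolvedAddresses();
    }
}
```

With this arrangement the message is built in one place, and the discovery component needs no knowledge of logging or scheduling.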

@@ -386,5 +386,17 @@ public static VotingConfiguration of(DiscoveryNode... nodes) {
        // this could be used in many more places - TODO use this where appropriate
        return new VotingConfiguration(Arrays.stream(nodes).map(DiscoveryNode::getId).collect(Collectors.toSet()));
    }

    public String getQuorumDescription() {

This method (and the other describe methods) lacks context in its class. I would prefer to build the full output in warnClusterFormationFailed.

foundPeers.forEach(possibleVotes::addVote);
final String isQuorumOrNot = coordinationState.get().isElectionQuorum(possibleVotes) ? "is a quorum" : "is not a quorum";

logger.warn("leader not discovered or elected yet: election requires {}, have discovered {} which {}; discovery " +

Let's swap "leader" for "master".


    public String getBootstrapDescription() {
        if (initialMasterNodeCount == 0) {
            return "external cluster bootstrapping";

What is meant by "external cluster bootstrapping"?

foundPeers.forEach(possibleVotes::addVote);
final String isQuorumOrNot = coordinationState.get().isElectionQuorum(possibleVotes) ? "is a quorum" : "is not a quorum";

logger.warn("leader not discovered or elected yet: election requires {}, have discovered {} which {}; discovery " +

If this is not a master-eligible node, does it even make sense to talk about elections here? Maybe it should state that it is a non-master-eligible node and that it cannot find a master?
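One way to picture the distinction raised here is the sketch below. It assumes a single voting configuration with a simple-majority rule, whereas the description above mentions "quorum(s)", so the real check may involve more than one configuration. The class `FormationMessageSketch`, its method names, and the message wording are hypothetical.

```java
import java.util.Set;

/** Hypothetical sketch of the message branch; not the code under review. */
class FormationMessageSketch {

    /** Simple-majority check: more than half of the voting nodes have been discovered. */
    static boolean isQuorum(Set<String> votingNodeIds, Set<String> discoveredNodeIds) {
        long votes = votingNodeIds.stream().filter(discoveredNodeIds::contains).count();
        return votes * 2 > votingNodeIds.size();
    }

    /** Nodes that cannot vote get a message that does not mention elections. */
    static String describe(boolean masterEligible, Set<String> votingNodeIds, Set<String> discoveredNodeIds) {
        if (!masterEligible) {
            return "master not discovered yet: this node is not master-eligible";
        }
        String quorum = isQuorum(votingNodeIds, discoveredNodeIds) ? "is a quorum" : "is not a quorum";
        return "master not discovered or elected yet: election requires a majority of "
            + votingNodeIds.size() + " voting node(s), have discovered "
            + discoveredNodeIds.size() + " node(s), which " + quorum;
    }
}
```

The sketch prints counts instead of node sets only to keep its output deterministic; a real message would presumably name the nodes involved.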

@DaveCTurner DaveCTurner requested a review from ywelsch December 7, 2018 14:53

@ywelsch ywelsch left a comment

LGTM

@DaveCTurner DaveCTurner merged commit 9d41798 into elastic:master Dec 7, 2018
@DaveCTurner DaveCTurner deleted the 2018-11-28-cluster-formation-timeout-warning branch December 7, 2018 17:23
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement v7.0.0-beta1
4 participants