[Zen2] Add warning if cluster fails to form fast enough #35993
Today if a leader is not discovered or elected then nodes are essentially silent at INFO and above, and log copiously at DEBUG and below. A short delay when electing a leader is not unusual, for instance if other nodes have not yet started, but a persistent failure to elect a leader is a problem worthy of log messages in the default configuration. With this change, while there is no leader each node outputs a WARN-level log message every 10 seconds (by default) indicating as such, describing the current discovery state and the current quorum(s).
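The behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the class name `ClusterFormationWarner`, its constructor parameters, and the use of a plain `ScheduledExecutorService` are all assumptions made for the example, and the real change schedules its checks inside Elasticsearch's own coordination layer.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper (not from the PR): warns periodically while no leader is known. */
public class ClusterFormationWarner {
    private final BooleanSupplier hasLeader;          // assumed hook: true once a leader is known
    private final Supplier<String> stateDescription;  // assumed hook: describes discovery state and quorum
    private final Consumer<String> warnLog;           // stand-in for logger.warn(...)

    public ClusterFormationWarner(BooleanSupplier hasLeader,
                                  Supplier<String> stateDescription,
                                  Consumer<String> warnLog) {
        this.hasLeader = hasLeader;
        this.stateDescription = stateDescription;
        this.warnLog = warnLog;
    }

    /** Runs one periodic check; returns true if a warning was emitted. */
    public boolean checkOnce() {
        if (hasLeader.getAsBoolean()) {
            return false; // a leader exists, so stay silent
        }
        warnLog.accept("leader not discovered or elected yet: " + stateDescription.get());
        return true;
    }

    /** Schedules checkOnce at a fixed period, e.g. start(10, TimeUnit.SECONDS). */
    public ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // The first warning fires only after one full period, so a short election delay stays quiet.
        scheduler.scheduleAtFixedRate(this::checkOnce, period, period, unit);
        return scheduler;
    }
}
```

Delaying the first check by one full period is what keeps a brief, normal election delay out of the logs; the caller is expected to shut the returned scheduler down when the node stops.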
Pinging @elastic/es-distributed
It feels like the logic here is spread over too many classes. Can we just expose the info from PeerFinder that we need (i.e. lastResolvedAddresses)? Coordinator takes care of scheduling this and can then just assemble the information from the various components into a log output, not requiring a callback.
@@ -386,5 +386,17 @@ public static VotingConfiguration of(DiscoveryNode... nodes) {
        // this could be used in many more places - TODO use this where appropriate
        return new VotingConfiguration(Arrays.stream(nodes).map(DiscoveryNode::getId).collect(Collectors.toSet()));
    }

    public String getQuorumDescription() {
This method (and the other describe methods) lacks context in its class. I would prefer to have the full construction of the output in warnClusterFormationFailed.
foundPeers.forEach(possibleVotes::addVote);
final String isQuorumOrNot = coordinationState.get().isElectionQuorum(possibleVotes) ? "is a quorum" : "is not a quorum";

logger.warn("leader not discovered or elected yet: election requires {}, have discovered {} which {}; discovery " +
Let's swap "leader" for "master".
public String getBootstrapDescription() {
    if (initialMasterNodeCount == 0) {
        return "external cluster bootstrapping";
What is meant by "external cluster bootstrapping"?
foundPeers.forEach(possibleVotes::addVote);
final String isQuorumOrNot = coordinationState.get().isElectionQuorum(possibleVotes) ? "is a quorum" : "is not a quorum";

logger.warn("leader not discovered or elected yet: election requires {}, have discovered {} which {}; discovery " +
If this is not a master-eligible node, does it even make sense to talk about elections here? Maybe the message should state that this is a non-master-eligible node and that it cannot find a master.
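One way the suggestion above might look is sketched below. Everything here is hypothetical: the class `FormationFailureDescriber`, its method names, and the message wording are invented for illustration, and the quorum check assumes a single voting configuration decided by simple majority, which is simpler than what `isElectionQuorum` in the snippet above may actually check.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch (not from the PR): builds the warning text for a node without a master. */
public class FormationFailureDescriber {

    /** Assumed rule: a quorum is a strict majority of the voting configuration's node IDs. */
    public static boolean isQuorum(Set<String> votingConfiguration, Set<String> foundNodeIds) {
        long votes = votingConfiguration.stream().filter(foundNodeIds::contains).count();
        return votes * 2 > votingConfiguration.size();
    }

    /** Chooses the wording depending on whether this node can take part in elections. */
    public static String describe(boolean masterEligible,
                                  Set<String> votingConfiguration,
                                  Set<String> foundNodeIds) {
        // TreeSet gives a stable, sorted rendering of the node IDs in the log line.
        if (!masterEligible) {
            // A non-master-eligible node never votes, so election details are omitted.
            return "master not discovered yet: this node is not master-eligible; have discovered "
                + new TreeSet<>(foundNodeIds);
        }
        String isQuorumOrNot = isQuorum(votingConfiguration, foundNodeIds) ? "is a quorum" : "is not a quorum";
        return "master not discovered or elected yet: election requires a majority of "
            + new TreeSet<>(votingConfiguration) + ", have discovered "
            + new TreeSet<>(foundNodeIds) + " which " + isQuorumOrNot;
    }
}
```

Keeping the whole message construction in one place also matches the earlier request to assemble the output in a single method rather than spreading describe methods across classes.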
... and back out the unnecessary changes elsewhere
LGTM