Fix inconsistency caused by the autopilot StatsFetcher #4528

Merged
merged 2 commits into from
Aug 15, 2018


kyhavlov
Contributor

This fix came out of looking into an issue in the enterprise redundancy zones feature. A single failed outgoing RPC was affecting the reads on the RPC result channels in the stats fetcher (a select against multiple channels does not give priority to the first channel), which incorrectly marked other servers as failed/unhealthy. This caused inconsistencies in how autopilot promotes servers, particularly around redundancy zones, where the current voter in a zone could be marked as unhealthy.
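To illustrate the select behavior described above, here is a minimal, self-contained sketch. It is not the StatsFetcher code: `collectNaive`, `collectPreferReply`, and `readyReplies` are hypothetical names, and the `done` channel stands in for whatever signal ends the wait after one RPC fails or times out. It relies only on the Go rule that when several cases of a `select` are ready, one is chosen pseudo-randomly.

```go
package main

import "fmt"

// collectNaive (hypothetical) reads one reply per server, racing each read
// against done. Once done is closed, every select has two ready cases, so
// Go picks one pseudo-randomly and replies that are already available can
// be skipped, which would make healthy servers look failed.
func collectNaive(replies []chan int, done <-chan struct{}) int {
	got := 0
	for _, ch := range replies {
		select {
		case <-ch:
			got++
		case <-done:
			// No reply counted for this server.
		}
	}
	return got
}

// collectPreferReply (hypothetical) first tries a non-blocking receive on
// the reply channel, and only then waits on both channels. A reply that is
// already available is therefore never lost to the done case.
func collectPreferReply(replies []chan int, done <-chan struct{}) int {
	got := 0
	for _, ch := range replies {
		select {
		case <-ch:
			got++
			continue
		default:
		}
		select {
		case <-ch:
			got++
		case <-done:
		}
	}
	return got
}

// readyReplies builds n buffered channels that each already hold a reply.
func readyReplies(n int) []chan int {
	out := make([]chan int, n)
	for i := range out {
		out[i] = make(chan int, 1)
		out[i] <- i
	}
	return out
}

func main() {
	done := make(chan struct{})
	close(done) // simulate the wait having ended, e.g. after one failed RPC

	fmt.Println("naive:", collectNaive(readyReplies(1000), done), "of 1000")
	fmt.Println("prefer reply:", collectPreferReply(readyReplies(1000), done), "of 1000")
}
```

The nested select is one common way to give a channel priority. Whether the actual fix takes this shape is not stated in this description.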

Fixing this also exposed that the restriction on removing dead servers (removalCount < peers/2) shouldn't apply to non-voting servers: they can't affect the quorum, so there's no downside to removing them.

kyhavlov requested a review from mkeeler August 14, 2018 21:34
Member

mkeeler left a comment

Looks good to me.

kyhavlov merged commit fa8990c into master Aug 15, 2018
kyhavlov deleted the autopilot-fixes branch August 15, 2018 18:35