
Vault 6815/respect min quorum #23

Merged — peteski22 merged 15 commits into master on Nov 5, 2022

Conversation

peteski22

Adjusted raft-autopilot to respect the minimum quorum setting regardless of what type of server it's dealing with (stale, failed, voters, non-voters).

Servers will be allowed to be removed in order of increasing precedence until the threshold is reached (stale non-voters => stale voters => failed non-voters => failed voters).
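
For illustration, a minimal sketch of the kind of check this implies, modelled on the 'adjudicate' helper and voter registry discussed later in this PR (the names, receiver and registry methods below are assumptions, not the library's actual API):

// adjudicateRemoval approves candidate removals one by one, skipping any voter
// whose removal would drop the remaining voter count below the configured
// minimum quorum. Hypothetical sketch only.
func (a *Autopilot) adjudicateRemoval(candidates []raft.ServerID, vr *voterRegistry) []raft.ServerID {
	approved := []raft.ServerID{}
	remainingVoters := vr.voterCount()   // assumed helper: current number of voters
	minQuorum := int(a.config.MinQuorum) // the minimum quorum setting

	for _, id := range candidates {
		if vr.isVoter(id) { // assumed helper: voter eligibility from the registry
			if remainingVoters-1 < minQuorum {
				// Removing this voter would break min quorum; leave it in place.
				continue
			}
			remainingVoters--
		}
		approved = append(approved, id)
	}
	return approved
}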

The following has been added to the Promoter interface: PotentialVoterPredicate(NodeType) bool. This allows the application to decide which node types can be voters, since it may have defined additional NodeType values of its own.
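
For context, a sketch of the interface addition (existing Promoter methods elided; only the new predicate is shown):

type Promoter interface {
	// ... existing Promoter methods ...

	// PotentialVoterPredicate reports whether servers of the given NodeType
	// are eligible to become voters; applications can use it to exclude any
	// extra NodeType values they have defined.
	PotentialVoterPredicate(NodeType) bool
}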

Finally, we should note that downstream integrators of this repo may have work to do when moving to a new version, due to the addition to the interface; however, at the time of writing the only consumers were HashiCorp repos plus 2 external ones (https://pkg.go.dev/github.com/hashicorp/raft-autopilot?tab=importedby).

Related issues:

#17
hashicorp/vault#16919

peteski22 added 2 commits November 1, 2022 15:36
reconcile.go Outdated

maxRemoval := (voters - 1) / 2
// remove failed non-voters
failedNonVoters := vr.Filter(failed.FailedNonVoters)
Contributor

What is the purpose of this filter call?

Author

Just to filter the []*Server objects to only return the raft.ServerIDs of the ones we want.

Contributor

Which ones do we want? I'm confused because it's just ensuring that these nodes are present in the map, and I'm not sure why that's needed.

Author

We want the collection of server IDs ([]raft.ServerID). On the FailedServers object, the 'stale' fields (StaleNonVoters and StaleVoters) are already just slices of IDs, but FailedNonVoters and FailedVoters are collections of server objects. So that filter is just a way to get the IDs while also ensuring the voter registry knows about those servers (we pull voter eligibility info from the registry inside the adjudicate func).
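
A hypothetical sketch of what that filter amounts to, based on this description (the registry field and registration step shown here are assumptions, not the actual implementation):

// Filter maps the failed []*Server down to their raft.ServerIDs, ensuring the
// voter registry has an entry for each one, since voter eligibility is looked
// up from the registry inside the adjudicate func.
func (vr *voterRegistry) Filter(servers []*Server) []raft.ServerID {
	ids := make([]raft.ServerID, 0, len(servers))
	for _, srv := range servers {
		ids = append(ids, srv.ID)
		if _, ok := vr.eligibility[srv.ID]; !ok {
			vr.eligibility[srv.ID] = srv // assumed: record the server's voter info
		}
	}
	return ids
}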

reconcile.go Outdated
a.logger.Info("Attempting removal of failed server node", "id", srv.ID, "name", srv.Name, "address", srv.Address)
a.delegate.RemoveFailedServer(srv)
// Remove servers in order of increasing precedence
// Update the voter registry after each batch is processed
Contributor

Why are we changing the order here? E.g. before, FailedNonVoters were considered first; now we're looking at them after the stale servers.

Contributor

I guess what I'm asking is: what's the basis for the "precedence" ordering you're using?

Author

I believe (and may well be totally wrong) that stale servers should be the lowest 'value', since from the application's point of view those servers don't exist.

So removing stale servers (non-voting and then voting), when possible, should be done before we attempt to remove failed servers (which the app does know about, and which could potentially become healthy again).

A failed non-voter is still (most likely) a potential voter, a stale voter or non-voter won't be.

Might be one which requires more thought/conversation :D

Contributor

I'm not sure that's true in Vault's case at least. A stale server could simply be one whose echo message we haven't seen yet. So it might be stale now, and in 10s it won't be.

Author (@peteski22, Nov 2, 2022)

Vault would be the 'application' in this scenario, and Vault's chance to say which servers should/shouldn't be part of the raft config is via the KnownServers it supplies, naming both healthy and failed servers.

So if a server isn't in that collection of known servers but is in the Raft configuration, then it's stale (from autopilot's perspective) and needs to be removed.

I may have been in this code for too long on my own, but that was how it seemed to me to work.

Author

So here in Vault's code a server is either healthy (alive) or failed (left): https://github.com/hashicorp/vault/blob/main/physical/raft/raft_autopilot.go#L428. If a server isn't in that map of servers, it's stale to Autopilot.
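
A rough sketch of how 'stale' falls out of that, for illustration (the variable names here are made up; the real reconciliation happens inside the library):

// Anything present in the raft configuration but not reported by the
// application via KnownServers is stale from autopilot's perspective.
for _, cfgSrv := range raftConfig.Servers {
	if _, known := knownServers[cfgSrv.ID]; !known {
		stale = append(stale, cfgSrv.ID) // candidate for removal from the raft config
	}
}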

Member

I went back and forth on this for a while; there are tradeoffs here. If we remove stale servers first then we are moving things closer to the application's desired cluster config. However, this does involve messing with the raft config, which is inherently a bit riskier than telling the application to remove a failed server. That said, at least in Consul's usage, a request to remove a failed server will almost always result in the server being removed from the raft config very quickly afterwards.

For Consul's case the only time when one of those operations will not result in a raft config update is when the failed server isn't actually in the raft config yet. This seems like an extreme edge case and not something we need to explicitly account for.

So in the end I think we should follow two rules:

  1. Deal with non-voters first, as their removal shouldn't impact cluster stability.
  2. Handle stale before failed, so as to move things towards the application's desired server set.

The code as implemented does follow those rules. It would be great if you could add a comment about the ordering somewhere in here though (which I should have done when originally writing the library).

Author

I've added the rules you've mentioned in a comment in pruneDeadServers now, so hopefully the intent is documented to some extent.
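
For reference, a paraphrase of the ordering rules that comment captures (a sketch, not the literal comment text):

// Remove servers in order of increasing precedence, updating the voter
// registry after each batch is processed:
//   1. non-voters before voters, since removing a non-voter shouldn't impact
//      cluster stability;
//   2. stale before failed, to move the cluster towards the application's
//      desired server set.
// i.e. stale non-voters => stale voters => failed non-voters => failed voters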

// Update the voter registry after each batch is processed

// remove stale non-voters
toRemove := a.adjudicateRemoval(failed.StaleNonVoters, vr)
Contributor

I guess in principle you could now collapse these 4 calls into 2, e.g.

stale := append(append([]raft.ServerID{}, failed.StaleNonVoters...), failed.StaleVoters...)
toRemove = a.adjudicateRemoval(stale, vr)
if err = a.removeStaleServers(toRemove); err != nil {
	return err
}
vr.removeAll(toRemove)

and similarly for the failed servers. Up to you, it's also fine as you have it.

@peteski22 peteski22 requested a review from mkeeler November 2, 2022 19:13
Member

@mkeeler left a comment

I left a handful of comments and marked them based on whether I think they should block approval:

⚠️ == blocking
🎨 == non-blocking refactoring/suggestion
❓ == non-blocking question


@peteski22 peteski22 requested a review from mkeeler November 4, 2022 09:32
@peteski22 peteski22 marked this pull request as ready for review November 5, 2022 08:20
@peteski22 peteski22 merged commit d936f51 into master Nov 5, 2022
@peteski22 peteski22 deleted the VAULT-6815/respect-min-quorum branch November 5, 2022 08:21