Vault 6815/respect min quorum #23
Conversation
…se logic to handle judging whether a server can be removed or not
Co-authored-by: Nick Cabatoff <[email protected]>
reconcile.go
Outdated
maxRemoval := (voters - 1) / 2
// remove failed non-voters
failedNonVoters := vr.Filter(failed.FailedNonVoters)
What is the purpose of this filter call?
Just to filter the []*Server objects to only return the raft.ServerIDs of the ones we want.
Which ones do we want? I'm confused because it's just ensuring that these nodes are present in the map, and I'm not sure why that's needed.
We want the collection of server IDs ([]raft.ServerID). On the FailedServers object, the 'stale' fields, StaleNonVoters and StaleVoters, are both just collections (slices) of IDs, but FailedNonVoters and FailedVoters are collections of server objects. So that filter is just a way to get the IDs while ensuring the voter registry knows about them (as we pull voter eligibility info from the registry inside the adjudicate func).
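For readers following along, a minimal sketch of the shape of that filter, assuming a registry keyed by server ID (the VoterRegistry field names here are illustrative, not the PR's actual code):

// Illustrative only: the registry tracks voter eligibility per server ID.
type VoterRegistry struct {
	voters map[raft.ServerID]bool // true if the server is a (potential) voter
}

// Filter turns a []*Server into the corresponding raft.ServerIDs, keeping only
// servers the registry knows about, because adjudicateRemoval later looks up
// each ID's voter eligibility in the registry.
func (vr *VoterRegistry) Filter(servers []*Server) []raft.ServerID {
	ids := make([]raft.ServerID, 0, len(servers))
	for _, srv := range servers {
		if _, ok := vr.voters[srv.ID]; ok {
			ids = append(ids, srv.ID)
		}
	}
	return ids
}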
reconcile.go
Outdated
a.logger.Info("Attempting removal of failed server node", "id", srv.ID, "name", srv.Name, "address", srv.Address) | ||
a.delegate.RemoveFailedServer(srv) | ||
// Remove servers in order of increasing precedence | ||
// Update the voter registry after each batch is processed |
Why are we changing the order here, e.g. before FailedNonVoters were considered first, now we're looking at them after the stale servers?
I guess what I'm asking is: what's the basis for the "precedence" ordering you're using?
I believe (and may well be totally wrong) it should be that stale servers are the lowest 'value', since from the application's point of view those servers don't exist.
So removing stale servers (non-voting and then voting), when possible, should be done before we attempt to remove failed servers (which the app does know about and which could potentially become healthy again).
A failed non-voter is still (most likely) a potential voter; a stale voter or non-voter won't be.
Might be one which requires more thought/conversation :D
I'm not sure that's true in Vault's case at least. A stale server could simply be one whose echo message we haven't seen yet. So it might be stale now, and in 10s it won't be.
Vault would be the 'application' in this scenario, and Vault's chance to say which servers should/shouldn't be part of the raft config is via the KnownServers collection it supplies, naming both healthy and failed servers.
So if a server isn't in that collection of known servers, but is in the Raft configuration, then it's stale (from autopilot's perspective) and needs to be removed.
I may have been in this code for too long on my own, but that was how it seemed to me to work.
So here in Vault's code a server is either healthy (alive) or failed (left) https://github.com/hashicorp/vault/blob/main/physical/raft/raft_autopilot.go#L428. If the server isn't in the map of servers, it's stale to Autopilot.
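Put differently, 'stale' just means present in the raft configuration but absent from the application-supplied KnownServers map; a minimal sketch of that check (the function name and plumbing are illustrative, not the library's actual code):

// staleIDs returns the raft-configuration members the application did not
// report via KnownServers; these become candidates for stale removal.
func staleIDs(cfg raft.Configuration, known map[raft.ServerID]*Server) []raft.ServerID {
	var stale []raft.ServerID
	for _, s := range cfg.Servers {
		if _, ok := known[s.ID]; !ok {
			stale = append(stale, s.ID)
		}
	}
	return stale
}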
I went back and forth on this for a while. There are tradeoffs here. If we remove stale first then we are moving things closer to the application's desired cluster config. However, this does involve messing with the raft config, which is inherently a bit riskier than telling the application to remove a failed server. That said, at least in Consul's usage, the request to remove the failed server will almost always result in the server being removed from the raft config very quickly after.
For Consul's case the only time when one of those operations will not result in a raft config update is when the failed server isn't actually in the raft config yet. This seems like an extreme edge case and not something we need to explicitly account for.
So in the end I think we should follow two rules:
- Deal with non-voters first as their removal shouldn't impact cluster stability.
- Handle stale before failed so as to move things towards the application's desired server set.
The code as implemented does follow those rules. It would be great if you could add a comment about the ordering somewhere in here though (which I should have done when originally writing the library).
I've added the rules you've mentioned in a comment in pruneDeadServers now, so hopefully the intent is documented to some extent.
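For anyone reading this thread later, a rough sketch of the ordering those rules produce; the helper calls are taken from the diff snippets in this PR, but the surrounding glue (including the contains helper) is illustrative rather than the final implementation:

// Batches are handled in order of increasing precedence, and the voter
// registry is updated after each one so the next quorum check sees the
// reduced voter set.

// 1. stale non-voters: unknown to the application, no quorum impact
toRemove := a.adjudicateRemoval(failed.StaleNonVoters, vr)
if err := a.removeStaleServers(toRemove); err != nil {
	return err
}
vr.removeAll(toRemove)

// 2. stale voters: unknown to the application, but they do count towards quorum
toRemove = a.adjudicateRemoval(failed.StaleVoters, vr)
if err := a.removeStaleServers(toRemove); err != nil {
	return err
}
vr.removeAll(toRemove)

// 3. failed non-voters, then 4. failed voters: the application still knows
// about these, so removal is requested via the delegate rather than by
// editing the raft configuration directly.
for _, batch := range [][]*Server{failed.FailedNonVoters, failed.FailedVoters} {
	toRemove = a.adjudicateRemoval(vr.Filter(batch), vr)
	for _, srv := range batch {
		if contains(toRemove, srv.ID) { // contains is a hypothetical helper
			a.logger.Info("Attempting removal of failed server node", "id", srv.ID, "name", srv.Name, "address", srv.Address)
			a.delegate.RemoveFailedServer(srv)
		}
	}
	vr.removeAll(toRemove)
}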
// Update the voter registry after each batch is processed

// remove stale non-voters
toRemove := a.adjudicateRemoval(failed.StaleNonVoters, vr)
I guess in principle you could now collapse these 4 calls into 2, e.g.
stale := append(append([]raft.ServerID{}, failed.StaleNonVoters...), failed.StaleVoters...)
toRemove = a.adjudicateRemoval(stale, vr)
if err = a.removeStaleServers(toRemove); err != nil {
	return err
}
vr.removeAll(toRemove)
and similarly for the failed servers. Up to you, it's also fine as you have it.
I left a handful of comments and marked them according to whether I think they block approval:
🎨 == non-blocking refactoring/suggestion
❓ == non-blocking question
Great catch, thanks for that. Co-authored-by: Matt Keeler <[email protected]>
Co-authored-by: Matt Keeler <[email protected]>
Adjusted raft-autopilot to respect the minimum quorum setting regardless of what type of server it's dealing with (stale, failed, voters, non-voters).
Servers will be allowed to be removed in order of increasing precedence until the threshold is reached (stale non-voters => stale voters => failed non-voters => failed voters).
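As a hedged sketch of that threshold check (the registry helpers and exact control flow are assumptions; only adjudicateRemoval, the maxRemoval calculation, and MinQuorum come from the PR/diff):

// Sketch only: approve removals until doing so would either exceed the
// failure-tolerance limit or take the voter count below the configured
// minimum quorum. vr.voterCount and vr.isPotentialVoter are assumed helpers.
func (a *Autopilot) adjudicateRemoval(ids []raft.ServerID, vr *VoterRegistry) []raft.ServerID {
	var approved []raft.ServerID
	voters := vr.voterCount()
	maxRemoval := (voters - 1) / 2 // matches the diff snippet above
	minQuorum := int(a.delegate.AutopilotConfig().MinQuorum)

	for _, id := range ids {
		if vr.isPotentialVoter(id) {
			if maxRemoval < 1 || voters-1 < minQuorum {
				// removing this (potential) voter would threaten quorum; skip it
				continue
			}
			maxRemoval--
			voters--
		}
		approved = append(approved, id)
	}
	return approved
}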
The following has been added to the Promoter interface: PotentialVoterPredicate(NodeType) bool. This allows the application to decide which types of node can be voters, as it may have added additional types to NodeType.
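To illustrate, a downstream Promoter implementation might satisfy the new predicate like this (the read-replica type and appPromoter wrapper are hypothetical):

package myapp

import (
	autopilot "github.com/hashicorp/raft-autopilot"
)

// NodeReadReplica is a hypothetical application-specific node type, added on
// top of the library's built-in autopilot.NodeVoter.
const NodeReadReplica autopilot.NodeType = "read-replica"

type appPromoter struct {
	autopilot.StablePromoter // reuse the default promotion behaviour
}

// PotentialVoterPredicate reports whether servers of a given type may ever be
// counted as (potential) voters; read replicas never are.
func (p *appPromoter) PotentialVoterPredicate(t autopilot.NodeType) bool {
	return t == autopilot.NodeVoter
}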
Finally, we should note that any downstream integrators of this repo may have work to do when moving to a new version due to the addition to the interface; however, at the time of writing only HashiCorp repos and 2 external repos consume it (https://pkg.go.dev/github.com/hashicorp/raft-autopilot?tab=importedby).
Related issues:
#17
hashicorp/vault#16919