
Vault 6815/respect min quorum #23

Merged — peteski22 merged 15 commits into master on Nov 5, 2022

Conversation

peteski22

Adjusted raft-autopilot to respect the minimum quorum setting regardless of what type of server it's dealing with (stale, failed, voters, non-voters).

Servers will be allowed to be removed in order of increasing precedence until the threshold is reached (stale non-voters => stale voters => failed non-voters => failed voters).
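
For illustration, a minimal sketch of the kind of check this implies, modelled on the 'adjudicate' helper and voter registry discussed later in this PR (the names, receiver and registry methods below are assumptions, not the library's actual API):

// adjudicateRemoval approves candidate removals one by one, skipping any voter
// whose removal would drop the remaining voter count below the configured
// minimum quorum. Hypothetical sketch only.
func (a *Autopilot) adjudicateRemoval(candidates []raft.ServerID, vr *voterRegistry) []raft.ServerID {
	approved := []raft.ServerID{}
	remainingVoters := vr.voterCount()   // assumed helper: current number of voters
	minQuorum := int(a.config.MinQuorum) // the minimum quorum setting

	for _, id := range candidates {
		if vr.isVoter(id) { // assumed helper: voter eligibility from the registry
			if remainingVoters-1 < minQuorum {
				// Removing this voter would break min quorum; leave it in place.
				continue
			}
			remainingVoters--
		}
		approved = append(approved, id)
	}
	return approved
}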

The following has been added to the Promoter interface: PotentialVoterPredicate(NodeType) bool. This allows the application to decide which node types can be voters, since it may have defined additional NodeType values of its own.
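
For context, a sketch of the interface addition (existing Promoter methods elided; only the new predicate is shown):

type Promoter interface {
	// ... existing Promoter methods ...

	// PotentialVoterPredicate reports whether servers of the given NodeType
	// are eligible to become voters; applications can use it to exclude any
	// extra NodeType values they have defined.
	PotentialVoterPredicate(NodeType) bool
}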

Finally, we should note that downstream integrators of this repo may have work to do when moving to a new version, due to the addition to the interface; however, at the time of writing the only consumers were HashiCorp repos plus 2 external ones (https://pkg.go.dev/github.com/hashicorp/raft-autopilot?tab=importedby).

Related issues:

#17
hashicorp/vault#16919

peteski22 added 2 commits November 1, 2022 15:36
reconcile.go Outdated

maxRemoval := (voters - 1) / 2
// remove failed non-voters
failedNonVoters := vr.Filter(failed.FailedNonVoters)
Contributor

What is the purpose of this filter call?

Author

Just to filter the []*Server objects to only return the raft.ServerIDs of the ones we want.

Contributor

Which ones do we want? I'm confused because it's just ensuring that these nodes are present in the map, and I'm not sure why that's needed.

Author

We want the collection of server IDs ([]raft.ServerID). On the FailedServers object, the 'stale' fields (StaleNonVoters and StaleVoters) are already just slices of IDs, but FailedNonVoters and FailedVoters are collections of server objects. So that filter is just a way to get the IDs while also ensuring the voter registry knows about those servers (we pull voter eligibility info from the registry inside the adjudicate func).
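
A hypothetical sketch of what that filter amounts to, based on this description (the registry field and registration step shown here are assumptions, not the actual implementation):

// Filter maps the failed []*Server down to their raft.ServerIDs, ensuring the
// voter registry has an entry for each one, since voter eligibility is looked
// up from the registry inside the adjudicate func.
func (vr *voterRegistry) Filter(servers []*Server) []raft.ServerID {
	ids := make([]raft.ServerID, 0, len(servers))
	for _, srv := range servers {
		ids = append(ids, srv.ID)
		if _, ok := vr.eligibility[srv.ID]; !ok {
			vr.eligibility[srv.ID] = srv // assumed: record the server's voter info
		}
	}
	return ids
}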

reconcile.go Outdated
a.logger.Info("Attempting removal of failed server node", "id", srv.ID, "name", srv.Name, "address", srv.Address)
a.delegate.RemoveFailedServer(srv)
// Remove servers in order of increasing precedence
// Update the voter registry after each batch is processed
Contributor

Why are we changing the order here? E.g. before, FailedNonVoters were considered first; now we're looking at them after the stale servers.

Contributor

I guess what I'm asking is: what's the basis for the "precedence" ordering you're using?

Author

I believe (and may well be totally wrong) that stale servers should be the lowest 'value', since from the application's point of view those servers don't exist.

So removing stale servers (non-voting and then voting), when possible, should be done before we attempt to remove failed servers (which the app does know about, and which could potentially become healthy again).

A failed non-voter is still (most likely) a potential voter, a stale voter or non-voter won't be.

Might be one which requires more thought/conversation :D

Contributor

I'm not sure that's true in Vault's case at least. A stale server could simply be one whose echo message we haven't seen yet. So it might be stale now, and in 10s it won't be.

Author (@peteski22, Nov 2, 2022)

Vault would be the 'application' in this scenario, and Vault's chance to say which servers should/shouldn't be part of the raft config is via the KnownServers it supplies, naming both healthy and failed servers.

So if a server isn't in that collection of known servers but is in the Raft configuration, then it's stale (from autopilot's perspective) and needs to be removed.

I may have been in this code for too long on my own, but that was how it seemed to me to work.

Author

So here in Vault's code a server is either healthy (alive) or failed (left): https://github.com/hashicorp/vault/blob/main/physical/raft/raft_autopilot.go#L428. If a server isn't in that map of servers, it's stale to Autopilot.
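
A rough sketch of how 'stale' falls out of that, for illustration (the variable names here are made up; the real reconciliation happens inside the library):

// Anything present in the raft configuration but not reported by the
// application via KnownServers is stale from autopilot's perspective.
for _, cfgSrv := range raftConfig.Servers {
	if _, known := knownServers[cfgSrv.ID]; !known {
		stale = append(stale, cfgSrv.ID) // candidate for removal from the raft config
	}
}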

Member

I went back and forth on this for a while; there are tradeoffs here. If we remove stale servers first then we are moving things closer to the application's desired cluster config. However, this does involve messing with the raft config, which is inherently a bit riskier than telling the application to remove a failed server. That said, at least in Consul's usage, a request to remove a failed server will almost always result in the server being removed from the raft config very quickly afterwards.

For Consul's case the only time when one of those operations will not result in a raft config update is when the failed server isn't actually in the raft config yet. This seems like an extreme edge case and not something we need to explicitly account for.

So in the end I think we should follow two rules:

  1. Deal with non-voters first, as their removal shouldn't impact cluster stability.
  2. Handle stale before failed, so as to move things towards the application's desired server set.

The code as implemented does follow those rules. It would be great if you could add a comment about the ordering somewhere in here though (which I should have done when originally writing the library).

Author

I've added the rules you've mentioned in a comment in pruneDeadServers now, so hopefully the intent is documented to some extent.
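
For reference, a paraphrase of the ordering rules that comment captures (a sketch, not the literal comment text):

// Remove servers in order of increasing precedence, updating the voter
// registry after each batch is processed:
//   1. non-voters before voters, since removing a non-voter shouldn't impact
//      cluster stability;
//   2. stale before failed, to move the cluster towards the application's
//      desired server set.
// i.e. stale non-voters => stale voters => failed non-voters => failed voters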

// Update the voter registry after each batch is processed

// remove stale non-voters
toRemove := a.adjudicateRemoval(failed.StaleNonVoters, vr)
Contributor

I guess in principle you could now collapse these 4 calls into 2, e.g.

stale := append(append([]raft.ServerID{}, failed.StaleNonVoters...), failed.StaleVoters...)
toRemove = a.adjudicateRemoval(stale, vr)
if err = a.removeStaleServers(toRemove); err != nil {
	return err
}
vr.removeAll(toRemove)

and similarly for the failed servers. Up to you, it's also fine as you have it.

@peteski22 peteski22 requested a review from mkeeler November 2, 2022 19:13
Member

@mkeeler left a comment

I left a handful of comments and marked them based on whether I think they should block approval:

⚠️ == blocking
🎨 == non-blocking refactoring/suggestion
❓ == non-blocking question


@peteski22 peteski22 requested a review from mkeeler November 4, 2022 09:32
@peteski22 peteski22 marked this pull request as ready for review November 5, 2022 08:20
@peteski22 peteski22 merged commit d936f51 into master Nov 5, 2022
@peteski22 peteski22 deleted the VAULT-6815/respect-min-quorum branch November 5, 2022 08:21