Detect leader via the delegate #7

vishalnayak · 2021-03-31T15:09:50Z

In Vault, autopilot's reliance on the leader's address to detect the leader's ID opens up a failure mode.

It is possible for the raft config and the running nodes to have different addresses. Vault hasn't yet implemented the piece where the addresses in the raft config get updated (possibly by re-adding the existing nodes with updated addresses). The main reason is that adding a node is not a straightforward procedure and requires unseal keys, among other things.

If customers restart a node with a different cluster address, autopilot, which expects the addresses to match, starts erroring out and the state API skips returning some servers.

To get around this, the delegate can optionally return an IsLeader flag as part of the known servers.

This doesn't affect Consul, since the old-style leader detection is still used if detection via the delegate fails.
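
The detection order described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the `Server` type, the `findLeaderID` helper, the node IDs, and the addresses are all hypothetical, and only the idea of an `IsLeader` flag reported by the delegate comes from the description.

```go
package main

import "fmt"

// Server is a hypothetical, trimmed-down stand-in for the record a delegate
// reports for each known server. Only the fields this sketch needs are here.
type Server struct {
	ID       string
	Address  string
	IsLeader bool // optionally set by the delegate
}

// findLeaderID (hypothetical helper) prefers a server that the delegate has
// flagged as leader. Only when no server carries the flag does it fall back
// to the old-style detection, which matches on the leader's address.
func findLeaderID(known map[string]Server, leaderAddr string) (string, bool) {
	for id, srv := range known {
		if srv.IsLeader {
			return id, true
		}
	}
	for id, srv := range known {
		if srv.Address == leaderAddr {
			return id, true
		}
	}
	return "", false
}

func main() {
	// The address reported for the leader no longer matches the address
	// recorded for node-a, so address matching alone would find nothing.
	leaderAddr := "10.0.0.1:8201"
	known := map[string]Server{
		"node-a": {ID: "node-a", Address: "10.0.0.9:8201", IsLeader: true},
		"node-b": {ID: "node-b", Address: "10.0.0.2:8201"},
	}

	id, ok := findLeaderID(known, leaderAddr)
	fmt.Println(id, ok) // prints: node-a true
}
```

When no server carries the flag, as would be the case for a delegate that never sets it, the sketch behaves exactly like address-only matching.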

mkeeler

I think this should be fine, but could you add a test similar to this one:

raft-autopilot/state_test.go

Line 84 in f273c7b

func TestGatherNextStateInputs(t *testing.T) {

mkeeler · 2021-04-13T12:55:11Z

On a meta note, if the raft config doesn't contain the updated address yet, then how is raft working at all?

The addresses in the config are used to initiate replication, so it is possible that the leader's address doesn't have to be accurate but all the others do. You may want to consider how the addresses can be kept in sync to prevent outages when nodes are restarted.

Also, there is a distinction between general raft data, which is stored via the LogStore, and the raft configuration, which is stored via the StableStore. I haven't thought it through much, but does the stable store need to be guarded by the seal, or would it be sufficient to guard only the log store?

mkeeler

All that's missing is the test; then we could merge this. I'm still not sure about the core use case of not being able to update the addresses in raft until after Vault is unsealed, but I will defer to the Vault team on the viability of the other approach.

vishalnayak · 2021-04-21T15:23:55Z

Currently, when nodes are restarted, Vault expects the same address to be used for nodes that are in the raft config. This fix applies only when an address change is attempted during a restart.

vishalnayak mentioned this pull request Mar 31, 2021

Autopilot: Return leader info via delegate hashicorp/vault#11247

Merged

vishalnayak requested a review from mkeeler March 31, 2021 15:38

mkeeler reviewed Apr 12, 2021

mkeeler requested changes Apr 13, 2021

vishalnayak added 2 commits April 23, 2021 10:52

Detect leader via the delegate

cc425d4

Add test

6114e57

vishalnayak force-pushed the detect-leader-via-delegate branch from c8dfcf4 to 6114e57 Compare April 23, 2021 14:53

vishalnayak added 2 commits April 23, 2021 10:54

go fmt

b7d34db

Add golden files

85ecb4f

mkeeler approved these changes Apr 23, 2021

vishalnayak merged commit 839ebcd into master Apr 23, 2021

vishalnayak deleted the detect-leader-via-delegate branch April 23, 2021 15:04
