Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect leader via the delegate #7

Merged
merged 4 commits into from
Apr 23, 2021
Merged

Conversation

vishalnayak
Copy link
Contributor

@vishalnayak vishalnayak commented Mar 31, 2021

In Vault, autopilot relying on leader's address to detect the ID is opening up a failure mode.

It is possible for raft config and the running nodes to have different addresses. Vault hasn't yet done the piece where the addresses in the raft config gets updated (possibly by re-adding the existing nodes with updated addresses). The main reason for this is that adding a node is not a straight forward ritual and requires unseal keys et al.

Anyways, if the customers restart the node with a different cluster address, since autopilot expects the addresses to match, autopilot will then start erroring out and state API skips returning some servers.

To get around it, the delegate is optionally made to return a IsLeader as part of known servers.

This doesn't affect Consul since the old style leader detection is still in place if the detection via the delegate fails.

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be fine but could you add a test similar to this one:

func TestGatherNextStateInputs(t *testing.T) {

@mkeeler
Copy link
Member

mkeeler commented Apr 13, 2021

On a meta note, if the raft config doesn't contain the updated address yet then how is raft working at all?

The addresses in the config are used to initiate replication, so it is possible that the leaders address doesn't have to be accurate but all the others will. You may want to consider then how the addresses can be kept in sync to prevent outages when nodes are restarted.

Also there is a distinction between general raft data which is stored via the LogStore and the raft configuration which is stored via the StableStore. I haven't thought it through much but does the stable store need to be guarded by the seal or would it be sufficient to only guard the log store?

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All thats missing is the test then we could merge this. I still am not sure about the core use case of not being able to update the addresses in raft until after Vault is unsealed but I will defer to the vault team on the viability of the other approach.

@vishalnayak
Copy link
Contributor Author

Currently when nodes are restarted, Vault expects the same address to be used for nodes that are in the raft config. This fix is for only when the addresses are attempted to be updated during a restart.

@vishalnayak vishalnayak force-pushed the detect-leader-via-delegate branch from c8dfcf4 to 6114e57 Compare April 23, 2021 14:53
@vishalnayak vishalnayak merged commit 839ebcd into master Apr 23, 2021
@vishalnayak vishalnayak deleted the detect-leader-via-delegate branch April 23, 2021 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants