Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Voting config exclusions should work with absent nodes #47990

Closed
DaveCTurner opened this issue Oct 14, 2019 · 8 comments · Fixed by #55673
Closed

Voting config exclusions should work with absent nodes #47990

DaveCTurner opened this issue Oct 14, 2019 · 8 comments · Fixed by #55673
Labels
>bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. help wanted adoptme

Comments

@DaveCTurner
Copy link
Contributor

Today the voting config exclusions API accepts node filters and resolves them to a collection of node IDs against the current cluster membership.

This is problematic since we may want to exclude nodes that are not currently members of the cluster. For instance:

  • if attempting to remove a flaky node from the cluster you cannot reliably exclude it from the voting configuration since it may not reliably be a member of the cluster

  • if cluster.auto_shrink_voting_configuration: false then naively shrinking the cluster will remove some nodes but will leaving their node IDs in the voting configuration. The only way to clean up the voting configuration is to grow the cluster back to its original size (potentially replacing some of the voting configuration) and then use the exclusions API.

I'm opening this to discuss an alternative API that accepts node names and node IDs (but not node filters in general) and which will exclude present and future matching nodes without requiring a matching node to be present right now.

@DaveCTurner DaveCTurner added >bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. team-discuss labels Oct 14, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

@sebgl
Copy link

sebgl commented Oct 14, 2019

elastic/cloud-on-k8s#1986 brings some context on how this impacts ECK.

@DaveCTurner
Copy link
Contributor Author

The things we would lose with such an API include:

  • validating that you didn't make a mistake and exclude a nonexistent node.
  • support for excluding nodes by node attributes, e.g. to exclude a whole rack at once.

@DaveCTurner
Copy link
Contributor Author

We discussed this last week and think we should do it, just based on node name and node ID.

@ywelsch ywelsch added the help wanted adoptme label Nov 25, 2019
@zacharymorn
Copy link
Contributor

Hi @DaveCTurner I can take a look at this one.

@DaveCTurner
Copy link
Contributor Author

Thanks @zacharymorn, go for it.

@zacharymorn
Copy link
Contributor

Hi @DaveCTurner ,

Sorry for the delay on this. I had created a PR based on my understanding.

One thing I did notice though is that when we start to allow absent node names and ids, this by definition would also allow incorrect names and ids as per my understanding there's no way to differentiate between the cases. Please let me know if this is expected.

Thanks,
Zach

@DaveCTurner
Copy link
Contributor Author

Thanks @zacharymorn, yes, as I noted above with this change we necessarily lose the ability to check that the nodes we're excluding aren't merely incorrect.

DaveCTurner pushed a commit that referenced this issue Apr 16, 2020
Today the voting config exclusions API accepts node filters and resolves them
to a collection of node IDs against the current cluster membership.

This is problematic since we may want to exclude nodes that are not currently
members of the cluster. For instance:

- if attempting to remove a flaky node from the cluster you cannot reliably
  exclude it from the voting configuration since it may not reliably be a
  member of the cluster

- if `cluster.auto_shrink_voting_configuration: false` then naively shrinking
  the cluster will remove some nodes but will leaving their node IDs in the
  voting configuration. The only way to clean up the voting configuration is to
  grow the cluster back to its original size (potentially replacing some of the
  voting configuration) and then use the exclusions API.

This commit adds an alternative API that accepts node names and node IDs but
not node filters in general, and deprecates the current node-filters-based API.

Relates #47990.
DaveCTurner pushed a commit to DaveCTurner/elasticsearch that referenced this issue Apr 16, 2020
Today the voting config exclusions API accepts node filters and resolves them
to a collection of node IDs against the current cluster membership.

This is problematic since we may want to exclude nodes that are not currently
members of the cluster. For instance:

- if attempting to remove a flaky node from the cluster you cannot reliably
  exclude it from the voting configuration since it may not reliably be a
  member of the cluster

- if `cluster.auto_shrink_voting_configuration: false` then naively shrinking
  the cluster will remove some nodes but will leaving their node IDs in the
  voting configuration. The only way to clean up the voting configuration is to
  grow the cluster back to its original size (potentially replacing some of the
  voting configuration) and then use the exclusions API.

This commit adds an alternative API that accepts node names and node IDs but
not node filters in general, and deprecates the current node-filters-based API.

Relates elastic#47990.
DaveCTurner added a commit that referenced this issue Apr 16, 2020
Today the voting config exclusions API accepts node filters and resolves them
to a collection of node IDs against the current cluster membership.

This is problematic since we may want to exclude nodes that are not currently
members of the cluster. For instance:

- if attempting to remove a flaky node from the cluster you cannot reliably
  exclude it from the voting configuration since it may not reliably be a
  member of the cluster

- if `cluster.auto_shrink_voting_configuration: false` then naively shrinking
  the cluster will remove some nodes but will leaving their node IDs in the
  voting configuration. The only way to clean up the voting configuration is to
  grow the cluster back to its original size (potentially replacing some of the
  voting configuration) and then use the exclusions API.

This commit adds an alternative API that accepts node names and node IDs but
not node filters in general, and deprecates the current node-filters-based API.

Relates #47990.
Backport of #50836 to 7.x.

Co-authored-by: zacharymorn <[email protected]>
DaveCTurner added a commit that referenced this issue Apr 24, 2020
The use of node filters for excluding nodes from the voting configuration was
deprecated in #50836; this commit removes support for node filters in this API.

Closes #47990
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. help wanted adoptme
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants