Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of shards limits decider #53577

Conversation

jasontedor
Copy link
Member

On clusters with a large number of shards, the shards limits allocation decider can exhibit poor performance leading to timeouts applying cluster state updates. This occurs because for every shard, we do a loop to count the number of shards on the node, and the number of shards for the index of the shard. This is roughly quadratic in the number of shards. This loop is not necessary, since we already have a O(1) method to count the number of non-relocating shards on a node, and with this commit we add some infrastructure to RoutingNode to make counting the number of shards per index O(1).

Closes #53559

On clusters with a large number of shards, the shards limits allocation
decider can exhibit poor performance leading to timeouts applying
cluster state updates. This occurs because for every shard, we do a loop
to count the number of shards on the node, and the number of shards for
the index of the shard. This is roughly quadratic in the number of
shards. This loop is not necessary, since we already have a O(1) method
to count the number of non-relocating shards on a node, and with this
commit we add some infrastructure to RoutingNode to make counting the
number of shards per index O(1).
@jasontedor jasontedor added >bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v8.0.0 v7.7.0 v7.6.2 v6.8.8 labels Mar 14, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Allocation)

@jimczi jimczi removed the v7.6.2 label Mar 18, 2020
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Left two minor and optional comments.

@jasontedor
Copy link
Member Author

@elasticmachine update branch

@jasontedor jasontedor merged commit ca7a135 into elastic:master Mar 19, 2020
jasontedor added a commit that referenced this pull request Mar 19, 2020
On clusters with a large number of shards, the shards limits allocation
decider can exhibit poor performance leading to timeouts applying
cluster state updates. This occurs because for every shard, we do a loop
to count the number of shards on the node, and the number of shards for
the index of the shard. This is roughly quadratic in the number of
shards. This loop is not necessary, since we already have a O(1) method
to count the number of non-relocating shards on a node, and with this
commit we add some infrastructure to RoutingNode to make counting the
number of shards per index O(1).
jasontedor added a commit that referenced this pull request Mar 19, 2020
On clusters with a large number of shards, the shards limits allocation
decider can exhibit poor performance leading to timeouts applying
cluster state updates. This occurs because for every shard, we do a loop
to count the number of shards on the node, and the number of shards for
the index of the shard. This is roughly quadratic in the number of
shards. This loop is not necessary, since we already have a O(1) method
to count the number of non-relocating shards on a node, and with this
commit we add some infrastructure to RoutingNode to make counting the
number of shards per index O(1).
@jasontedor jasontedor deleted the shard-limits-allocation-decider-performance branch March 19, 2020 01:00
@jasontedor jasontedor added v7.6.2 and removed v6.8.8 labels Mar 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v7.6.2 v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Faster access to started shards count of the index in each node
5 participants