Add dedicated step for checking shrink allocation status #35161

Merged
merged 6 commits into from
Nov 5, 2018

Member

@dakrone dakrone commented Nov 1, 2018

This adds a new step that checks whether an index is allocated correctly, based
on the allocation rules set before the shrink step runs. It also fixes a bug in
which shards could still be relocating when the shrink step ran, which shrink
does not allow.

This also allows us to simplify AllocationRoutedStep and provide better
feedback in the step info for why either the allocation or the shrink checks
have failed.

Resolves #34938
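
A minimal sketch of the kind of check described above, not the code from this PR. It assumes shrink needs a started copy of every shard on one node and no relocating shards. `ShardCopy`, `ShrinkAllocationCheckSketch`, and `whyNotReady` are hypothetical names for illustration; the real step works against the cluster state and routing table rather than a plain list.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-in for one shard copy's routing entry.
// This is NOT the Elasticsearch ShardRouting class.
final class ShardCopy {
    final int shardId;
    final String nodeId;      // node currently holding this copy
    final boolean started;
    final boolean relocating;

    ShardCopy(int shardId, String nodeId, boolean started, boolean relocating) {
        this.shardId = shardId;
        this.nodeId = nodeId;
        this.started = started;
        this.relocating = relocating;
    }
}

final class ShrinkAllocationCheckSketch {
    /**
     * Returns an empty Optional when the index looks ready to shrink,
     * otherwise a human-readable reason that could be surfaced as step info.
     */
    static Optional<String> whyNotReady(List<ShardCopy> copies, int numberOfShards, String targetNodeId) {
        boolean[] startedOnTarget = new boolean[numberOfShards];
        for (ShardCopy copy : copies) {
            // Assumption taken from the PR description: no shard may be relocating.
            if (copy.relocating) {
                return Optional.of("shard " + copy.shardId + " is still relocating");
            }
            if (copy.started && targetNodeId.equals(copy.nodeId)) {
                startedOnTarget[copy.shardId] = true;
            }
        }
        for (int shardId = 0; shardId < numberOfShards; shardId++) {
            if (startedOnTarget[shardId] == false) {
                return Optional.of("shard " + shardId + " has no started copy on node " + targetNodeId);
            }
        }
        return Optional.empty();
    }
}
```

Returning a reason string rather than a bare boolean mirrors the stated goal of giving better feedback in the step info when a check fails.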

@dakrone dakrone added the WIP and :Data Management/ILM+SLM (Index and Snapshot lifecycle management) labels Nov 1, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@dakrone dakrone removed the WIP label Nov 1, 2018
Contributor

@colings86 colings86 left a comment

@dakrone I left a comment

int allocationPendingAllShards = 0;

ImmutableOpenIntMap<IndexShardRoutingTable> allShards = clusterState.getRoutingTable().index(index).getShards();
for (ObjectCursor<IndexShardRoutingTable> shardRoutingTable : allShards.values()) {
    int allocationPendingThisShard = 0;
    int shardCopiesThisShard = shardRoutingTable.value.size();
    for (ShardRouting shardRouting : shardRoutingTable.value.shards()) {
        String currentNodeId = shardRouting.currentNodeId();
        boolean canRemainOnCurrentNode = ALLOCATION_DECIDERS
            .canRemain(shardRouting, clusterState.getRoutingNodes().node(currentNodeId), allocation)
            .type() == Decision.Type.YES;
        if (canRemainOnCurrentNode == false) {
Contributor

We need to add a check for the shard being started here too; otherwise we will end up with the same issue in the allocate action, but it will be harder to diagnose.
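
A minimal sketch of the suggested check, assuming a shard copy should count as pending when it either cannot remain on its current node or has not started yet. `AllocationPendingSketch`, `Copy`, and `countPending` are hypothetical names for illustration; the `canRemain` flag stands in for the allocation deciders' decision in the snippet above.

```java
import java.util.List;

final class AllocationPendingSketch {
    // Hypothetical stand-in for one shard copy; not an Elasticsearch class.
    static final class Copy {
        final boolean canRemain; // what the allocation deciders would report
        final boolean started;   // whether the copy has reached the started state

        Copy(boolean canRemain, boolean started) {
            this.canRemain = canRemain;
            this.started = started;
        }
    }

    /** Counts shard copies across all shards that are not yet settled. */
    static int countPending(List<List<Copy>> copiesByShard) {
        int allocationPendingAllShards = 0;
        for (List<Copy> copies : copiesByShard) {
            for (Copy copy : copies) {
                // A copy that may remain but has not started is still pending.
                if (copy.canRemain == false || copy.started == false) {
                    allocationPendingAllShards++;
                }
            }
        }
        return allocationPendingAllShards;
    }
}
```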

Member Author

I'd like to address that in a separate PR, since I think it warrants more testing to understand the full ramifications. Would that be okay with you?

Contributor

OK, could you create an issue so we can track the extra work and make sure it doesn't get missed?

Member Author

Certainly, opened #35258

@dakrone dakrone changed the base branch from index-lifecycle to master November 2, 2018 21:42
Contributor

@colings86 colings86 left a comment

LGTM when the issue for the extra work is created

@dakrone dakrone merged commit 3ee004c into elastic:master Nov 5, 2018
dakrone added a commit that referenced this pull request Nov 5, 2018
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Nov 6, 2018
This is a follow-up from elastic#35161 where we now check for started and relocating
state in `AllocationRoutedStep`.

Resolves elastic#35258
dakrone added a commit that referenced this pull request Nov 7, 2018
* [ILM] Check shard and relocation status in AllocationRoutedStep

This is a follow-up from #35161 where we now check for started and relocating
state in `AllocationRoutedStep`.

Resolves #35258
pgomulka pushed a commit to pgomulka/elasticsearch that referenced this pull request Nov 13, 2018
…tic#35316)

@dakrone dakrone deleted the ilm-fix-shrink-allocation-check branch February 4, 2019 14:44
Successfully merging this pull request may close these issues.

ILM shrink action runs when shards aren't allocated on the same node