
Waiting for all shards to be active after a cluster restart may never be possible for a shrink step #35321

Closed
dakrone opened this issue Nov 6, 2018 · 1 comment
Assignees: dakrone
Labels: >bug, :Data Management/ILM+SLM (Index and Snapshot lifecycle management)

Comments

dakrone (Member) commented Nov 6, 2018

Consider the following scenario:

An index with at least one replica is just about to start its shrink step, so ILM does the following (steps 1 and 2 are sketched as settings after the list):

  1. sets the index to read-only
  2. sets the index to be allocated only on node_id:123XYZ
  3. waits for a copy of each shard on node_id:123XYZ
  4. performs the shrink step
  5. etc
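
Steps 1 and 2 come down to two index-level settings updates. As a minimal sketch, assuming the standard write-block and allocation-filter setting keys (illustrative, not the actual ILM source):

    import org.elasticsearch.common.settings.Settings;

    // Sketch of the index settings applied in preparation for the shrink.
    Settings shrinkPrep = Settings.builder()
        .put("index.blocks.write", true)                        // step 1: make the index read-only
        .put("index.routing.allocation.require._id", "123XYZ")  // step 2: allow shards only on node 123XYZ
        .build();

With require._id set, every copy of every shard is permitted only on that single node, which is what sets up the failure described next.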

If the user restarts the cluster after step 2 completes but before step 3 finishes, then when the cluster comes back up the replicas for the index will not be allowed to allocate: the _id filter from step 2 confines every copy to node 123XYZ, and a node may not hold more than one copy of the same shard, so only the primaries can start. This means the check in step 3 can never pass, due to the check at:

    if (ActiveShardCount.ALL.enoughShardsActive(clusterState, index.getName()) == false) {
        logger.debug("[{}] shrink action for [{}] cannot make progress because not all shards are active",
            getKey().getAction(), index.getName());
        return new Result(false, new CheckShrinkReadyStep.Info("", expectedShardCount, -1));
    }
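
For reference, a paraphrase of what the ALL count demands; this is a sketch of the documented semantics (every primary and every replica started), not the actual ActiveShardCount source:

    import org.elasticsearch.cluster.ClusterState;
    import org.elasticsearch.cluster.routing.IndexRoutingTable;
    import org.elasticsearch.cluster.routing.IndexShardRoutingTable;

    // Paraphrase of ActiveShardCount.ALL.enoughShardsActive(state, indexName):
    // true only if every copy (primary and all replicas) of every shard is active.
    static boolean allShardsActive(ClusterState state, String indexName) {
        IndexRoutingTable indexRoutingTable = state.routingTable().index(indexName);
        if (indexRoutingTable == null) {
            return false;
        }
        for (IndexShardRoutingTable shardTable : indexRoutingTable) {
            if (shardTable.activeShards().size() < shardTable.size()) {
                return false; // some copy, e.g. a filtered-out replica, is not active
            }
        }
        return true;
    }

Since the filtered-out replicas can never become active, this condition stays false indefinitely.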

The index is then stuck on this step forever, with the ILM explain API reporting:

    "test-000039" : {
      "step" : "check-shrink-allocation",
      "step_time" : "2018-11-06T22:54:39.805Z",
      "step_time_millis" : 1541544879805,
      "step_info" : {
        "message" : "Waiting for all shards to become active",
        "node_id" : "",
        "shards_left_to_allocate" : -1,
        "expected_shards" : 2
      }
    },

Since shrink does not require all copies of every shard to be active (it only needs one copy of each shard on the designated node), we should remove this check.
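
For illustration, a sketch of a relaxed readiness check under that reasoning: require one started copy of each shard on the node chosen in step 2, and ignore unassigned replicas. The helper below is hypothetical, not the actual CheckShrinkReadyStep code:

    import org.elasticsearch.cluster.ClusterState;
    import org.elasticsearch.cluster.routing.ShardRouting;
    import org.elasticsearch.index.Index;

    // Hypothetical helper: shrink can proceed once every shard has a started
    // copy on the target node, even if replicas remain unassigned.
    static boolean readyToShrink(ClusterState clusterState, Index index, String nodeId) {
        int copiesOnTargetNode = 0;
        for (ShardRouting shard : clusterState.getRoutingTable().allShards(index.getName())) {
            if (shard.started() && nodeId.equals(shard.currentNodeId())) {
                copiesOnTargetNode++;
            }
        }
        // A node holds at most one copy of a given shard, so this count equals
        // the number of distinct shards present on the target node.
        int expectedShards = clusterState.metaData().index(index).getNumberOfShards();
        return copiesOnTargetNode >= expectedShards;
    }

Counting started copies on the single target node directly matches the allocation requirement set in step 2, so a restart no longer leaves the step waiting on replicas that can never allocate.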

dakrone added the >bug and :Data Management/ILM+SLM (Index and Snapshot lifecycle management) labels Nov 6, 2018
elasticmachine (Collaborator) commented:

Pinging @elastic/es-core-infra

dakrone self-assigned this Nov 6, 2018
dakrone added a commit to dakrone/elasticsearch that referenced this issue Nov 7, 2018
Since it's still possible to shrink an index when replicas are unassigned, we
should not check that all copies are available when performing the shrink, since
we set the allocation requirement for a single node.

Resolves elastic#35321
dakrone added a commit that referenced this issue Nov 7, 2018
Since it's still possible to shrink an index when replicas are unassigned, we
should not check that all copies are available when performing the shrink, since
we set the allocation requirement for a single node.

Resolves #35321
pgomulka pushed a commit to pgomulka/elasticsearch that referenced this issue Nov 13, 2018
Since it's still possible to shrink an index when replicas are unassigned, we
should not check that all copies are available when performing the shrink, since
we set the allocation requirement for a single node.

Resolves elastic#35321