Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put a fake allocation id on allocate stale primary command #34140

Conversation

vladimirdolzhenko
Copy link
Contributor

Put a fake allocation id on allocate stale primary command; remove it after recovery is done

Spin-off of #33432 (comment)

Eliminates case when the node after AllocateStalePrimary (or after recovery from a snapshot) crashes between shard initialization and shard recovery and allocation id is already adjusted.

Relates to #33432

@vladimirdolzhenko vladimirdolzhenko added >enhancement :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 labels Sep 28, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Vladimir Dolzhenko added 3 commits September 29, 2018 21:16
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. I also think we need some unit tests. Maybe here: AllocationCommandsTests

@@ -69,6 +69,13 @@ public void shardInitialized(ShardRouting unassignedShard, ShardRouting initiali
@Override
public void shardStarted(ShardRouting initializingShard, ShardRouting startedShard) {
addAllocationId(startedShard);
if (startedShard.primary()
// started shard has to have null recoverySource; have to pick up recoverySource from its initializing state
&& (initializingShard.recoverySource() instanceof RecoverySource.ExistingStoreRecoverySource
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are snapshot and restore relevant here? I hope we can make this more explicit, depending on ExistingStoreRecoverySource#FORCE_STALE_PRIMARY_INSTANCE in some form or fashion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fbdf6d7, details here: #34140 (comment)

@@ -156,9 +164,12 @@ public MetaData applyChanges(MetaData oldMetaData, RoutingTable newRoutingTable)
// forcing an empty primary resets the in-sync allocations to the empty set (ShardRouting.allocatedPostIndexCreate)
indexMetaDataBuilder.putInSyncAllocationIds(shardId.id(), Collections.emptySet());
} else {
assert recoverySource instanceof RecoverySource.ExistingStoreRecoverySource
|| recoverySource instanceof RecoverySource.SnapshotRecoverySource
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapshot/restore behaves like allocating a stale primary ( confirmed by @ywelsch )

Enforced check and extended comment at fbdf6d7

RecoverySource.ExistingStoreRecoverySource was introduced due to misleading by BalanceConfigurationTests test failure (fixed at d3df30c):

it used oversimplified allocator and it leads to inconsistent behaviour allocation decision (and after that inconsistent cluster state). On removing nodes some shards could be completely unassigned, in such cases PrimaryShardAllocator makes decision to to allocate that shard as NO_VALID_SHARD_COPY, while used in test NoopGatewayAllocator ignores that.

@@ -141,7 +141,8 @@ boolean validate(MetaData metaData) {

if (shardRouting.primary() && shardRouting.initializing() &&
shardRouting.recoverySource().getType() == RecoverySource.Type.EXISTING_STORE &&
inSyncAllocationIds.contains(shardRouting.allocationId().getId()) == false)
inSyncAllocationIds.contains(shardRouting.allocationId().getId()) == false &&
inSyncAllocationIds.contains(RecoverySource.ExistingStoreRecoverySource.FORCED_ALLOCATION_ID) == false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to go for hard equality here - the inSync set should be exactly the FORCE_ALLOCATION_ID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in e331b1f

&& (initializingShard.recoverySource() instanceof RecoverySource.ExistingStoreRecoverySource
|| initializingShard.recoverySource() instanceof RecoverySource.SnapshotRecoverySource)) {
Updates updates = changes(startedShard.shardId());
updates.removedAllocationIds.add(RecoverySource.ExistingStoreRecoverySource.FORCED_ALLOCATION_ID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make sure this is the only one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in e331b1f I added assert check in another place where we have old and new allocation ids

@vladimirdolzhenko
Copy link
Contributor Author

extended AllocationCommandsTests with AllocateStalePrimaryAllocationCommand in 8b697ef

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladimirdolzhenko sorry for the delay. I left some comments

@vladimirdolzhenko
Copy link
Contributor Author

run gradle build tests

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @vladimirdolzhenko, I left some more small comments

@vladimirdolzhenko
Copy link
Contributor Author

@bleskes I addressed your comments, could you please have another look ? Thanks

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for all the iterations.

@vladimirdolzhenko vladimirdolzhenko merged commit f789d49 into elastic:master Nov 7, 2018
@vladimirdolzhenko
Copy link
Contributor Author

Thanks @bleskes for the review and comments.

@vladimirdolzhenko vladimirdolzhenko deleted the forced_allocation_allocation_id branch November 7, 2018 19:18
vladimirdolzhenko added a commit that referenced this pull request Nov 7, 2018
removes fake allocation id after recovery is done

Relates to #33432

(cherry picked from commit f789d49)
jasontedor added a commit to martijnvg/elasticsearch that referenced this pull request Nov 8, 2018
* master: (24 commits)
  Replicate index settings to followers (elastic#35089)
  Rename RealmConfig.globalSettings() to settings() (elastic#35330)
  [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329)
  Scripting: Add back lookup vars in score script (elastic#34833)
  watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271)
  Remove ALL shard check in CheckShrinkReadyStep (elastic#35346)
  Use soft-deleted docs to resolve strategy for engine operation (elastic#35230)
  [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316)
  Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160)
  Add a frozen engine implementation (elastic#34357)
  Put a fake allocation id on allocate stale primary command (elastic#34140)
  [CCR] Enforce auto follow pattern name restrictions (elastic#35197)
  [ILM] rolling upgrade tests (elastic#35328)
  [ML] Add Missing data checking class (elastic#35310)
  Apply `ignore_throttled` also to concrete indices (elastic#35335)
  Make version field names more meaningful  (elastic#35334)
  [CCR] Added HLRC support for pause follow API (elastic#35216)
  [Docs] Improve Convert Processor description (elastic#35280)
  [Painless] Removes extraneous compile method (elastic#35323)
  [CCR] Fail with a better error if leader index is red (elastic#35298)
  ...
pgomulka pushed a commit to pgomulka/elasticsearch that referenced this pull request Nov 13, 2018
…4140)

removes fake allocation id after recovery is done

Relates to elastic#33432
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants