-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Put a fake allocation id on allocate stale primary command #34140
Put a fake allocation id on allocate stale primary command #34140
Conversation
… after recovery is done Relates to elastic#33432
Pinging @elastic/es-distributed |
…fore recovery is done (historyUUID is adjusted):
…cation_allocation_id
…efore recovery is done (historyUUID is adjusted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments. I also think we need some unit tests. Maybe here: AllocationCommandsTests
@@ -69,6 +69,13 @@ public void shardInitialized(ShardRouting unassignedShard, ShardRouting initiali | |||
@Override | |||
public void shardStarted(ShardRouting initializingShard, ShardRouting startedShard) { | |||
addAllocationId(startedShard); | |||
if (startedShard.primary() | |||
// started shard has to have null recoverySource; have to pick up recoverySource from its initializing state | |||
&& (initializingShard.recoverySource() instanceof RecoverySource.ExistingStoreRecoverySource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are snapshot and restore relevant here? I hope we can make this more explicit, depending on ExistingStoreRecoverySource#FORCE_STALE_PRIMARY_INSTANCE
in some form or fashion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fbdf6d7, details here: #34140 (comment)
@@ -156,9 +164,12 @@ public MetaData applyChanges(MetaData oldMetaData, RoutingTable newRoutingTable) | |||
// forcing an empty primary resets the in-sync allocations to the empty set (ShardRouting.allocatedPostIndexCreate) | |||
indexMetaDataBuilder.putInSyncAllocationIds(shardId.id(), Collections.emptySet()); | |||
} else { | |||
assert recoverySource instanceof RecoverySource.ExistingStoreRecoverySource | |||
|| recoverySource instanceof RecoverySource.SnapshotRecoverySource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snapshot/restore behaves like allocating a stale primary ( confirmed by @ywelsch )
Enforced check and extended comment at fbdf6d7
RecoverySource.ExistingStoreRecoverySource was introduced due to misleading by BalanceConfigurationTests test failure (fixed at d3df30c):
it used oversimplified allocator and it leads to inconsistent behaviour allocation decision (and after that inconsistent cluster state). On removing nodes some shards could be completely unassigned, in such cases PrimaryShardAllocator makes decision to to allocate that shard as NO_VALID_SHARD_COPY, while used in test NoopGatewayAllocator ignores that.
@@ -141,7 +141,8 @@ boolean validate(MetaData metaData) { | |||
|
|||
if (shardRouting.primary() && shardRouting.initializing() && | |||
shardRouting.recoverySource().getType() == RecoverySource.Type.EXISTING_STORE && | |||
inSyncAllocationIds.contains(shardRouting.allocationId().getId()) == false) | |||
inSyncAllocationIds.contains(shardRouting.allocationId().getId()) == false && | |||
inSyncAllocationIds.contains(RecoverySource.ExistingStoreRecoverySource.FORCED_ALLOCATION_ID) == false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to go for hard equality here - the inSync set should be exactly the FORCE_ALLOCATION_ID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in e331b1f
&& (initializingShard.recoverySource() instanceof RecoverySource.ExistingStoreRecoverySource | ||
|| initializingShard.recoverySource() instanceof RecoverySource.SnapshotRecoverySource)) { | ||
Updates updates = changes(startedShard.shardId()); | ||
updates.removedAllocationIds.add(RecoverySource.ExistingStoreRecoverySource.FORCED_ALLOCATION_ID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make sure this is the only one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in e331b1f I added assert check in another place where we have old and new allocation ids
…g fake allocation id
…NoopGatewayAllocator that generated inconsistent cluster state
…y FORCE_STALE_PRIMARY_INSTANCE
…cation_allocation_id
extended AllocationCommandsTests with AllocateStalePrimaryAllocationCommand in 8b697ef |
…cation_allocation_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vladimirdolzhenko sorry for the delay. I left some comments
server/src/main/java/org/elasticsearch/cluster/routing/IndexRoutingTable.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/routing/RecoverySource.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/routing/allocation/IndexMetaDataUpdater.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/routing/allocation/IndexMetaDataUpdater.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/routing/allocation/IndexMetaDataUpdater.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/allocation/AllocationCommandsTests.java
Show resolved
Hide resolved
...er/src/test/java/org/elasticsearch/cluster/routing/allocation/BalanceConfigurationTests.java
Show resolved
Hide resolved
…) equal to startedShard.allocationId()
…the fake allocation id
…cation_allocation_id
run gradle build tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @vladimirdolzhenko, I left some more small comments
server/src/main/java/org/elasticsearch/cluster/routing/allocation/IndexMetaDataUpdater.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/routing/allocation/IndexMetaDataUpdater.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java
Outdated
Show resolved
Hide resolved
… node name instead of node id
@bleskes I addressed your comments, could you please have another look ? Thanks |
…cation_allocation_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for all the iterations.
Thanks @bleskes for the review and comments. |
* master: (24 commits) Replicate index settings to followers (elastic#35089) Rename RealmConfig.globalSettings() to settings() (elastic#35330) [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329) Scripting: Add back lookup vars in score script (elastic#34833) watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271) Remove ALL shard check in CheckShrinkReadyStep (elastic#35346) Use soft-deleted docs to resolve strategy for engine operation (elastic#35230) [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316) Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160) Add a frozen engine implementation (elastic#34357) Put a fake allocation id on allocate stale primary command (elastic#34140) [CCR] Enforce auto follow pattern name restrictions (elastic#35197) [ILM] rolling upgrade tests (elastic#35328) [ML] Add Missing data checking class (elastic#35310) Apply `ignore_throttled` also to concrete indices (elastic#35335) Make version field names more meaningful (elastic#35334) [CCR] Added HLRC support for pause follow API (elastic#35216) [Docs] Improve Convert Processor description (elastic#35280) [Painless] Removes extraneous compile method (elastic#35323) [CCR] Fail with a better error if leader index is red (elastic#35298) ...
…4140) removes fake allocation id after recovery is done Relates to elastic#33432
Put a fake allocation id on allocate stale primary command; remove it after recovery is done
Spin-off of #33432 (comment)
Eliminates case when the node after AllocateStalePrimary (or after recovery from a snapshot) crashes between shard initialization and shard recovery and allocation id is already adjusted.
Relates to #33432