-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce retention leases versioning #37951
Introduce retention leases versioning #37951
Conversation
Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jasontedor for a quick response :). I left a comment to discuss.
assert primaryMode == false; | ||
this.retentionLeases.clear(); | ||
this.retentionLeases.putAll(retentionLeases.stream().collect(Collectors.toMap(RetentionLease::id, Function.identity()))); | ||
if (retentionLeases.version() > version) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might have an issue with the primary fail-over here. Suppose the primary is syncing two new leases requests L1 and L2, both of them are processed on replica-1, but the primary crashes before any of them arrives on the replica-2. If the replica-2 is promoted, then we add two new leases: L3 and L4. Unfortunately, replica-1 will ignore these two new requests, and its leases (L1 and L2) are different from the leases on the new primary (L3 and L4). I think we can use the primary-term with the leases version to have a total ordering pair <<term, version>>. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish we could find a simpler way, but I can't. +1 from me on the suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also looked for a simpler way, and @ywelsch did too. I could not find anything, and neither could @ywelsch. We synced earlier and agreed that this looks like the only way forward. We need to do a refactoring to the replication tracker and index shard to push the operation primary term down into the tracker, and then we will proceed with updating this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. great catch @dnhatn .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dnhatn is going to work on adding the retention lease infrastructure to index level replication test case infrastructure so that we can test these scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dnhatn I have updated this PR to include the primary term here. I want to add some additional testing, but this PR is ready for another review round from you.
* master: (100 commits) Push primary term to replication tracker (elastic#38044) Introduce ability to minimize round-trips in CCS (elastic#37828) Don't Assert Ack on when Publish Timeout is 0 in Test (elastic#38077) Reduce object creation in Rounding class (elastic#38061) Treat put-mapping calls with `_doc` as a top-level key as typed calls. (elastic#38032) Fix test bug when testing the merging of mappings and templates. (elastic#38021) spelling: java script -- not JavaScript (elastic#37057) Enable SSL in reindex with security QA tests (elastic#37600) Disable BWC tests during backport (elastic#38074) SQL: Added SSL configuration options tests (elastic#37875) Minor fixes in the release notes script. (elastic#37967) Fix typo in docs. (elastic#38018) Update Lucene repo for 7.0.0-alpha2 (elastic#37985) Fix size of rolling-upgrade bootstrap config (elastic#38031) fix DateIndexNameProcessorTests offset pattern (elastic#38069) Speed up converting of temporal accessor to zoned date time (elastic#37915) Work around JDK8 timezone bug in tests (elastic#37968) Correct arg names when update mapping/settings from leader (elastic#38063) Introduce ssl settings to reindex from remote (elastic#37527) Mute testRetentionLeasesSyncOnExpiration ...
…ersion * elastic/master: Do not set up NodeAndClusterIdStateListener in test (elastic#38110) ML: better handle task state race condition (elastic#38040) Soft-deletes policy should always fetch latest leases (elastic#37940) Handle scheduler exceptions (elastic#38014) Minor logging improvements (elastic#38084) Fix Painless void return bug (elastic#38046) Update PutFollowAction serialization post-backport (elastic#37989) fix a few versionAdded values in ElasticsearchExceptions (elastic#37877) Reenable BWC tests after backport of elastic#37899 (elastic#38093) Mute failing test Mute failing test Fail start on obsolete indices documentation (elastic#37786) SQL: Implement FIRST/LAST aggregate functions (elastic#37936) Un-mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock remove unused parser fields in RemoteResponseParsers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor comments. This PR is really good already. Thanks @jasontedor.
server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/seqno/RetentionLeases.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/seqno/RetentionLeases.java
Outdated
Show resolved
Hide resolved
…TrackerRetentionLeaseTests.java
@elasticmachine test this please |
@elasticmachine test this please |
* master: (36 commits) Ensure joda compatibility in custom date formats (elastic#38171) Do not compute cardinality if the `terms` execution mode does not use `global_ordinals` (elastic#38169) Do not set timeout for IndexRequests in GatewayIndexStateIT (elastic#38147) Zen2ify testMasterFailoverDuringIndexingWithMappingChanges (elastic#38178) SQL: [Docs] Add limitation for aggregate functions on scalars (elastic#38186) Add elasticsearch-node detach-cluster command (elastic#37979) Add tests for fractional epoch parsing (elastic#38162) Enable bw tests for elastic#37871 and elastic#38032. (elastic#38167) Clear send behavior rule in CloseWhileRelocatingShardsIT (elastic#38159) Fix testCorruptedIndex (elastic#38161) Add finalReduce flag to SearchRequest (elastic#38104) Forbid negative field boosts in analyzed queries (elastic#37930) Remove AtomiFieldData#getLegacyFieldValues (elastic#38087) Universal cluster bootstrap method for tests with autoMinMasterNodes=false (elastic#38038) Fix FullClusterRestartIT.testHistoryUUIDIsAdded (elastic#38098) Replace joda time in ingest-common module (elastic#38088) Fix eclipse config for ssl-config (elastic#38096) Don't load global ordinals with the `map` execution_hint (elastic#37833) Relax fault detector in some disruption tests (elastic#38101) Fix java time epoch date formatters (elastic#37829) ...
* master: Replace awaitBusy with assertBusy in atLeastDocsIndexed (elastic#38190) Adjust SearchRequest version checks (elastic#38181) AwaitsFix testClientSucceedsWithVerificationDisabled (elastic#38213) Zen2ify RareClusterStateIT (elastic#38184) ML: Fix error race condition on stop _all datafeeds and close _all jobs (elastic#38113) AwaitsFix PUT mapping with _doc on an index that has types (elastic#38204) Allow built-in monitoring_user role to call GET _xpack API (elastic#38060) Update geo_shape docs to include unsupported features (elastic#38138) [ML] Remove "8" prefixes from file structure finder timestamp formats (elastic#38016) Disable bwc tests while backporting elastic#38104 (elastic#38182) Enable TLSv1.3 by default for JDKs with support (elastic#38103) Fix _host based require filters (elastic#38173) RestoreService should update primary terms when restoring shards of existing indices (elastic#38177) Throw if two inner_hits have the same name (elastic#37645)
…ersion * elastic/master: SnapshotShardsService Simplifications (elastic#38025) Default include_type_name to false in the yml test harness. (elastic#38058) Disable bwc preparing to backport of#37977, elastic#37857 and elastic#37872 (elastic#38126) Adding ml_settings entry to HLRC and Docs for deprecation_info (elastic#38118)
* elastic/master: Introduce retention leases versioning (elastic#37951) Correctly disable tests for FIPS JVMs (elastic#38214)
* elastic/master: (54 commits) Introduce retention leases versioning (elastic#37951) Correctly disable tests for FIPS JVMs (elastic#38214) AwaitsFix testAbortedSnapshotDuringInitDoesNotStart (elastic#38227) Preserve ILM operation mode when creating new lifecycles (elastic#38134) Enable trace log in FollowerFailOverIT (elastic#38148) SnapshotShardsService Simplifications (elastic#38025) Default include_type_name to false in the yml test harness. (elastic#38058) Disable bwc preparing to backport of#37977, elastic#37857 and elastic#37872 (elastic#38126) Adding ml_settings entry to HLRC and Docs for deprecation_info (elastic#38118) Replace awaitBusy with assertBusy in atLeastDocsIndexed (elastic#38190) Adjust SearchRequest version checks (elastic#38181) AwaitsFix testClientSucceedsWithVerificationDisabled (elastic#38213) Zen2ify RareClusterStateIT (elastic#38184) ML: Fix error race condition on stop _all datafeeds and close _all jobs (elastic#38113) AwaitsFix PUT mapping with _doc on an index that has types (elastic#38204) Allow built-in monitoring_user role to call GET _xpack API (elastic#38060) Update geo_shape docs to include unsupported features (elastic#38138) [ML] Remove "8" prefixes from file structure finder timestamp formats (elastic#38016) Disable bwc tests while backporting elastic#38104 (elastic#38182) Enable TLSv1.3 by default for JDKs with support (elastic#38103) ...
We should increase primary term before renewing leases; otherwise, the term of the latest RetentionLeases will be lower than the current term. Relates #37951
If the innerLength is 0, the version won't be increased; then there will be two RetentionLeases with the same term and version, but their leases are different. Relates elastic#37951 Closes elastic#38245
Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.
Relates #37165