-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use soft-deleted docs to resolve strategy for engine operation #35230
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue (as far as I understand) is that by (only) restoring the local checkpoint tracker, we're not able to just replay history from the translog for recovery purposes. The situation which is not properly handled is when there are out-of-order deletes in the safe commit. In particular, the compareOpToLuceneDocBasedOnSeqNo
logic does not take those (soft-)deletes into account while the newly restored local checkpoint tracker does. Under normal (replica) mode of a shard, out-of-order deletes are handled through the version map: We keep track of all deletes that have happened above the local checkpoint, so that when an (older) indexing operation arrives, we can safely process it as a stale op.
Instead of giving up on the (IMO much cleaner) approach of restoring the state from the safe index commit, we just need to extend this to what's actually required for the replication logic to properly work. One approach would be to restore the out-of-order deletes into the version map. Another approach would be to ensure that the version map is truly only needed for stuff that's not refreshed yet. The way to fix that would be to make the "load from index" in compareOpToLuceneDocBasedOnSeqNo
aware of the soft-deletes.
@dnhatn I want to make sure I understand this properly - this only happens when we have soft deletes enabled, correct? If so, +1 to what Yannick said. The out of order delivery during translog replay is not detected because the processing of the delete from the translog is shortcutted by the local checkpoint optimization (the bug you described) or by the seq# deduplication logic in the following engine (if the local checkpoint is still low). In both cases the source of the problem IMO is that we only restored part of the engine state - i.e., the local checkpoint tracker but left delete knowledge (currently in the version map tombstone) incomplete. Assuming this is a soft delete related issue, I agree that we should proceed to implement the natural next step we were planning anyway[1] and make out of order detection delivery soft deletes aware. [1] We are heading towards making the version map only capturing the pending ops in lucene. Nothing more. |
@ywelsch and @bleskes Thanks for looking. You both understand the problem correctly.
Yes, this only happens with soft-deleted enabled.
I will implement this option since "we are heading towards making the version map only capturing the pending ops in Lucene." which eliminates the former approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great but I left some comments.
result = new DocIdAndSeqNo(docID, seqNo, context, isDeleted); | ||
} else if (result.seqNo == seqNo && (result.isDeleted || isDeleted == false)) { | ||
// the extra guard "result.isDeleted || isDeleted == false" is to make sure that we pick a live doc instead of | ||
// a deleted doc in case the same index operation was indexed twice - one as live and another as soft-deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when can this happen exactly?
} else { | ||
seqNo = SequenceNumbers.UNASSIGNED_SEQ_NO; | ||
} | ||
final boolean isDeleted = (liveDocs != null && liveDocs.get(docID) == false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have an optimization (and maybe test it somehow) that when we hit a live doc we stop and return. This is important to stop iterating quickly when people are updating an existing doc (which is a comment pattern imo).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed lets return if we hit a live doc and otherwise use the highest seqId. then we don't have the extraguard below or it's implicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
* <li>a doc ID and the associated seqNo otherwise | ||
* </ul> | ||
* Loads the internal docId and sequence number of the latest copy for a given uid from the provided reader. | ||
* The flag {@link DocIdAndSeqNo#isDeleted} indicates whether the returned document is soft(deleted) or live. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want (soft)deleted
} | ||
} | ||
engine.refresh("test"); | ||
try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test will be much stronger if you do this also after every refresh (while maintaining last known sec no per id in a separate map.
} else { | ||
seqNo = SequenceNumbers.UNASSIGNED_SEQ_NO; | ||
} | ||
final boolean isDeleted = (liveDocs != null && liveDocs.get(docID) == false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed lets return if we hit a live doc and otherwise use the highest seqId. then we don't have the extraguard below or it's implicit.
|
||
DocIdAndSeqNo(int docId, long seqNo, LeafReaderContext context) { | ||
DocIdAndSeqNo(int docId, long seqNo, LeafReaderContext context, boolean isDeleted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call it isLive
we usually mark docs as live not deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// The live document must always be the latest copy, thus we can early terminate here. | ||
// If a nested docs is live, we return the first doc which doesn't have term (only the last doc has term). | ||
// This should not be an issue since we no longer use primary term as tier breaker when comparing operations. | ||
return new DocIdAndSeqNo(docID, seqNo, context, isLive); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked the assertion you had there that if already have a result, this one has a higher seq no
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I pushed 33675c4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 2
A CCR test failure shows that the approach in #34474 is flawed. Restoring the LocalCheckpointTracker from an index commit can cause both FollowingEngine and InternalEngine to incorrectly ignore some deletes. Here is a small scenario illustrating the problem: 1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene 2. Flush a commit consisting of only the delete tombstone 3. Index doc with seq=0 => engine will add that doc to Lucene but soft-deleted 4. Restart an engine with the commit (step 2); the engine will fill its LocalCheckpointTracker with the delete tombstone in the commit 5. Replay the local translog in reverse order: index#0 then delete#1 6. When process index#0, an engine will add it into Lucene as a live doc and advance the local checkpoint to 1 (seq#1 was restored from the commit - step 4). 7. When process delete#1, an engine will skip it because seq_no=1 is less than or equal to the local checkpoint. We should have zero document after recovering from translog, but here we have one. Since all operations after the local checkpoint of the safe commit are retained, we should find them if the look-up considers also soft-deleted documents. This PR fills the disparity between the version map and the local checkpoint tracker by taking soft-deleted documents into account while resolving strategy for engine operations. Relates #34474 Relates #33656
A CCR test failure shows that the approach in #34474 is flawed. Restoring the LocalCheckpointTracker from an index commit can cause both FollowingEngine and InternalEngine to incorrectly ignore some deletes. Here is a small scenario illustrating the problem: 1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene 2. Flush a commit consisting of only the delete tombstone 3. Index doc with seq=0 => engine will add that doc to Lucene but soft-deleted 4. Restart an engine with the commit (step 2); the engine will fill its LocalCheckpointTracker with the delete tombstone in the commit 5. Replay the local translog in reverse order: index#0 then delete#1 6. When process index#0, an engine will add it into Lucene as a live doc and advance the local checkpoint to 1 (seq#1 was restored from the commit - step 4). 7. When process delete#1, an engine will skip it because seq_no=1 is less than or equal to the local checkpoint. We should have zero document after recovering from translog, but here we have one. Since all operations after the local checkpoint of the safe commit are retained, we should find them if the look-up considers also soft-deleted documents. This PR fills the disparity between the version map and the local checkpoint tracker by taking soft-deleted documents into account while resolving strategy for engine operations. Relates #34474 Relates #33656
* master: (24 commits) Replicate index settings to followers (elastic#35089) Rename RealmConfig.globalSettings() to settings() (elastic#35330) [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329) Scripting: Add back lookup vars in score script (elastic#34833) watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271) Remove ALL shard check in CheckShrinkReadyStep (elastic#35346) Use soft-deleted docs to resolve strategy for engine operation (elastic#35230) [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316) Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160) Add a frozen engine implementation (elastic#34357) Put a fake allocation id on allocate stale primary command (elastic#34140) [CCR] Enforce auto follow pattern name restrictions (elastic#35197) [ILM] rolling upgrade tests (elastic#35328) [ML] Add Missing data checking class (elastic#35310) Apply `ignore_throttled` also to concrete indices (elastic#35335) Make version field names more meaningful (elastic#35334) [CCR] Added HLRC support for pause follow API (elastic#35216) [Docs] Improve Convert Processor description (elastic#35280) [Painless] Removes extraneous compile method (elastic#35323) [CCR] Fail with a better error if leader index is red (elastic#35298) ...
* elastic/master: (25 commits) Fixes fast vector highlighter docs per issue 24318. (elastic#34190) [ML] Prevent notifications on deletion of a non existent job (elastic#35337) [CCR] Auto follow Coordinator fetch cluster state in system context (elastic#35120) Mute test for elastic#35361 Preserve `date_histogram` format when aggregating on unmapped fields (elastic#35254) Test: Mute failing SSL test Allow unmapped fields in composite aggregations (elastic#35331) [RCI] Add IndexShardOperationPermits.asyncBlockOperations(ActionListener<Releasable>) (elastic#34902) HLRC: reindex API with wait_for_completion false (elastic#35202) Add docs on JNA temp directory not being noexec (elastic#35355) [CCR] Adjust list of dynamic index settings that should be replicated (elastic#35195) Replicate index settings to followers (elastic#35089) Rename RealmConfig.globalSettings() to settings() (elastic#35330) [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329) Scripting: Add back lookup vars in score script (elastic#34833) watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271) Remove ALL shard check in CheckShrinkReadyStep (elastic#35346) Use soft-deleted docs to resolve strategy for engine operation (elastic#35230) [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316) Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160) ...
…ic#35230) A CCR test failure shows that the approach in elastic#34474 is flawed. Restoring the LocalCheckpointTracker from an index commit can cause both FollowingEngine and InternalEngine to incorrectly ignore some deletes. Here is a small scenario illustrating the problem: 1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene 2. Flush a commit consisting of only the delete tombstone 3. Index doc with seq=0 => engine will add that doc to Lucene but soft-deleted 4. Restart an engine with the commit (step 2); the engine will fill its LocalCheckpointTracker with the delete tombstone in the commit 5. Replay the local translog in reverse order: index#0 then delete#1 6. When process index#0, an engine will add it into Lucene as a live doc and advance the local checkpoint to 1 (seq#1 was restored from the commit - step 4). 7. When process delete#1, an engine will skip it because seq_no=1 is less than or equal to the local checkpoint. We should have zero document after recovering from translog, but here we have one. Since all operations after the local checkpoint of the safe commit are retained, we should find them if the look-up considers also soft-deleted documents. This PR fills the disparity between the version map and the local checkpoint tracker by taking soft-deleted documents into account while resolving strategy for engine operations. Relates elastic#34474 Relates elastic#33656
This PR reverts #35230. Previously, we reply on soft-deletes to fill the mismatch between the version map and the Lucene index. This is no longer needed after #43202 where we rebuild the version map when opening an engine. Moreover, PrunePostingsMergePolicy can prune _id of soft-deleted documents out of order; thus the lookup result including soft-deletes sometimes does not return the latest version (although it's okay as we only use a valid result in an engine). With this change, we use only live documents in Lucene to resolve the indexing strategy. This is perfectly safe since we keep all deleted documents after the local checkpoint in the version map. Closes #42979
This PR reverts #35230. Previously, we reply on soft-deletes to fill the mismatch between the version map and the Lucene index. This is no longer needed after #43202 where we rebuild the version map when opening an engine. Moreover, PrunePostingsMergePolicy can prune _id of soft-deleted documents out of order; thus the lookup result including soft-deletes sometimes does not return the latest version (although it's okay as we only use a valid result in an engine). With this change, we use only live documents in Lucene to resolve the indexing strategy. This is perfectly safe since we keep all deleted documents after the local checkpoint in the version map. Closes #42979
A CCR test failure shows that the approach in #34474 is flawed. Restoring the LocalCheckpointTracker from an index commit can cause both FollowingEngine and InternalEngine to incorrectly ignore some delete operations.
Here is a small scenario to illustrate the problem:
We should have zero document after recovering from translog, but here we have one.
Since all operations after the local checkpoint of the safe commit are retained, we should find them if the look-up considers also soft-deleted documents. This PR fills the disparity between the version map and the local checkpoint tracker by taking soft-deleted documents into account while resolving strategy for engine operations.
Relates #34474
Relates #33656
/cc @jasontedor and @martijnvg