Use soft-deleted docs to resolve strategy for engine operation #35230

dnhatn · 2018-11-03T19:29:06Z

A CCR test failure shows that the approach in #34474 is flawed. Restoring the LocalCheckpointTracker from an index commit can cause both FollowingEngine and InternalEngine to incorrectly ignore some delete operations.

Here is a small scenario to illustrate the problem:

Delete doc with seq=1 => engine will add a delete tombstone to Lucene
Flush a commit consisting of only the delete tombstone
Index doc with seq=0 => engine will add that doc to Lucene but soft-deleted
Restart an engine with the commit (step 2); the engine will fill its LocalCheckpointTracker with the delete tombstone in the commit
Replay the local translog in reverse order: index#0 then delete#1
When process index#0, an engine will add it into Lucene as a live doc and advance the local checkpoint to 1 (seq#1 was restored from the commit - step 4).
When process delete#1, an engine will skip it because seq_no=1 is less than or equal to the local checkpoint.

We should have zero document after recovering from translog, but here we have one.

Since all operations after the local checkpoint of the safe commit are retained, we should find them if the look-up considers also soft-deleted documents. This PR fills the disparity between the version map and the local checkpoint tracker by taking soft-deleted documents into account while resolving strategy for engine operations.

Relates #34474
Relates #33656

/cc @jasontedor and @martijnvg

elasticmachine · 2018-11-03T19:29:07Z

Pinging @elastic/es-distributed

ywelsch

The issue (as far as I understand) is that by (only) restoring the local checkpoint tracker, we're not able to just replay history from the translog for recovery purposes. The situation which is not properly handled is when there are out-of-order deletes in the safe commit. In particular, the compareOpToLuceneDocBasedOnSeqNo logic does not take those (soft-)deletes into account while the newly restored local checkpoint tracker does. Under normal (replica) mode of a shard, out-of-order deletes are handled through the version map: We keep track of all deletes that have happened above the local checkpoint, so that when an (older) indexing operation arrives, we can safely process it as a stale op.
Instead of giving up on the (IMO much cleaner) approach of restoring the state from the safe index commit, we just need to extend this to what's actually required for the replication logic to properly work. One approach would be to restore the out-of-order deletes into the version map. Another approach would be to ensure that the version map is truly only needed for stuff that's not refreshed yet. The way to fix that would be to make the "load from index" in compareOpToLuceneDocBasedOnSeqNo aware of the soft-deletes.

bleskes · 2018-11-04T13:06:50Z

@dnhatn I want to make sure I understand this properly - this only happens when we have soft deletes enabled, correct?

If so, +1 to what Yannick said. The out of order delivery during translog replay is not detected because the processing of the delete from the translog is shortcutted by the local checkpoint optimization (the bug you described) or by the seq# deduplication logic in the following engine (if the local checkpoint is still low). In both cases the source of the problem IMO is that we only restored part of the engine state - i.e., the local checkpoint tracker but left delete knowledge (currently in the version map tombstone) incomplete. Assuming this is a soft delete related issue, I agree that we should proceed to implement the natural next step we were planning anyway[1] and make out of order detection delivery soft deletes aware.

[1] We are heading towards making the version map only capturing the pending ops in lucene. Nothing more.

dnhatn · 2018-11-04T19:57:50Z

@ywelsch and @bleskes Thanks for looking. You both understand the problem correctly.

this only happens when we have soft deletes enabled, correct

Yes, this only happens with soft-deleted enabled.

The way to fix that would be to make the "load from index" in compareOpToLuceneDocBasedOnSeqNo aware of the soft-deletes.

I will implement this option since "we are heading towards making the version map only capturing the pending ops in Lucene." which eliminates the former approach.

dnhatn · 2018-11-05T15:31:55Z

@ywelsch and @bleskes This is ready for another round. Can you please have a look? Thank you.

bleskes

Looks great but I left some comments.

bleskes · 2018-11-07T11:42:54Z

server/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDVersionAndSeqNoLookup.java

+                        result = new DocIdAndSeqNo(docID, seqNo, context, isDeleted);
+                    } else if (result.seqNo == seqNo && (result.isDeleted || isDeleted == false)) {
+                        // the extra guard "result.isDeleted || isDeleted == false" is to make sure that we pick a live doc instead of
+                        // a deleted doc in case the same index operation was indexed twice - one as live and another as soft-deleted.


when can this happen exactly?

bleskes · 2018-11-07T11:44:09Z

server/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDVersionAndSeqNoLookup.java

+                    } else {
+                        seqNo = SequenceNumbers.UNASSIGNED_SEQ_NO;
+                    }
+                    final boolean isDeleted = (liveDocs != null && liveDocs.get(docID) == false);


I think we should have an optimization (and maybe test it somehow) that when we hit a live doc we stop and return. This is important to stop iterating quickly when people are updating an existing doc (which is a comment pattern imo).

agreed lets return if we hit a live doc and otherwise use the highest seqId. then we don't have the extraguard below or it's implicit.

bleskes · 2018-11-07T11:45:11Z

server/src/main/java/org/elasticsearch/common/lucene/uid/VersionsAndSeqNoResolver.java

-     * <li>a doc ID and the associated seqNo otherwise
-     * </ul>
+     * Loads the internal docId and sequence number of the latest copy for a given uid from the provided reader.
+     * The flag {@link DocIdAndSeqNo#isDeleted} indicates whether the returned document is soft(deleted) or live.


I think you want (soft)deleted

bleskes · 2018-11-07T11:53:46Z

server/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

+                }
+            }
+            engine.refresh("test");
+            try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {


I think the test will be much stronger if you do this also after every refresh (while maintaining last known sec no per id in a separate map.

s1monw · 2018-11-07T12:15:28Z

server/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDVersionAndSeqNoLookup.java

+                    } else {
+                        seqNo = SequenceNumbers.UNASSIGNED_SEQ_NO;
+                    }
+                    final boolean isDeleted = (liveDocs != null && liveDocs.get(docID) == false);


agreed lets return if we hit a live doc and otherwise use the highest seqId. then we don't have the extraguard below or it's implicit.

s1monw · 2018-11-07T12:16:25Z

server/src/main/java/org/elasticsearch/common/lucene/uid/VersionsAndSeqNoResolver.java


-        DocIdAndSeqNo(int docId, long seqNo, LeafReaderContext context) {
+        DocIdAndSeqNo(int docId, long seqNo, LeafReaderContext context, boolean isDeleted) {


can we call it isLive we usually mark docs as live not deleted

dnhatn · 2018-11-07T15:33:12Z

@bleskes and @s1monw I have addressed your comments. Would you please take another look? Thank you!

bleskes

LGTM

bleskes · 2018-11-07T15:54:16Z

server/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDVersionAndSeqNoLookup.java

+                        // The live document must always be the latest copy, thus we can early terminate here.
+                        // If a nested docs is live, we return the first doc which doesn't have term (only the last doc has term).
+                        // This should not be an issue since we no longer use primary term as tier breaker when comparing operations.
+                        return new DocIdAndSeqNo(docID, seqNo, context, isLive);


I liked the assertion you had there that if already have a result, this one has a higher seq no

Yep, I pushed 33675c4.

s1monw

LGTM 2

dnhatn · 2018-11-07T20:23:52Z

Thanks @ywelsch @bleskes and @s1monw.

A CCR test failure shows that the approach in #34474 is flawed. Restoring the LocalCheckpointTracker from an index commit can cause both FollowingEngine and InternalEngine to incorrectly ignore some deletes. Here is a small scenario illustrating the problem: 1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene 2. Flush a commit consisting of only the delete tombstone 3. Index doc with seq=0 => engine will add that doc to Lucene but soft-deleted 4. Restart an engine with the commit (step 2); the engine will fill its LocalCheckpointTracker with the delete tombstone in the commit 5. Replay the local translog in reverse order: index#0 then delete#1 6. When process index#0, an engine will add it into Lucene as a live doc and advance the local checkpoint to 1 (seq#1 was restored from the commit - step 4). 7. When process delete#1, an engine will skip it because seq_no=1 is less than or equal to the local checkpoint. We should have zero document after recovering from translog, but here we have one. Since all operations after the local checkpoint of the safe commit are retained, we should find them if the look-up considers also soft-deleted documents. This PR fills the disparity between the version map and the local checkpoint tracker by taking soft-deleted documents into account while resolving strategy for engine operations. Relates #34474 Relates #33656

* master: (24 commits) Replicate index settings to followers (elastic#35089) Rename RealmConfig.globalSettings() to settings() (elastic#35330) [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329) Scripting: Add back lookup vars in score script (elastic#34833) watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271) Remove ALL shard check in CheckShrinkReadyStep (elastic#35346) Use soft-deleted docs to resolve strategy for engine operation (elastic#35230) [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316) Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160) Add a frozen engine implementation (elastic#34357) Put a fake allocation id on allocate stale primary command (elastic#34140) [CCR] Enforce auto follow pattern name restrictions (elastic#35197) [ILM] rolling upgrade tests (elastic#35328) [ML] Add Missing data checking class (elastic#35310) Apply `ignore_throttled` also to concrete indices (elastic#35335) Make version field names more meaningful (elastic#35334) [CCR] Added HLRC support for pause follow API (elastic#35216) [Docs] Improve Convert Processor description (elastic#35280) [Painless] Removes extraneous compile method (elastic#35323) [CCR] Fail with a better error if leader index is red (elastic#35298) ...

* elastic/master: (25 commits) Fixes fast vector highlighter docs per issue 24318. (elastic#34190) [ML] Prevent notifications on deletion of a non existent job (elastic#35337) [CCR] Auto follow Coordinator fetch cluster state in system context (elastic#35120) Mute test for elastic#35361 Preserve `date_histogram` format when aggregating on unmapped fields (elastic#35254) Test: Mute failing SSL test Allow unmapped fields in composite aggregations (elastic#35331) [RCI] Add IndexShardOperationPermits.asyncBlockOperations(ActionListener<Releasable>) (elastic#34902) HLRC: reindex API with wait_for_completion false (elastic#35202) Add docs on JNA temp directory not being noexec (elastic#35355) [CCR] Adjust list of dynamic index settings that should be replicated (elastic#35195) Replicate index settings to followers (elastic#35089) Rename RealmConfig.globalSettings() to settings() (elastic#35330) [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329) Scripting: Add back lookup vars in score script (elastic#34833) watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271) Remove ALL shard check in CheckShrinkReadyStep (elastic#35346) Use soft-deleted docs to resolve strategy for engine operation (elastic#35230) [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316) Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160) ...

…ic#35230) A CCR test failure shows that the approach in elastic#34474 is flawed. Restoring the LocalCheckpointTracker from an index commit can cause both FollowingEngine and InternalEngine to incorrectly ignore some deletes. Here is a small scenario illustrating the problem: 1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene 2. Flush a commit consisting of only the delete tombstone 3. Index doc with seq=0 => engine will add that doc to Lucene but soft-deleted 4. Restart an engine with the commit (step 2); the engine will fill its LocalCheckpointTracker with the delete tombstone in the commit 5. Replay the local translog in reverse order: index#0 then delete#1 6. When process index#0, an engine will add it into Lucene as a live doc and advance the local checkpoint to 1 (seq#1 was restored from the commit - step 4). 7. When process delete#1, an engine will skip it because seq_no=1 is less than or equal to the local checkpoint. We should have zero document after recovering from translog, but here we have one. Since all operations after the local checkpoint of the safe commit are retained, we should find them if the look-up considers also soft-deleted documents. This PR fills the disparity between the version map and the local checkpoint tracker by taking soft-deleted documents into account while resolving strategy for engine operations. Relates elastic#34474 Relates elastic#33656

This PR reverts #35230. Previously, we reply on soft-deletes to fill the mismatch between the version map and the Lucene index. This is no longer needed after #43202 where we rebuild the version map when opening an engine. Moreover, PrunePostingsMergePolicy can prune _id of soft-deleted documents out of order; thus the lookup result including soft-deletes sometimes does not return the latest version (although it's okay as we only use a valid result in an engine). With this change, we use only live documents in Lucene to resolve the indexing strategy. This is perfectly safe since we keep all deleted documents after the local checkpoint in the version map. Closes #42979

dnhatn added 2 commits November 3, 2018 13:22

do not restore local checkpoint tracker

97815bd

bootstrap max_seq_no_of_updates

92dde70

dnhatn added >non-issue blocker :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 v6.5.0 v6.6.0 labels Nov 3, 2018

dnhatn requested review from s1monw and bleskes November 3, 2018 19:29

dnhatn added >non-issue and removed >non-issue labels Nov 3, 2018

remove assert optimized count

436cbeb

ywelsch suggested changes Nov 4, 2018

View reviewed changes

dnhatn added v6.5.1 and removed v6.5.0 labels Nov 4, 2018

dnhatn added 2 commits November 4, 2018 15:52

Merge branch 'master' into engine-deletes

28a0a13

prepare for new direction

9ac1c39

dnhatn added v6.5.0 and removed v6.5.1 labels Nov 5, 2018

dnhatn added 3 commits November 5, 2018 09:50

includeDeletes

a5e1fed

Merge branch 'master' into engine-deletes

b4f858c

wording

664ef55

dnhatn added >enhancement and removed >non-issue labels Nov 5, 2018

dnhatn changed the title ~~Do not restore LocalCheckpointTracker from commit~~ Use soft-deleted docs to resolve strategy for engine operation Nov 5, 2018

dnhatn requested a review from s1monw November 6, 2018 23:47

bleskes suggested changes Nov 7, 2018

View reviewed changes

s1monw reviewed Nov 7, 2018

View reviewed changes

dnhatn added 3 commits November 7, 2018 09:48

terminate when hit live doc

b55e9cf

Merge branch 'master' into engine-deletes

4260990

verify each refresh

7646122

dnhatn requested review from bleskes and s1monw November 7, 2018 15:33

simplify

26993a3

bleskes approved these changes Nov 7, 2018

View reviewed changes

s1monw approved these changes Nov 7, 2018

View reviewed changes

assert live doc has highest seqno

33675c4

dnhatn merged commit ed8732b into elastic:master Nov 7, 2018

dnhatn deleted the engine-deletes branch November 7, 2018 20:26

dnhatn added the backport pending label Nov 7, 2018

dnhatn removed the backport pending label Nov 8, 2018

jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

dnhatn mentioned this pull request Jun 7, 2019

[CI] InternalEngineTests.testLookupSeqNoByIdInLucene fails after prune ID merge policy #42979

Closed

dnhatn mentioned this pull request Jun 18, 2019

Do not use soft-deletes to resolve indexing strategy #43336

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use soft-deleted docs to resolve strategy for engine operation #35230

Use soft-deleted docs to resolve strategy for engine operation #35230

dnhatn commented Nov 3, 2018 •

edited

Loading

elasticmachine commented Nov 3, 2018

ywelsch left a comment

bleskes commented Nov 4, 2018

dnhatn commented Nov 4, 2018

dnhatn commented Nov 5, 2018

bleskes left a comment

bleskes Nov 7, 2018

bleskes Nov 7, 2018

s1monw Nov 7, 2018

dnhatn Nov 7, 2018

bleskes Nov 7, 2018

bleskes Nov 7, 2018

s1monw Nov 7, 2018

s1monw Nov 7, 2018

dnhatn commented Nov 7, 2018

bleskes left a comment

bleskes Nov 7, 2018

dnhatn Nov 7, 2018

s1monw left a comment

dnhatn commented Nov 7, 2018


		DocIdAndSeqNo(int docId, long seqNo, LeafReaderContext context) {
		DocIdAndSeqNo(int docId, long seqNo, LeafReaderContext context, boolean isDeleted) {

Use soft-deleted docs to resolve strategy for engine operation #35230

Use soft-deleted docs to resolve strategy for engine operation #35230

Conversation

dnhatn commented Nov 3, 2018 • edited Loading

elasticmachine commented Nov 3, 2018

ywelsch left a comment

Choose a reason for hiding this comment

bleskes commented Nov 4, 2018

dnhatn commented Nov 4, 2018

dnhatn commented Nov 5, 2018

bleskes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnhatn commented Nov 7, 2018

bleskes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

s1monw left a comment

Choose a reason for hiding this comment

dnhatn commented Nov 7, 2018

dnhatn commented Nov 3, 2018 •

edited

Loading