Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use soft-deleted docs to resolve strategy for engine operation #35230

Merged
merged 16 commits into from
Nov 7, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Nov 3, 2018

A CCR test failure shows that the approach in #34474 is flawed. Restoring the LocalCheckpointTracker from an index commit can cause both FollowingEngine and InternalEngine to incorrectly ignore some delete operations.

Here is a small scenario to illustrate the problem:

  1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene
  2. Flush a commit consisting of only the delete tombstone
  3. Index doc with seq=0 => engine will add that doc to Lucene but soft-deleted
  4. Restart an engine with the commit (step 2); the engine will fill its LocalCheckpointTracker with the delete tombstone in the commit
  5. Replay the local translog in reverse order: index#0 then delete#1
  6. When process index#0, an engine will add it into Lucene as a live doc and advance the local checkpoint to 1 (seq#1 was restored from the commit - step 4).
  7. When process delete#1, an engine will skip it because seq_no=1 is less than or equal to the local checkpoint.

We should have zero document after recovering from translog, but here we have one.

Since all operations after the local checkpoint of the safe commit are retained, we should find them if the look-up considers also soft-deleted documents. This PR fills the disparity between the version map and the local checkpoint tracker by taking soft-deleted documents into account while resolving strategy for engine operations.

Relates #34474
Relates #33656

/cc @jasontedor and @martijnvg

@dnhatn dnhatn added >non-issue blocker :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 v6.5.0 v6.6.0 labels Nov 3, 2018
@dnhatn dnhatn requested review from s1monw and bleskes November 3, 2018 19:29
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue (as far as I understand) is that by (only) restoring the local checkpoint tracker, we're not able to just replay history from the translog for recovery purposes. The situation which is not properly handled is when there are out-of-order deletes in the safe commit. In particular, the compareOpToLuceneDocBasedOnSeqNo logic does not take those (soft-)deletes into account while the newly restored local checkpoint tracker does. Under normal (replica) mode of a shard, out-of-order deletes are handled through the version map: We keep track of all deletes that have happened above the local checkpoint, so that when an (older) indexing operation arrives, we can safely process it as a stale op.
Instead of giving up on the (IMO much cleaner) approach of restoring the state from the safe index commit, we just need to extend this to what's actually required for the replication logic to properly work. One approach would be to restore the out-of-order deletes into the version map. Another approach would be to ensure that the version map is truly only needed for stuff that's not refreshed yet. The way to fix that would be to make the "load from index" in compareOpToLuceneDocBasedOnSeqNo aware of the soft-deletes.

@bleskes
Copy link
Contributor

bleskes commented Nov 4, 2018

@dnhatn I want to make sure I understand this properly - this only happens when we have soft deletes enabled, correct?

If so, +1 to what Yannick said. The out of order delivery during translog replay is not detected because the processing of the delete from the translog is shortcutted by the local checkpoint optimization (the bug you described) or by the seq# deduplication logic in the following engine (if the local checkpoint is still low). In both cases the source of the problem IMO is that we only restored part of the engine state - i.e., the local checkpoint tracker but left delete knowledge (currently in the version map tombstone) incomplete. Assuming this is a soft delete related issue, I agree that we should proceed to implement the natural next step we were planning anyway[1] and make out of order detection delivery soft deletes aware.

[1] We are heading towards making the version map only capturing the pending ops in lucene. Nothing more.

@dnhatn dnhatn added v6.5.1 and removed v6.5.0 labels Nov 4, 2018
@dnhatn
Copy link
Member Author

dnhatn commented Nov 4, 2018

@ywelsch and @bleskes Thanks for looking. You both understand the problem correctly.

this only happens when we have soft deletes enabled, correct

Yes, this only happens with soft-deleted enabled.

The way to fix that would be to make the "load from index" in compareOpToLuceneDocBasedOnSeqNo aware of the soft-deletes.

I will implement this option since "we are heading towards making the version map only capturing the pending ops in Lucene." which eliminates the former approach.

@dnhatn dnhatn added v6.5.0 and removed v6.5.1 labels Nov 5, 2018
@dnhatn dnhatn changed the title Do not restore LocalCheckpointTracker from commit Use soft-deleted docs to resolve strategy for engine operation Nov 5, 2018
@dnhatn
Copy link
Member Author

dnhatn commented Nov 5, 2018

@ywelsch and @bleskes This is ready for another round. Can you please have a look? Thank you.

@dnhatn dnhatn requested a review from s1monw November 6, 2018 23:47
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great but I left some comments.

result = new DocIdAndSeqNo(docID, seqNo, context, isDeleted);
} else if (result.seqNo == seqNo && (result.isDeleted || isDeleted == false)) {
// the extra guard "result.isDeleted || isDeleted == false" is to make sure that we pick a live doc instead of
// a deleted doc in case the same index operation was indexed twice - one as live and another as soft-deleted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when can this happen exactly?

} else {
seqNo = SequenceNumbers.UNASSIGNED_SEQ_NO;
}
final boolean isDeleted = (liveDocs != null && liveDocs.get(docID) == false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have an optimization (and maybe test it somehow) that when we hit a live doc we stop and return. This is important to stop iterating quickly when people are updating an existing doc (which is a comment pattern imo).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed lets return if we hit a live doc and otherwise use the highest seqId. then we don't have the extraguard below or it's implicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

* <li>a doc ID and the associated seqNo otherwise
* </ul>
* Loads the internal docId and sequence number of the latest copy for a given uid from the provided reader.
* The flag {@link DocIdAndSeqNo#isDeleted} indicates whether the returned document is soft(deleted) or live.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want (soft)deleted

}
}
engine.refresh("test");
try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test will be much stronger if you do this also after every refresh (while maintaining last known sec no per id in a separate map.

} else {
seqNo = SequenceNumbers.UNASSIGNED_SEQ_NO;
}
final boolean isDeleted = (liveDocs != null && liveDocs.get(docID) == false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed lets return if we hit a live doc and otherwise use the highest seqId. then we don't have the extraguard below or it's implicit.


DocIdAndSeqNo(int docId, long seqNo, LeafReaderContext context) {
DocIdAndSeqNo(int docId, long seqNo, LeafReaderContext context, boolean isDeleted) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call it isLive we usually mark docs as live not deleted

@dnhatn
Copy link
Member Author

dnhatn commented Nov 7, 2018

@bleskes and @s1monw I have addressed your comments. Would you please take another look? Thank you!

@dnhatn dnhatn requested review from bleskes and s1monw November 7, 2018 15:33
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// The live document must always be the latest copy, thus we can early terminate here.
// If a nested docs is live, we return the first doc which doesn't have term (only the last doc has term).
// This should not be an issue since we no longer use primary term as tier breaker when comparing operations.
return new DocIdAndSeqNo(docID, seqNo, context, isLive);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the assertion you had there that if already have a result, this one has a higher seq no

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I pushed 33675c4.

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 2

@dnhatn
Copy link
Member Author

dnhatn commented Nov 7, 2018

Thanks @ywelsch @bleskes and @s1monw.

@dnhatn dnhatn merged commit ed8732b into elastic:master Nov 7, 2018
@dnhatn dnhatn deleted the engine-deletes branch November 7, 2018 20:26
dnhatn added a commit that referenced this pull request Nov 8, 2018
A CCR test failure shows that the approach in #34474 is flawed.
Restoring the LocalCheckpointTracker from an index commit can cause both
FollowingEngine and InternalEngine to incorrectly ignore some deletes.

Here is a small scenario illustrating the problem:

1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene

2. Flush a commit consisting of only the delete tombstone

3. Index doc with seq=0  => engine will add that doc to Lucene but soft-deleted

4. Restart an engine with the commit (step 2); the engine will fill its
LocalCheckpointTracker with the delete tombstone in the commit

5. Replay the local translog in reverse order: index#0 then delete#1

6. When process index#0, an engine will add it into Lucene as a live doc
and advance the local checkpoint to 1 (seq#1 was restored from the
commit - step 4).

7. When process delete#1, an engine will skip it because seq_no=1 is
less than or equal to the local checkpoint.

We should have zero document after recovering from translog, but here we
have one.

Since all operations after the local checkpoint of the safe commit are
retained, we should find them if the look-up considers also soft-deleted
documents. This PR fills the disparity between the version map and the
local checkpoint tracker by taking soft-deleted documents into account
while resolving strategy for engine operations.

Relates #34474
Relates #33656
dnhatn added a commit that referenced this pull request Nov 8, 2018
A CCR test failure shows that the approach in #34474 is flawed.
Restoring the LocalCheckpointTracker from an index commit can cause both
FollowingEngine and InternalEngine to incorrectly ignore some deletes.

Here is a small scenario illustrating the problem:

1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene

2. Flush a commit consisting of only the delete tombstone

3. Index doc with seq=0  => engine will add that doc to Lucene but soft-deleted

4. Restart an engine with the commit (step 2); the engine will fill its
LocalCheckpointTracker with the delete tombstone in the commit

5. Replay the local translog in reverse order: index#0 then delete#1

6. When process index#0, an engine will add it into Lucene as a live doc
and advance the local checkpoint to 1 (seq#1 was restored from the
commit - step 4).

7. When process delete#1, an engine will skip it because seq_no=1 is
less than or equal to the local checkpoint.

We should have zero document after recovering from translog, but here we
have one.

Since all operations after the local checkpoint of the safe commit are
retained, we should find them if the look-up considers also soft-deleted
documents. This PR fills the disparity between the version map and the
local checkpoint tracker by taking soft-deleted documents into account
while resolving strategy for engine operations.

Relates #34474
Relates #33656
jasontedor added a commit to martijnvg/elasticsearch that referenced this pull request Nov 8, 2018
* master: (24 commits)
  Replicate index settings to followers (elastic#35089)
  Rename RealmConfig.globalSettings() to settings() (elastic#35330)
  [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329)
  Scripting: Add back lookup vars in score script (elastic#34833)
  watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271)
  Remove ALL shard check in CheckShrinkReadyStep (elastic#35346)
  Use soft-deleted docs to resolve strategy for engine operation (elastic#35230)
  [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316)
  Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160)
  Add a frozen engine implementation (elastic#34357)
  Put a fake allocation id on allocate stale primary command (elastic#34140)
  [CCR] Enforce auto follow pattern name restrictions (elastic#35197)
  [ILM] rolling upgrade tests (elastic#35328)
  [ML] Add Missing data checking class (elastic#35310)
  Apply `ignore_throttled` also to concrete indices (elastic#35335)
  Make version field names more meaningful  (elastic#35334)
  [CCR] Added HLRC support for pause follow API (elastic#35216)
  [Docs] Improve Convert Processor description (elastic#35280)
  [Painless] Removes extraneous compile method (elastic#35323)
  [CCR] Fail with a better error if leader index is red (elastic#35298)
  ...
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Nov 8, 2018
* elastic/master: (25 commits)
  Fixes fast vector highlighter docs per issue 24318. (elastic#34190)
  [ML] Prevent notifications on deletion of a non existent job (elastic#35337)
  [CCR] Auto follow Coordinator fetch cluster state in system context (elastic#35120)
  Mute test for elastic#35361
  Preserve `date_histogram` format when aggregating on unmapped fields (elastic#35254)
  Test: Mute failing SSL test
  Allow unmapped fields in composite aggregations (elastic#35331)
  [RCI] Add IndexShardOperationPermits.asyncBlockOperations(ActionListener<Releasable>) (elastic#34902)
  HLRC: reindex API with wait_for_completion false (elastic#35202)
  Add docs on JNA temp directory not being noexec (elastic#35355)
  [CCR] Adjust list of dynamic index settings that should be replicated (elastic#35195)
  Replicate index settings to followers (elastic#35089)
  Rename RealmConfig.globalSettings() to settings() (elastic#35330)
  [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329)
  Scripting: Add back lookup vars in score script (elastic#34833)
  watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271)
  Remove ALL shard check in CheckShrinkReadyStep (elastic#35346)
  Use soft-deleted docs to resolve strategy for engine operation (elastic#35230)
  [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316)
  Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160)
  ...
pgomulka pushed a commit to pgomulka/elasticsearch that referenced this pull request Nov 13, 2018
…ic#35230)

A CCR test failure shows that the approach in elastic#34474 is flawed.
Restoring the LocalCheckpointTracker from an index commit can cause both
FollowingEngine and InternalEngine to incorrectly ignore some deletes.

Here is a small scenario illustrating the problem:

1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene

2. Flush a commit consisting of only the delete tombstone

3. Index doc with seq=0  => engine will add that doc to Lucene but soft-deleted

4. Restart an engine with the commit (step 2); the engine will fill its
LocalCheckpointTracker with the delete tombstone in the commit

5. Replay the local translog in reverse order: index#0 then delete#1

6. When process index#0, an engine will add it into Lucene as a live doc
and advance the local checkpoint to 1 (seq#1 was restored from the
commit - step 4).

7. When process delete#1, an engine will skip it because seq_no=1 is
less than or equal to the local checkpoint.

We should have zero document after recovering from translog, but here we
have one.

Since all operations after the local checkpoint of the safe commit are
retained, we should find them if the look-up considers also soft-deleted
documents. This PR fills the disparity between the version map and the
local checkpoint tracker by taking soft-deleted documents into account
while resolving strategy for engine operations.

Relates elastic#34474
Relates elastic#33656
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
dnhatn added a commit that referenced this pull request Jun 19, 2019
This PR reverts #35230.

Previously, we reply on soft-deletes to fill the mismatch between the
version map and the Lucene index. This is no longer needed after #43202
where we rebuild the version map when opening an engine. Moreover,
PrunePostingsMergePolicy can prune _id of soft-deleted documents out of
order; thus the lookup result including soft-deletes sometimes does not
return the latest version (although it's okay as we only use a valid
result in an engine).

With this change, we use only live documents in Lucene to resolve the
indexing strategy. This is perfectly safe since we keep all deleted
documents after the local checkpoint in the version map.

Closes #42979
dnhatn added a commit that referenced this pull request Jun 19, 2019
This PR reverts #35230.

Previously, we reply on soft-deletes to fill the mismatch between the
version map and the Lucene index. This is no longer needed after #43202
where we rebuild the version map when opening an engine. Moreover,
PrunePostingsMergePolicy can prune _id of soft-deleted documents out of
order; thus the lookup result including soft-deletes sometimes does not
return the latest version (although it's okay as we only use a valid
result in an engine).

With this change, we use only live documents in Lucene to resolve the
indexing strategy. This is perfectly safe since we keep all deleted
documents after the local checkpoint in the version map.

Closes #42979
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement v6.5.0 v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants