-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Double-check local checkpoint for staleness #29276
Double-check local checkpoint for staleness #29276
Conversation
Today, when determining if an operation is stale, we compare the seqno against the local checkpoint before looking in the version map. However, in between these two checks the local checkpoint could advance, causing some tombstones to become stale, and then the stale tombstones could be collected. In this situation we might incorrectly decide that the operation is fresh and apply it. To avoid this situation, check the local checkpoint again after calling getVersionFromMap(). Since it only ever increases, this gives the right result despite other concurrent activity.
Actually turning this into something that can reliably fail a test case seems tricky. Your thoughts welcome. |
Thanks @DaveCTurner . The change is semantically correct, but I prefer we bundle things more. Right we first compare the seq# to the local checkpoint and then call
This will bundle the version map with Lucene as a single unit with a requirement to remember everything above the local checkpoint. I think that's cleaner and align better with future direction. WDYT?
I was thinking about this this morning. I was hoping that if we add refreshes to testConcurrentOutOfDocsOnReplica and sometimes run it with gc deletes equal to 0, it will reproduce. |
This models how indexing and deletion operations are handled on the replica, including the optimisations for append-only operations and the interaction with Lucene commits and the version map. It incorporates - elastic/elasticsearch#28787 - elastic/elasticsearch#28790 - elastic/elasticsearch#29276 - a proposal to always prune tombstones
Sure, I like that idea too.
🤞 I was wondering about adding some kind of
at strategic points in order to expose rare races more frequently. |
Pinging @elastic/es-distributed |
…3-28-double-check-local-checkpoint
Today, each test has its own process for deciding whether to refresh/flush/GC deletes after applying an operation. This change moves this decision into generateSingleDocHistory().
@bleskes I looked at |
@DaveCTurner shall we close this in favour of #30121 ? |
Yes, lets. |
Today, when determining if an operation is stale, we compare the seqno against
the local checkpoint before looking in the version map. However, in between
these two checks the local checkpoint could advance, causing some tombstones to
become stale, and then the stale tombstones could be collected. In this
situation we might incorrectly decide that the operation is fresh and apply it.
To avoid this situation, check the local checkpoint again after calling
getVersionFromMap(). Since it only ever increases, this gives the right result
despite other concurrent activity.