-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up duty for non-overlapping eternity tombstones #15281
Clean up duty for non-overlapping eternity tombstones #15281
Conversation
0fd8732
to
d89baf3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, @abhishekrb19 ! I think we need some more clarity on what qualifies as a "dangling tombstone" and why and when we should clean them up.
...r/src/main/java/org/apache/druid/server/coordinator/duty/MarkDanglingTombstonesAsUnused.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/druid/server/coordinator/duty/MarkDanglingTombstonesAsUnused.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/druid/server/coordinator/duty/MarkDanglingTombstonesAsUnused.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/druid/server/coordinator/duty/MarkDanglingTombstonesAsUnused.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/druid/server/coordinator/duty/MarkDanglingTombstonesAsUnused.java
Outdated
Show resolved
Hide resolved
usedNonOvershadowedSegment.getInterval() | ||
.overlaps(usedSegment.getInterval()) | ||
); | ||
if (!overlaps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So a dangling tombstone is one that is an infinity tombstone and DOES NOT overlap with another infinity tombstone? Shouldn't there be some more (general) conditions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's one that does not overlap with any overshadowed used segment in the snapshot. Updated javadocs.
...r/src/main/java/org/apache/druid/server/coordinator/duty/MarkDanglingTombstonesAsUnused.java
Outdated
Show resolved
Hide resolved
|
||
final Optional<Set<DataSegment>> usedNonOvershadowedSegments = | ||
Optional.fromNullable(usedSegmentsTimeline) | ||
.transform(timeline -> timeline.findNonOvershadowedObjectsInInterval( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to call findNonOvershadowed
here or should we just use the set of overshadowed segments captured in the DataSourcesSnapshot
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, I considered just the overshadowed set of segments (please see c5fb0e9), but realized that would not provide us a complete guarantee as there can be a partial overlap with a used segment that's not overshadowed, so we'd need to look at the larger set of findNonOvershadowed
segments - added a UT for one such scenario org.apache.druid.server.coordinator.duty.MarkDanglingSegmentsAsUnusedTest#testDanglingTombstonesWithPartiallyOverlappingUnderlyingSegment
.
EDIT: actually, just the set of overshadowed segments would suffice. The partially overlapping segments I was thinking of would still continue to exist
...r/src/main/java/org/apache/druid/server/coordinator/duty/MarkDanglingTombstonesAsUnused.java
Outdated
Show resolved
Hide resolved
Why is the eternity required to be one of the endpoints per the definition? Can't a tombstone be "useless", even without that restriction? |
@LakshSingla , I suppose the assumption is that tombstones that do not have infinity as an end-point would become useless only when they get overshadowed by newer segments. So they would be removed by the existing But I agree that there are nuances here which need to be called out clearly. |
@kfaraz |
Thanks for the reviews!
Good question - I included a rationale in the javadocs. In short, after some discussions, we're treating tombstones as zero-row segments, so they exist, but contain no data. These infinite-interval tombstones don't honor the provided segment granularity (because we cannot generate infinite segments per time grain in the timeline) and are inconvenient to append data to afterwards. So these are the the only candidate segments considered by the coordinator duty to be marked as unused for now. The finite-interval tombstones honor the provided segment granularity with #15243. |
@abhishekrb19 is the following scenario possible?
|
@AmatyaAvadhanula, yes, I think what you're describing is possible if the datasource snapshot doesn't have information about the new appended segment from a concurrent task. EDIT: Instead we went with #15379 that changed the number of core partitions of tombstone shard specs from 1 to 0. That should also address the concurrent append/replace scenario where an appended segment will exist independently of the tomstone. |
Co-authored-by: 317brian <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left minor comments. LGTM overall!
...r/src/main/java/org/apache/druid/server/coordinator/duty/MarkEternityTombstonesAsUnused.java
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/stats/Stats.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/druid/server/coordinator/duty/MarkEternityTombstonesAsUnused.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you @abhishekrb19
Thank you for the reviews! |
This PR adds a new duty to clean up non-overlapping eternity tombstones. The candidate segments must fit all the criteria:
-INF
or ends atINF
andnumCorePartitions
as 0 for tombstones #15379 need to be manually cleaned up as it can cause data loss if removed.The algorithm to determine these tombstones is as follows:
-INF
or ends atINF
, look at the set of overshadowed used segments to see if at least one of the intervals overlaps with the candidate segment.This is a companion change to #15243.
Release note
A new duty,
MarkEternityTombstonesAsUnused
, is added to clean up non-overlapping eternity tombstones - tombstone segments that either start at-INF
or end atINF
and has no overlap with any overshadowed used segment in the datasource. A metricsegment/unneededEternityTombstone/count
is emitted that counts the number of dropped non-overshadowed eternity tombstones per datasource.Key changed/added classes in this PR
MarkEternityTombstonesAsUnused.java
MarkEternityTombstonesAsUnusedTest.java
This PR has: