Improve run time of coordinator duty MarkAsUnusedOvershadowedSegments #13287
In clusters with a large number of segments, the duty `MarkAsUnusedOvershadowedSegments` can take a very long time to finish. This is because of the costly invocation of `timeline.isOvershadowed`, which is done for every used segment in every coordinator run. This is wasteful, as there are typically only a few overshadowed segments, and these are already identified as part of the datasource snapshot. In clusters with ~500k segments, this duty can take several minutes to finish even when no segment is actually overshadowed.
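The difference between the two approaches can be illustrated with a minimal, self-contained sketch. This is not the Druid code: `Segment`, `isOvershadowed`, `findByCheckingAll` and `findFromSnapshot` are hypothetical stand-ins, the overshadow rule is simplified to "a newer version fully covers the interval", and the snapshot's precomputed set is written by hand. The sketch also assumes that the precomputed candidates are still re-checked against the timeline, so the cost scales with the number of overshadowed segments rather than with the number of used segments.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class OvershadowSketch {
    /** Hypothetical stand-in for a segment; not Druid's DataSegment. */
    static final class Segment {
        final String id;
        final int version;
        final int start; // interval start (inclusive)
        final int end;   // interval end (exclusive)

        Segment(String id, int version, int start, int end) {
            this.id = id;
            this.version = version;
            this.start = start;
            this.end = end;
        }
    }

    /** Counts calls to the expensive check so the two approaches can be compared. */
    static int expensiveChecks = 0;

    /** Simplified stand-in for timeline.isOvershadowed (assumption: full coverage by a newer version). */
    static boolean isOvershadowed(Segment s, List<Segment> used) {
        expensiveChecks++;
        for (Segment other : used) {
            if (other.version > s.version && other.start <= s.start && other.end >= s.end) {
                return true;
            }
        }
        return false;
    }

    /** Old approach: run the expensive check for every used segment. */
    static Set<String> findByCheckingAll(List<Segment> used) {
        Set<String> toMarkUnused = new LinkedHashSet<>();
        for (Segment s : used) {
            if (isOvershadowed(s, used)) {
                toMarkUnused.add(s.id);
            }
        }
        return toMarkUnused;
    }

    /** New approach: only visit the candidates that the snapshot has already identified. */
    static Set<String> findFromSnapshot(Set<Segment> candidates, List<Segment> used) {
        Set<String> toMarkUnused = new LinkedHashSet<>();
        for (Segment s : candidates) {
            if (isOvershadowed(s, used)) {
                toMarkUnused.add(s.id);
            }
        }
        return toMarkUnused;
    }

    public static void main(String[] args) {
        List<Segment> used = new ArrayList<>();
        Segment old = new Segment("day1_v1", 1, 0, 10);
        used.add(old);
        used.add(new Segment("day1_v2", 2, 0, 10)); // overshadows day1_v1
        used.add(new Segment("day2_v1", 1, 10, 20));

        expensiveChecks = 0;
        Set<String> viaAll = findByCheckingAll(used);
        System.out.println(viaAll + " after " + expensiveChecks + " checks");

        // Hand-written stand-in for the set precomputed by the datasource snapshot.
        Set<Segment> candidates = new LinkedHashSet<>();
        candidates.add(old);

        expensiveChecks = 0;
        Set<String> viaSnapshot = findFromSnapshot(candidates, used);
        System.out.println(viaSnapshot + " after " + expensiveChecks + " checks");
    }
}
```

Both methods return the same set of segment ids. The first performs one expensive check per used segment, while the second performs one per precomputed candidate, which is zero when nothing is overshadowed.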
Changes
- Use `DataSourceSnapshot.getOvershadowedSegments` to get all overshadowed segments
- Add `SegmentTimeline` for ease of use and readability while using a `VersionedIntervalTimeline` of segments
Notes
The changes here provide significant improvement in the run time of this duty.
The number of overshadowed segments in a given coordinator run is expected to be small.
Even in the rare bad case scenario (half of all used segments are overshadowed), this change
would halve the run time of the duty. Subsequent runs would again take negligible time.
(Absolute worst case scenario would be a single segment overshadowing everything 😅)
Further improvements can be made to the run time of this duty, but that would require changes to the logic of `timeline.isOvershadowed` and would provide only minor time reductions.
This PR has: