Batch segment retrieval from the metadata store #15305
Merged
abhishekrb19 merged 13 commits into apache:master from abhishekrb19:retrieve_used_segments_too_many_intervals_test on Nov 6, 2023
Conversation
abhishekrb19 changed the title from "Used segments retrieval fails when there are too many intervals" to "Batch segment retrieval from the metadata store" on Nov 3, 2023
zachjsh reviewed on Nov 3, 2023:
- server/src/main/java/org/apache/druid/metadata/SqlSegmentsMetadataQuery.java (resolved)
- server/src/test/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinatorTest.java (fixed)
zachjsh approved these changes on Nov 6, 2023:
LGTM
abhishekrb19 deleted the retrieve_used_segments_too_many_intervals_test branch on November 6, 2023 at 19:30
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request on Nov 17, 2023:
* Add a unit test that fails when used segments with too many intervals are retrieved. - This is a failing test case that needs to be ignored.
* Batch the intervals (use 100 as it's consistent with batching in other places).
* move the filtering inside the batch
* Account for limit cross the batch splits.
* Adjustments
* Fixup and add tests
* small refactor
* add more tests.
* remove wrapper.
* Minor edits
* assert out of range
Used segments retrieval fails when there are too many intervals (about 120 with Derby). The same failure can occur with multiple intervals for unused segments.
Previously, the number of intervals wasn't capped in the SQL query, which is further bloated by the grouped `OR` clause added per interval to handle the eternity interval. This PR splits the `SELECT` query into multiple batches, with 100 intervals per batch. This is similar to the existing batching strategy that caps the maximum number of segments announced at once.
Core changes:
A fix localized to kill tasks, which originally exposed this bug, is available in #15306. We'd still want this change in the server separately, since the issue can occur more broadly.
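The batching idea described above can be sketched as follows. This is a hypothetical, simplified illustration, not the actual Druid code: the `partition` helper and the constant name are placeholders standing in for whatever the real `SqlSegmentsMetadataQuery` does, and the real implementation additionally pushes filtering into each batch and tracks a result limit across batch boundaries.

```java
import java.util.ArrayList;
import java.util.List;

public class IntervalBatcher {
    // Cap on intervals per SQL query; the PR uses 100 for consistency
    // with batching elsewhere in the codebase.
    static final int MAX_INTERVALS_PER_BATCH = 100;

    // Split a large interval list into fixed-size batches so that each
    // generated SELECT carries at most MAX_INTERVALS_PER_BATCH grouped
    // OR clauses, instead of one unbounded query.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        // 250 placeholder "intervals" yield 3 queries: 100 + 100 + 50.
        List<Integer> intervals = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            intervals.add(i);
        }
        List<List<Integer>> batches = partition(intervals, MAX_INTERVALS_PER_BATCH);
        System.out.println(batches.size());        // 3
        System.out.println(batches.get(2).size()); // 50
    }
}
```

Issuing one bounded query per batch and merging the results keeps each statement small enough for databases like Derby that fail on very large predicate lists.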
This PR has: