-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prioritize loading of segments based on segment interval #2261
Conversation
👍 I think we can also remove bucketMonthComparator in druid-api, I don't see it being used anywhere else. |
da72fb8
to
a001743
Compare
@drcrallen @fjy @gianm @himanshug : can one of you take a look at this PR ? |
Prior behavior is to prioritize by month, and if months are equal, to prioritize on the DataSegment identifier. This PR changes behavior to load based on complete timestamp of the start of the interval for the DataSegment. The proposed behavior deviates from prior in that segments with smaller granularity will be favored more in this Comparator compared to the prior Comparator. (ex: Hour segments will be favored over Day segments). Is there any risk the Coordinator segment balancer will spend too long just balancing batches of hour segments, and won't ever try to rebalance older segments? |
@nishantmonu51 concern pointed by @drcrallen might be very real when you have 2 dataSources with segments of 15min and hour/day granularity , it will be good if we can incorporate the the interval size into the comparator too. |
@drcrallen @himanshug: In prior behavior for segments with same month, we used to order things by the dataSource name as the segment Identifier starts with the dataSource, So in case of 2 dataSources with segments of 15min and hour/day granularity, It will favor the one whose name comes first lexicographically instead of depending on segment granularity. For proposed behavior, how do you feel about the comparator being changed to compare IntervalEndThenStart to always load the segments having more recent data first ? |
@nishantmonu51 I think that's a bit better than either prior case or the first proposed case in this PR. |
@nishantmonu51 SGTM |
public int compare(DataSegment lhs, DataSegment rhs) | ||
{ | ||
int retVal = Comparators.intervalsByStartThenEnd().compare(lhs.getInterval(), rhs.getInterval()); | ||
return retVal != 0 ? retVal : lhs.compareTo(rhs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of adding a check here, you can chain comparators using Ordering.from(<intervalcomparator>).compound(Ordering.natural())
fix when two segments have same interval review comments
6c1991f
to
fd6bf3f
Compare
handled review comments, depends on metamx/java-util#38 |
👍 assuming travis passes |
👍 |
Prioritize loading of segments based on segment interval
Issue - With current ordering of segments, coordinator uses bucketMonthComparator, In case we have a large number of segments to load/replicate for current month, Segment handoff can get delayed leading to realtime tasks taking a lot longer than complete than usual.
This PR attempts to prioritize loading latest segments first by ordering segments on basis of interval based comparator.