Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prioritize loading of segments based on segment interval #2261

Merged
merged 1 commit into from
Jan 27, 2016

Conversation

nishantmonu51
Copy link
Member

Issue - With current ordering of segments, coordinator uses bucketMonthComparator, In case we have a large number of segments to load/replicate for current month, Segment handoff can get delayed leading to realtime tasks taking a lot longer than complete than usual.

This PR attempts to prioritize loading latest segments first by ordering segments on basis of interval based comparator.

@xvrl
Copy link
Member

xvrl commented Jan 13, 2016

👍 I think we can also remove bucketMonthComparator in druid-api, I don't see it being used anywhere else.

@nishantmonu51 nishantmonu51 force-pushed the improve-segment-ordering branch from da72fb8 to a001743 Compare January 14, 2016 09:05
@nishantmonu51
Copy link
Member Author

@drcrallen @fjy @gianm @himanshug : can one of you take a look at this PR ?

@nishantmonu51 nishantmonu51 added this to the 0.9.0 milestone Jan 21, 2016
@drcrallen
Copy link
Contributor

Prior behavior is to prioritize by month, and if months are equal, to prioritize on the DataSegment identifier.

This PR changes behavior to load based on complete timestamp of the start of the interval for the DataSegment.

The proposed behavior deviates from prior in that segments with smaller granularity will be favored more in this Comparator compared to the prior Comparator. (ex: Hour segments will be favored over Day segments).

Is there any risk the Coordinator segment balancer will spend too long just balancing batches of hour segments, and won't ever try to rebalance older segments?

@himanshug
Copy link
Contributor

@nishantmonu51 concern pointed by @drcrallen might be very real when you have 2 dataSources with segments of 15min and hour/day granularity , it will be good if we can incorporate the the interval size into the comparator too.
other than that LGTM

@nishantmonu51
Copy link
Member Author

@drcrallen @himanshug: In prior behavior for segments with same month, we used to order things by the dataSource name as the segment Identifier starts with the dataSource, So in case of 2 dataSources with segments of 15min and hour/day granularity, It will favor the one whose name comes first lexicographically instead of depending on segment granularity.

For proposed behavior, how do you feel about the comparator being changed to compare IntervalEndThenStart to always load the segments having more recent data first ?

@drcrallen
Copy link
Contributor

@nishantmonu51 I think that's a bit better than either prior case or the first proposed case in this PR.

@himanshug
Copy link
Contributor

@nishantmonu51 SGTM

public int compare(DataSegment lhs, DataSegment rhs)
{
int retVal = Comparators.intervalsByStartThenEnd().compare(lhs.getInterval(), rhs.getInterval());
return retVal != 0 ? retVal : lhs.compareTo(rhs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of adding a check here, you can chain comparators using Ordering.from(<intervalcomparator>).compound(Ordering.natural())

fix when two segments have same interval

review comments
@nishantmonu51 nishantmonu51 force-pushed the improve-segment-ordering branch from 6c1991f to fd6bf3f Compare January 27, 2016 12:06
@nishantmonu51
Copy link
Member Author

handled review comments, depends on metamx/java-util#38

@drcrallen drcrallen closed this Jan 27, 2016
@drcrallen drcrallen reopened this Jan 27, 2016
@drcrallen
Copy link
Contributor

👍 assuming travis passes

@xvrl
Copy link
Member

xvrl commented Jan 27, 2016

👍

xvrl added a commit that referenced this pull request Jan 27, 2016
Prioritize loading of segments based on segment interval
@xvrl xvrl merged commit e3d1e07 into apache:master Jan 27, 2016
@xvrl xvrl deleted the improve-segment-ordering branch January 27, 2016 18:05
@fjy fjy mentioned this pull request Feb 5, 2016
@fjy fjy added the Improvement label Feb 5, 2016
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants