
Fix for loadByPeriod rule #1362

Closed · anubhgup wants to merge 1 commit

Conversation

anubhgup (Contributor)

If a segment (say, one created by merging multiple smaller segments) has a start time earlier than the loadByPeriod start time, the coordinator currently drops it even if it overlaps the load period. This change fixes that.

@@ -83,6 +83,6 @@ public boolean appliesTo(DataSegment segment, DateTime referenceTimestamp)
 public boolean appliesTo(Interval interval, DateTime referenceTimestamp)
 {
   final Interval currInterval = new Interval(period, referenceTimestamp);
-  return currInterval.overlaps(interval) && interval.getStartMillis() >= currInterval.getStartMillis();
+  return currInterval.overlaps(interval);
A Member commented on the diff:
would you mind adding a unit test too?
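
To make the diff concrete, here is a minimal, self-contained sketch (not part of the PR; it assumes the Joda-Time Interval semantics Druid uses, and the class name is invented for illustration) of how the old and new checks treat a merged segment that starts before the 7-day window but still overlaps it:

import org.joda.time.DateTime;
import org.joda.time.Interval;
import org.joda.time.Period;

public class LoadByPeriodSketch
{
  public static void main(String[] args)
  {
    // The rule's window: the 7 days ending at the reference timestamp.
    final DateTime referenceTimestamp = new DateTime("2015-05-14T00:00:00Z");
    final Interval currInterval = new Interval(new Period("P7D"), referenceTimestamp);

    // A merged segment that starts before the window but overlaps it.
    final Interval merged = new Interval(
        new DateTime("2015-05-01T00:00:00Z"),
        new DateTime("2015-05-10T00:00:00Z")
    );

    // Old check: the segment must also start inside the window -> false (dropped).
    final boolean oldCheck = currInterval.overlaps(merged)
                             && merged.getStartMillis() >= currInterval.getStartMillis();

    // New check: any overlap is enough -> true (loaded).
    final boolean newCheck = currInterval.overlaps(merged);

    System.out.println("old: " + oldCheck + ", new: " + newCheck); // old: false, new: true
  }
}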

fjy (Contributor) commented May 14, 2015

@anubhgup This is not a bug; it works this way by design. This change also breaks the router logic.

Commit added: …loadByPeriod rule, even if the start time of the segment is less than the period start.
anubhgup (Author)

Added unit test.

anubhgup (Author)

What are the intended semantics of loadByPeriod? In my use case this is breaking things: a segment gets dropped even though it contains data within my retention period. How can I work around this? Currently I have the retention rules "loadByPeriod=7D, dropForever". Once I enabled segment merging, any segment that spans the 7D boundary gets dropped by the coordinator.
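
To spell out the failure mode described above: a sketch of the "loadByPeriod=7D, dropForever" chain, where loadByPeriodApplies is a hypothetical stand-in for the old appliesTo check (the dates are illustrative; the coordinator applies the first rule that matches):

import org.joda.time.DateTime;
import org.joda.time.Interval;
import org.joda.time.Period;

public class RuleChainSketch
{
  // Hypothetical stand-in for the old loadByPeriod appliesTo check.
  static boolean loadByPeriodApplies(Interval segment, DateTime now)
  {
    final Interval window = new Interval(new Period("P7D"), now);
    return window.overlaps(segment) && segment.getStartMillis() >= window.getStartMillis();
  }

  public static void main(String[] args)
  {
    final DateTime now = new DateTime("2015-05-14T00:00:00Z");
    // A merged segment spanning the 7-day boundary: it contains data inside
    // the retention period, yet fails the start-time requirement above.
    final Interval segment = new Interval(
        new DateTime("2015-05-05T00:00:00Z"),
        new DateTime("2015-05-12T00:00:00Z")
    );

    if (loadByPeriodApplies(segment, now)) {
      System.out.println("loadByPeriod matches: segment is loaded");
    } else {
      // dropForever matches everything, so the segment falls through to it and is dropped.
      System.out.println("dropForever matches: segment is dropped");
    }
  }
}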

fjy (Contributor) commented May 14, 2015

@anubhgup Sorry I had to run before. Writing up a more complete view on loadByPeriod.

I am actually okay with redefining the semantics of loadByPeriod; they should be changed to make more sense. The one currently problematic use case is that the router uses these rules to determine how to forward queries. In a setup with different brokers dedicated to different time ranges, if the router receives a one-year query, we want it routed to the broker dedicated to slow (older) queries; with this change, the query would instead be forwarded to the brokers dedicated to fast queries. I am 100% in favor of removing the router entirely and having the routing logic live on the brokers.

One simple workaround is to introduce a new rule type, loadByPeriodByOverlap or something similar, and use that. The correct solution is to rethink the router and some of the rules, which may require more dedicated time.
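
For illustration, an overlap-only rule's appliesTo could be as small as the following sketch, which assumes the same method shape as the diff above (the rule name is fjy's suggestion; nothing in this thread establishes that such a rule ever shipped):

// Hypothetical overlap-only variant of the rule's appliesTo.
public boolean appliesTo(Interval interval, DateTime referenceTimestamp)
{
  final Interval currInterval = new Interval(period, referenceTimestamp);
  // Match any segment that overlaps the window, regardless of where it starts,
  // leaving the existing loadByPeriod (and the router logic built on it) untouched.
  return currInterval.overlaps(interval);
}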

anubhgup (Author)

Thanks @fjy. I can introduce the new rule type. Would it be useful to merge the new rule into the main branch, or should I just implement it for local use?

xvrl (Member) commented May 15, 2015

@anubhgup Feel free to submit a PR; I don't see a reason not to include it in the main branch.

cheddar (Contributor) commented May 18, 2015

Can we call it loadByOverlapPeriod instead?

Also, the semantics of dropping a segment even though it has relevant data are very unintuitive; was that introduced when we added the router? As a general rule, the load/drop/retention rules' semantics should be defined by what makes sense for segment retention. If the router's logic cannot reuse them as-is, that isn't a problem with the interface; it just means the router cannot reuse that logic.

anubhgup closed this May 26, 2015

seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020