-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for loadByPeriod rule #1362
Conversation
@@ -83,6 +83,6 @@ public boolean appliesTo(DataSegment segment, DateTime referenceTimestamp) | |||
public boolean appliesTo(Interval interval, DateTime referenceTimestamp) | |||
{ | |||
final Interval currInterval = new Interval(period, referenceTimestamp); | |||
return currInterval.overlaps(interval) && interval.getStartMillis() >= currInterval.getStartMillis(); | |||
return currInterval.overlaps(interval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you mind adding a unit test too?
@anubhgup This is not a bug and was intended by design. This change also breaks router logic. |
…loadByPeriod rule, even if the start time of the segment is less than the period start.
Added unit test. |
What are the intended semantics of loadByPeriod? In my use case, this is breaking things since it leads to a segment getting dropped even though it contains data in my retention period. How can I work around this problem? Currently, I have a retention rule that says "loadByPeriod=7D, dropForever". When I enabled segment merging, any segment that spans that 7D boundary gets dropped by the coordinator. |
@anubhgup Sorry I had to run before. Writing up a more complete view on loadByPeriod. I am actually okay with redefining the semantics of loadByPeriod, and they should be changed to make more sense. The one use case that is currently problematic is that the router uses rules to determine how to forward queries. In a setup where you have different brokers dedicated to different time ranges, if the router receives a 1 yr query, we want this query to route to the broker dedicated for slow (older) queries. The changes here would mean the query would be forwarded to the brokers dedicated for fast queries. I am 100% in favor of removing the router entirely and having the routing logic live on the brokers. One simple workaround is to introduce a new rule type, loadByPeriodByOverlap or something and use that. The correct solution is to rethink the router and some of the rules, which may require some more time dedication. |
Thanks @fjy. I can introduce the new rule type. Would it be useful to merge the new rule CL to the main branch, or should I just implement that for local use? |
@anubhgup feel free to submit a PR, I don't see a reason not to include it in the main branch. |
Can we call it Also, those semantics about dropping even though it has relevant data are very unintuitive, was that introduced when we added the router? Just as a general rule, the load/drop/retention rule's semantics should be defined according to what makes sense for the retention of segments. If the router's logic cannot re-use them as is, then it shouldn't be a problem with the interface, it just means that the router cannot reuse that same logic... |
…ong type returning expression functions
If a segment (say created by merge from multiple smaller segments), has a start time less than the loadByPeriod start time, it is currently dropped by the coordinator even if it overlaps the load period. This CL fixes that.