-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handoff should ignore segments that are dropped by drop rules #6676
Conversation
@@ -21,7 +21,6 @@ | |||
|
|||
import com.google.common.collect.Sets; | |||
import com.google.common.util.concurrent.MoreExecutors; | |||
import junit.framework.Assert; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is deprecated, so I used org.junit.Assert
to replace it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @QiuMM, thanks for raising this PR! I left a couple of comments.
@@ -95,6 +95,17 @@ void checkForSegmentHandoffs() | |||
Map.Entry<SegmentDescriptor, Pair<Executor, Runnable>> entry = itr.next(); | |||
SegmentDescriptor descriptor = entry.getKey(); | |||
try { | |||
if (coordinatorClient.isSpecificIntervalDroppedByRule(dataSource, descriptor.getInterval())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will require more HTTP calls. I think it's better to use a single API to check this and get the server view below.
That API should be able to notify caller that the handoff for the requested segment will never happen.
We might improve the existing coordinator API, DataSourceResource.getSegmentDataSourceSpecificInterval()
or we can add a new one for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll try it.
for (Rule rule : rules) { | ||
if (rule.appliesTo(theInterval, now)) { | ||
if (rule instanceof DropRule) { | ||
dropped = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks not correct. First of all, rules can be empty. This effectively makes any segments to not be loaded into historicals. Also, only the first matched rule is applied. Even if there is a matched drop rule, a segment can be loaded if any load rule matches before checking the drop rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below is the whole loop:
boolean dropped = false;
for (Rule rule : rules) {
if (rule.appliesTo(theInterval, now)) {
if (rule instanceof DropRule) {
dropped = true;
}
break;
}
}
When found the first matched rule, the loop will break. So there is no doubt that only the first matched rule is applied.
I'll update this PR to handle the case that the rules
is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you're right. Thanks. I missed the break
.
@@ -620,23 +627,49 @@ private ImmutableDruidDataSource getDataSource(final String dataSourceName) | |||
@Path("/{dataSourceName}/intervals/{interval}/serverview") | |||
@Produces(MediaType.APPLICATION_JSON) | |||
@ResourceFilters(DatasourceResourceFilter.class) | |||
public Response getSegmentDataSourceSpecificInterval( | |||
public Response getSegmentServerview( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the method name to getSegmentServerview
since I think it makes more sense, and there is another method using the same name getSegmentDataSourceSpecificInterval
in this class.
@QiuMM, thank you for updating quickly and sorry for the delayed review. I've checked the latest changes, and would like to make one more suggestion. The changed coordinator API ( However, I think it's not necessary and there might be an easier way to fix this issue. I would suggest to add a new API for only this internal use case without changing the existing one. If you do this, you don't have to worry about compatibility. Also, if the new API just returns a boolean result which indicates that the given segment is handed off, is waiting for hand off, or will never be handed off, the API is lighter than returning the entire server view which is better in terms of efficiency. What do you think? |
@jihoonson thanks for your review, your suggestion sounds good to me, I'll update it ASAP. |
Hey @QiuMM - do you think you'll have a chance to come back to this one? Thanks!! |
@gianm @jihoonson sorry for the late because I'm busy with my work, I'll update this pr on this weekend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@QiuMM thanks for the update. I left two more comments.
@Path("/{dataSourceName}/handoffComplete") | ||
@Produces(MediaType.APPLICATION_JSON) | ||
@ResourceFilters(DatasourceResourceFilter.class) | ||
public Response isHandOffComplete( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it returns true
when the segment will be never loaded or is already loaded. Otherwise, it returns `false. Would you add a javadoc about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return Response.ok(false).build(); | ||
} | ||
catch (Exception e) { | ||
return Response.serverError().entity(ImmutableMap.of("error", e.toString())).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably logging the full stack trace of exception would help as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -713,6 +717,7 @@ public Response isHandOffComplete( | |||
return Response.ok(false).build(); | |||
} | |||
catch (Exception e) { | |||
log.error(e, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a proper error message like "Error while handling handoffComplete request"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @QiuMM. LGTM.
Thanks @QiuMM! This issue caused us major confusion in our initial rollout of Druid (for better or for worse, our APM service uses untrusted timestamps from external users as Druid timestamps). This is huge for anybody receiving untrusted timestamps. |
Unless I am missing something, won't this patch break rolling updates? New tasks won't be able to talk to old coordinators, and typically, we advise updating coordinators last: http://druid.io/docs/latest/operations/rolling-updates.html. If so, then there should be some kind of fallback to using the old API if the new one isn't available. |
@gianm thanks for catching it. I'll make a fix. |
Fix #5868.