Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handoff should ignore segments that are dropped by drop rules #6676

Merged
merged 8 commits into from
Jan 7, 2019

Conversation

QiuMM
Copy link
Member

@QiuMM QiuMM commented Nov 28, 2018

Fix #5868.

@@ -21,7 +21,6 @@

import com.google.common.collect.Sets;
import com.google.common.util.concurrent.MoreExecutors;
import junit.framework.Assert;
Copy link
Member Author

@QiuMM QiuMM Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is deprecated, so I used org.junit.Assert to replace it.

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @QiuMM, thanks for raising this PR! I left a couple of comments.

@@ -95,6 +95,17 @@ void checkForSegmentHandoffs()
Map.Entry<SegmentDescriptor, Pair<Executor, Runnable>> entry = itr.next();
SegmentDescriptor descriptor = entry.getKey();
try {
if (coordinatorClient.isSpecificIntervalDroppedByRule(dataSource, descriptor.getInterval())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will require more HTTP calls. I think it's better to use a single API to check this and get the server view below.
That API should be able to notify caller that the handoff for the requested segment will never happen.

We might improve the existing coordinator API, DataSourceResource.getSegmentDataSourceSpecificInterval() or we can add a new one for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll try it.

for (Rule rule : rules) {
if (rule.appliesTo(theInterval, now)) {
if (rule instanceof DropRule) {
dropped = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks not correct. First of all, rules can be empty. This effectively makes any segments to not be loaded into historicals. Also, only the first matched rule is applied. Even if there is a matched drop rule, a segment can be loaded if any load rule matches before checking the drop rule.

Copy link
Member Author

@QiuMM QiuMM Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below is the whole loop:

boolean dropped = false;
    for (Rule rule : rules) {
      if (rule.appliesTo(theInterval, now)) {
        if (rule instanceof DropRule) {
          dropped = true;
        }
        break;
      }
    }

When found the first matched rule, the loop will break. So there is no doubt that only the first matched rule is applied.

I'll update this PR to handle the case that the rules is empty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're right. Thanks. I missed the break.

@fjy fjy added this to the 0.13.1 milestone Dec 3, 2018
@@ -620,23 +627,49 @@ private ImmutableDruidDataSource getDataSource(final String dataSourceName)
@Path("/{dataSourceName}/intervals/{interval}/serverview")
@Produces(MediaType.APPLICATION_JSON)
@ResourceFilters(DatasourceResourceFilter.class)
public Response getSegmentDataSourceSpecificInterval(
public Response getSegmentServerview(
Copy link
Member Author

@QiuMM QiuMM Dec 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the method name to getSegmentServerview since I think it makes more sense, and there is another method using the same name getSegmentDataSourceSpecificInterval in this class.

@jihoonson
Copy link
Contributor

@QiuMM, thank you for updating quickly and sorry for the delayed review.

I've checked the latest changes, and would like to make one more suggestion. The changed coordinator API (getSegmentServerview) now returns a list of ImmutableSegmentLoadInfos with a dropped flag. This is an incompatible change because this is a documented public API and its return type has been changed. (Sorry, I didn't notice that before and recommended to change the existing one.) It means this PR should be labelled Design Review.

However, I think it's not necessary and there might be an easier way to fix this issue. I would suggest to add a new API for only this internal use case without changing the existing one. If you do this, you don't have to worry about compatibility. Also, if the new API just returns a boolean result which indicates that the given segment is handed off, is waiting for hand off, or will never be handed off, the API is lighter than returning the entire server view which is better in terms of efficiency. What do you think?

@QiuMM
Copy link
Member Author

QiuMM commented Dec 11, 2018

@jihoonson thanks for your review, your suggestion sounds good to me, I'll update it ASAP.

@gianm
Copy link
Contributor

gianm commented Dec 20, 2018

Hey @QiuMM - do you think you'll have a chance to come back to this one? Thanks!!

@QiuMM
Copy link
Member Author

QiuMM commented Dec 21, 2018

@gianm @jihoonson sorry for the late because I'm busy with my work, I'll update this pr on this weekend.

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QiuMM thanks for the update. I left two more comments.

@Path("/{dataSourceName}/handoffComplete")
@Produces(MediaType.APPLICATION_JSON)
@ResourceFilters(DatasourceResourceFilter.class)
public Response isHandOffComplete(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it returns true when the segment will be never loaded or is already loaded. Otherwise, it returns `false. Would you add a javadoc about this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return Response.ok(false).build();
}
catch (Exception e) {
return Response.serverError().entity(ImmutableMap.of("error", e.toString())).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably logging the full stack trace of exception would help as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -713,6 +717,7 @@ public Response isHandOffComplete(
return Response.ok(false).build();
}
catch (Exception e) {
log.error(e, "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a proper error message like "Error while handling handoffComplete request"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @QiuMM. LGTM.

@jihoonson jihoonson merged commit 8ebb7b5 into apache:master Jan 7, 2019
@glasser
Copy link
Contributor

glasser commented Jan 8, 2019

Thanks @QiuMM! This issue caused us major confusion in our initial rollout of Druid (for better or for worse, our APM service uses untrusted timestamps from external users as Druid timestamps). This is huge for anybody receiving untrusted timestamps.

@gianm
Copy link
Contributor

gianm commented Jan 30, 2019

Unless I am missing something, won't this patch break rolling updates? New tasks won't be able to talk to old coordinators, and typically, we advise updating coordinators last: http://druid.io/docs/latest/operations/rolling-updates.html. If so, then there should be some kind of fallback to using the old API if the new one isn't available.

@jihoonson
Copy link
Contributor

@gianm thanks for catching it. I'll make a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants