Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: long-running queries shouldn't block the main event loop #7420

Merged
merged 4 commits into from
Apr 23, 2021

Conversation

vvcephei
Copy link
Member

Description

Currently, due to #7358, long-running queries block the main server event loop, even though they execute on the worker pool instead of the event loop pool.
This patch corrects this flaw.

Testing done

Added integration tests to reproduce the bug (#7358).
Note that the newer HTTP2 API did not have the bug, but the integration test for the older API does hang and time out without the fix in place.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@vvcephei vvcephei requested review from spena and a team April 22, 2021 19:03
@vvcephei vvcephei self-assigned this Apr 22, 2021
@ghost
Copy link

ghost commented Apr 22, 2021

@confluentinc It looks like @vvcephei just signed our Contributor License Agreement. 👍

Always at your service,

clabot

@@ -296,6 +310,30 @@ public void tearDown() {
REST_APP.getServiceContext().close();
}

@Test
public void shouldStreamMultiplePushQueries() throws Exception {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works with and without my patch, but it doesn't hurt to have a test for it.

@@ -958,29 +996,24 @@ public void shouldListTopics() throws Exception {
}

@Test
public void shouldListQueries() {
public void shouldListQueries() throws ExecutionException, InterruptedException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was flake-prone, since it assumes that it could serialize itself after other tests (i.e., wait until all other tests' queries have stopped) and that other tests wouldn't start new queries during its execution.

There doesn't seem to be a reason for this, so I changed the test just to verify the thing it wanted to verify, regardless of what other queries are running.

Comment on lines +70 to +73
/**
* This integration test is displaced from the rest-client package
* to make use of utilities that are not available there.
*/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty obnoxious.

The integration test needs to start a REST server, but in the rest-client test module, we don't have a dependency on the rest-server test module. And we can't add one because rest-server test module already depends on rest-client main module.

I can think of two solutions that are potentially better than putting it here: create a new "integration-test" module that depends on all the other modules, or relocate this integration test to the rest-server test module.

}
}, vcf);
},
false /*if this is true, worker execution blocks the main event loop*/,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of lines changed because I fixed the indentation, but this is what actually changed here.

Copy link
Member

@AlanConfluent AlanConfluent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems from the documentation that one worker task can block another if they are invoked from the same context. Not sure if that's what you're describing in the description (though it certainly would cause the bug referenced). Anyway, in our world, worker invocations are all independent.

@spena
Copy link
Member

spena commented Apr 22, 2021

Hey @vvcephei, the PR looks good. Did you look into the other places where executeBlocking is used? I found these files that do not have the false flag either. I don't know if those could block the event loop too.

JaasAuthProvider.authenticate()
KsqlAuthorizationProviderHandler.handle()

Copy link
Member

@spena spena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Just look at the other 2 files that call executeBlocking() to see if they need the false too.

@vvcephei
Copy link
Member Author

Thanks, @spena !

I also saw those, but I had enough trouble with these integration tests that I lost my appetite to do it all in one chunk. I think I'll file some follow-on tickets to try and produce blocking in those contexts and fix it there. My instinct says that we probably never want to block the main event loop on the worker pool. Otherwise, why would we bother delegating to the worker pool? If you think it's ok to just change those other two usages without tests, I can certainly do that in this PR.

Thanks,
-John

@vvcephei vvcephei changed the title long-running queries shouldn't block the main event loop fix: long-running queries shouldn't block the main event loop Apr 23, 2021
@spena
Copy link
Member

spena commented Apr 23, 2021

@vvcephei I think it's ok to change it without tests. We should not block the event loop as you said.

@vvcephei
Copy link
Member Author

Thanks, @spena ! Since I have a green build right now, I'll go ahead and capitalize on it by merging this PR. I'll follow up right away with a separate PR to change those other two methods.

Thanks for the reviews, @spena and @AlanConfluent !

@vvcephei vvcephei merged commit 242fefb into master Apr 23, 2021
@vvcephei vvcephei deleted the 7172-fix-blocking-query branch April 23, 2021 14:25
@vvcephei vvcephei added this to the 0.18.0 milestone Apr 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants