Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix client thread leak after origin reload. #421

Merged
merged 7 commits into from
Jun 4, 2019

Conversation

mikkokar
Copy link
Contributor

@mikkokar mikkokar commented Jun 3, 2019

This pull request addresses two issues:

A thread leak associated with origin reloads.

The Styx proxy.clientWorkerThreadsCount setting was not honoured. Styx is supposed to have only one client thread pool that is shared among all backend services and proxy.clientWorkderThradsCount is the number of threads in this pool.

Due to an error in styx code, this pool was instantiated once per each Styx backend service, resulting in a high number of client threads. More than configuration allowed.

Secondly, this resulted a thread leak. When new backend services were added, additional thread pools were being created. Because the thread pool was supposedly shared, it was never closed when a backend service was removed.

This PR ensures that the client thread pool is shared, and not separately created for each new backend service. I am also adding an integration test to ensure this leak won't happen in future.

A "global" event loop for StyxHttpClient.

This PR also adds a "global" event loop for StyxHttpClient. It lives in a static context processes all IO events for StyxHttpClient. Benefit is that a client is always ready to be used and doesn't need to be closed afterward. For example:

val client = StyxHttpClient.Builder().build()

client.send(get("/")...build())
    .wait()
    .status() shouldBe OK

# When using the global event loop it is no longer necessary to close
# the client afterwards. Hence the following line can be removed:
client.close()

As styx health check functionality uses this client, the health check IO events also go on to this global event loop.

The global pool size is derived from the CPU core count: two threads per CPU core.

@mikkokar mikkokar force-pushed the origin-reload-thread-leak branch from a15b411 to fd0e850 Compare June 3, 2019 07:26
@mikkokar mikkokar requested review from kvosper and dvlato June 3, 2019 07:30
return completedFuture(null);
}

return fromNettyFuture(eventLoopGroup.shutdownGracefully());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I understand this code correctly: if the eventLoopGroup has been injected externally (and thus could be potentially shared among several different connection factories) we close it here. That doesn't seem correct.
Also we are initializing a Default Global Event Loop that's not going to be used in most cases.

Copy link
Contributor Author

@mikkokar mikkokar Jun 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your criticism is justified. Let me see if the event loop can be closed outside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading your comment again, and looking at code, I notice that eventLoopGroup is not really injected. It is created with the factory that has been externally injected.

But with memoization it may have the effect you describe: The same eventLoopGroup could end up being shared with different connection factories, and one of the factories may end up closing it on behalf of others.

This feels an incorrect usage of NettyConnectionFactory rather than an error in the NettyConnectionFactory itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sure, that's right. In this case we might be safe ourselves as we don't try to close the connectionFactory except for healthchecks - and the StyxHttpClient used in healthchecks is using a different eventLoopGroup than the forwarding path... But it seems it could lead to some issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should add comments to the javadoc explaining what pitfalls one should avoid when using the class, so misuse can be prevented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check this commit: a5d70cd.

It is a different approach and removes the close method.

Now the Netty eventLoopGroup and accompanying channel type are injected from outside. They are never closed. The client event loop group is created when styx starts and stays alive until shut.

The Styx health checks & Styx standalone client are now permanently fixed on "global" event loop group without option to inject its own. This can be improved if necessary.

return completedFuture(null);
}

return fromNettyFuture(eventLoopGroup.shutdownGracefully());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should add comments to the javadoc explaining what pitfalls one should avoid when using the class, so misuse can be prevented?

@mikkokar mikkokar force-pushed the origin-reload-thread-leak branch from 76b626e to 81748bd Compare June 4, 2019 08:58
mikkokar added 7 commits June 4, 2019 10:05
Add global event loop for StyxHttpClient.
Add clues to ExpiringConnectionSpec.
* Inject Netty eventLoopGroup and socketChannelClass from StyxServer
  down to NettyConnection Factory.

* Tidy up.
@mikkokar mikkokar force-pushed the origin-reload-thread-leak branch from 81748bd to 912c2ed Compare June 4, 2019 09:26
@mikkokar mikkokar merged commit 332a631 into ExpediaGroup:master Jun 4, 2019
@mikkokar mikkokar deleted the origin-reload-thread-leak branch June 4, 2019 09:54
dvlato pushed a commit that referenced this pull request Jul 3, 2019
* Also add a global event loop for StyxHttpClient.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants