-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix client thread leak after origin reload. #421
Fix client thread leak after origin reload. #421
Conversation
a15b411
to
fd0e850
Compare
return completedFuture(null); | ||
} | ||
|
||
return fromNettyFuture(eventLoopGroup.shutdownGracefully()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if I understand this code correctly: if the eventLoopGroup has been injected externally (and thus could be potentially shared among several different connection factories) we close it here. That doesn't seem correct.
Also we are initializing a Default Global Event Loop that's not going to be used in most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your criticism is justified. Let me see if the event loop can be closed outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading your comment again, and looking at code, I notice that eventLoopGroup
is not really injected. It is created with the factory that has been externally injected.
But with memoization it may have the effect you describe: The same eventLoopGroup
could end up being shared with different connection factories, and one of the factories may end up closing it on behalf of others.
This feels an incorrect usage of NettyConnectionFactory
rather than an error in the NettyConnectionFactory
itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sure, that's right. In this case we might be safe ourselves as we don't try to close the connectionFactory except for healthchecks - and the StyxHttpClient used in healthchecks is using a different eventLoopGroup than the forwarding path... But it seems it could lead to some issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should add comments to the javadoc explaining what pitfalls one should avoid when using the class, so misuse can be prevented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check this commit: a5d70cd.
It is a different approach and removes the close
method.
Now the Netty eventLoopGroup
and accompanying channel type are injected from outside. They are never closed. The client event loop group is created when styx starts and stays alive until shut.
The Styx health checks & Styx standalone client are now permanently fixed on "global" event loop group without option to inject its own. This can be improved if necessary.
components/proxy/src/main/java/com/hotels/styx/StyxPipelineFactory.java
Outdated
Show resolved
Hide resolved
return completedFuture(null); | ||
} | ||
|
||
return fromNettyFuture(eventLoopGroup.shutdownGracefully()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should add comments to the javadoc explaining what pitfalls one should avoid when using the class, so misuse can be prevented?
components/proxy/src/main/java/com/hotels/styx/StyxPipelineFactory.java
Outdated
Show resolved
Hide resolved
components/proxy/src/main/java/com/hotels/styx/routing/handlers/BackendServiceProxy.java
Outdated
Show resolved
Hide resolved
components/proxy/src/test/java/com/hotels/styx/routing/StaticPipelineBuilderTest.java
Outdated
Show resolved
Hide resolved
components/proxy/src/test/java/com/hotels/styx/routing/StaticPipelineBuilderTest.java
Outdated
Show resolved
Hide resolved
...ests/ft-suite/src/test/kotlin/com/hotels/styx/admin/FileBasedOriginsFileChangeMonitorSpec.kt
Outdated
Show resolved
Hide resolved
system-tests/ft-suite/src/test/kotlin/com/hotels/styx/resiliency/OriginResourcesSpec.kt
Outdated
Show resolved
Hide resolved
system-tests/ft-suite/src/test/kotlin/com/hotels/styx/resiliency/OriginResourcesSpec.kt
Outdated
Show resolved
Hide resolved
...client/src/main/java/com/hotels/styx/client/netty/connectionpool/NettyConnectionFactory.java
Outdated
Show resolved
Hide resolved
...client/src/main/java/com/hotels/styx/client/netty/connectionpool/NettyConnectionFactory.java
Outdated
Show resolved
Hide resolved
...client/src/main/java/com/hotels/styx/client/netty/connectionpool/NettyConnectionFactory.java
Outdated
Show resolved
Hide resolved
components/proxy/src/main/java/com/hotels/styx/proxy/BackendServicesRouter.java
Outdated
Show resolved
Hide resolved
components/proxy/src/test/java/com/hotels/styx/proxy/BackendServicesRouterTest.java
Outdated
Show resolved
Hide resolved
components/proxy/src/test/java/com/hotels/styx/routing/StaticPipelineBuilderTest.java
Outdated
Show resolved
Hide resolved
76b626e
to
81748bd
Compare
Add global event loop for StyxHttpClient. Add clues to ExpiringConnectionSpec.
* Inject Netty eventLoopGroup and socketChannelClass from StyxServer down to NettyConnection Factory. * Tidy up.
81748bd
to
912c2ed
Compare
* Also add a global event loop for StyxHttpClient.
This pull request addresses two issues:
A thread leak associated with origin reloads.
The Styx
proxy.clientWorkerThreadsCount
setting was not honoured. Styx is supposed to have only one client thread pool that is shared among all backend services andproxy.clientWorkderThradsCount
is the number of threads in this pool.Due to an error in styx code, this pool was instantiated once per each Styx backend service, resulting in a high number of client threads. More than configuration allowed.
Secondly, this resulted a thread leak. When new backend services were added, additional thread pools were being created. Because the thread pool was supposedly shared, it was never closed when a backend service was removed.
This PR ensures that the client thread pool is shared, and not separately created for each new backend service. I am also adding an integration test to ensure this leak won't happen in future.
A "global" event loop for StyxHttpClient.
This PR also adds a "global" event loop for
StyxHttpClient
. It lives in a static context processes all IO events forStyxHttpClient
. Benefit is that a client is always ready to be used and doesn't need to be closed afterward. For example:As styx health check functionality uses this client, the health check IO events also go on to this global event loop.
The global pool size is derived from the CPU core count: two threads per CPU core.