Deadlock in statusConnectionPool #20
Hi @kuningfellow,

Thank you for this very detailed issue, I agree there seems to be a deadlock.

I also think there's a race condition in your test code: as written, there's a discrepancy between the number of active goroutines and the number declared in the WaitGroup. This version adds to each WaitGroup exactly once per spawned goroutine:

```go
runPar := func(n int, wg []*sync.WaitGroup, f func(*Client), c *Client) {
	for i := 0; i < n; i++ {
		for _, w := range wg {
			w.Add(1)
		}
		go f(c)
	}
}
```

Using this version, I believe locking the connections found in the dead slice is unneeded: the connection pool is already locked at the beginning of `OnFailure`. My tests are conclusive that removing the locks like this solves the issue:

```go
sort.Slice(cp.dead, func(i, j int) bool {
	c1 := cp.dead[i]
	c2 := cp.dead[j]
	return c1.Failures > c2.Failures
})
```

Can I ask you to test on your end and report your findings?
Hi @Anaethelion,

There is no discrepancy between the WaitGroup counts: had there been any, my test code would've panicked due to a negative wait count. It's true that the connection pool is locked before the sort.

Thanks for the reply!
I've revamped the logic to better handle the discovery and prevent overwriting the connection pool while it is in use.
@Anaethelion Couldn't we just save the pool to a variable, like so? This fixes the race condition, but I'm not sure if it will break something else.

Sorry for the diff; I haven't got permission to push a branch.
Saving the pool to a variable solves the race condition, but it doesn't really solve the concurrency problem. Saving the current connection to a local variable feels more like a performance optimization: it could prove faster at a small memory cost, but that would need to be benchmarked!
Deadlock occurs at `statusConnectionPool.OnFailure`:
https://github.com/elastic/elastic-transport-go/blob/v8.5.0/elastictransport/connection.go#L184
Here is my hypothesis:
Here is my test function:
I can consistently get it to deadlock, and when I open http://localhost:8080/debug/pprof/goroutine?debug=1 I get this profile.

Suggested fix: in `*Client.Perform`, change how the connection returned by `.Next()` is handed to `OnSuccess()` or `OnFailure()`.
Thanks.