-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] : Add goleak check to plugin/storage/integration #5468
Conversation
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
@yurishkuro Found some unexpected goroutines leak from go test
Do this leaks need to be fixed or ignored? |
It seems the goroutine leak is created by the glog package. It is part of glog's internal mechanism to periodically flush log data to files. It is in a select state, waiting for a condition to either flush the logs or terminate. |
I suggest you start with getting |
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Done!
|
It seems like the test failure is due to the detection of unexpected goroutines, particularly those related to the |
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
pkg/testutils/leakcheck.go
Outdated
// be stopped when the test finishes, leading to a detected but expected leak. | ||
func IgnoreElasticHealthCheckerLeak() goleak.Option { | ||
return goleak.IgnoreTopFunction("github.com/olivere/elastic.(*Client).healthchecker") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, we don't want to do that. We ignore the other ones because those libs always leak goroutines with no ability to shut them down. In contrast, this ES client leaking is our fault, the unit test is kinda unstable and doesn't properly close all clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we look at a way of using the defer Close() method?
But I am not sure, there are already cleanup functions.
But yes, the tests look flaky as there are no client close methods used anywhere except the cleanup methods.
@yurishkuro Any suggestions on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, we don't want to do that. We ignore the other ones because those libs always leak goroutines with no ability to shut them down. In contrast, this ES client leaking is our fault, the unit test is kinda unstable and doesn't properly close all clients.
@yurishkuro Here's our conversation for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to track down the leaks and ensure Close are called as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuriksho The builds are failing due to this:
created by github.com/olivere/elastic.DialContext in goroutine 166
/home/runner/go/pkg/mod/github.com/olivere/[email protected]+incompatible/client.go:404 +0xcd7
Goroutine 239 in state select, with github.com/olivere/elastic.(*Client).healthchecker on top of the stack:
github.com/olivere/elastic.(*Client).healthchecker(0xc000525a00)
and
Goroutine 90 in state select, with net/http.(*persistConn).readLoop on top of the stack:
net/http.(*persistConn).readLoop(0xc002b32b40)
/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:2261 +0x154c
created by net/http.(*Transport).dialConn in goroutine 102
/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:1799 +0x2774
Goroutine 26 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
internal/poll.runtime_pollWait(0x7fc30f538b58, 0x72)
/opt/hostedtoolcache/go/1.22.3/x64/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xc000114520, 0x72, 0x0)
/opt/hostedtoolcache/go/1.22.3/x64/src/internal/poll/fd_poll_runtime.go:84 +0xb1
internal/poll.(*pollDesc).waitRead(...)
/opt/hostedtoolcache/go/1.22.3/x64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000114500, {0xc00051c000, 0x1000, 0x1000})
/opt/hostedtoolcache/go/1.22.3/x64/src/internal/poll/fd_unix.go:164 +0x466
net.(*netFD).Read(0xc000114500, {0xc00051c000, 0x1000, 0x1000})
/opt/hostedtoolcache/go/1.22.3/x64/src/net/fd_posix.go:55 +0x4b
net.(*conn).Read(0xc00010e138, {0xc00051c000, 0x1000, 0x1000})
/opt/hostedtoolcache/go/1.22.3/x64/src/net/net.go:179 +0xad
net/http.(*persistConn).Read(0xc000518240, {0xc00051c000, 0x1000, 0x1000})
/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:1977 +0xf9
bufio.(*Reader).fill(0xc00010baa0)
/opt/hostedtoolcache/go/1.22.3/x64/src/bufio/bufio.go:110 +0x2b0
bufio.(*Reader).Peek(0xc00010baa0, 0x1)
/opt/hostedtoolcache/go/1.22.3/x64/src/bufio/bufio.go:148 +0xc7
net/http.(*persistConn).readLoop(0xc000518240)
/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:2141 +0x354
created by net/http.(*Transport).dialConn in goroutine 24
/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:1799 +0x2[774](https://github.com/jaegertracing/jaeger/actions/runs/9208742224/job/25331628794?pr=5468#step:9:775)
Goroutine 27 in state select, with net/http.(*persistConn).writeLoop on top of the stack:
net/http.(*persistConn).writeLoop(0xc000518240)
/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:2444 +0x1c5
created by net/http.(*Transport).dialConn in goroutine 24
/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:1800 +0x27ff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried different methods of using close methods but they didn't turn around to work on. Any suggestions/alter approaches to try on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried different methods of using close methods
trying random things is not how to solve this problem. We need to identify which specific test is causing the leak, and understand what causes the ES client to leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debugging
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Bumps the otel-collector group with 30 updates: Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
@yurishkuro It's done now! |
The CI is still broken |
I have tried different methods of using close methods but they didn't turn around to work on. |
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
@yurishkuro Once have a look at goroutine leaks of Kafka. Multiple parts of Kafka making the leaks. Can we have a close method for each component of it to better overcome this or any alternatives? |
@@ -133,5 +133,6 @@ func TestKafkaStorage(t *testing.T) { | |||
SkipUnlessEnv(t, "kafka") | |||
s := &KafkaIntegrationTestSuite{} | |||
s.initialize(t) | |||
defer s.CleanUp(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
@@ -418,6 +418,10 @@ func testPasswordFromFile(t *testing.T, f *Factory, getClient func() es.Client, | |||
require.NoError(t, os.WriteFile(newPwdFile, []byte(pwd2), 0o600)) | |||
require.NoError(t, os.Rename(newPwdFile, pwdFile)) | |||
|
|||
// Close the old client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not related to the PR title
Holding this PR for now, there are a lot of leaks in storage dir. Will be further investigate and make progress on this |
The tactic I would recommend is:
Once all tests are clean with individual checks, add a final PR to remove individual checks and only add a package level check (and fix the script to no longer having relaxed requirement for integration pkg) |
Which problem is this PR solving?
Description of the changes
How was this change tested?
Checklist
jaeger
:make lint test
jaeger-ui
:yarn lint
andyarn test