Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] : Add goleak check to plugin/storage/integration #5468

Closed
wants to merge 34 commits into from

Conversation

varshith257
Copy link
Contributor

@varshith257 varshith257 commented May 21, 2024

Which problem is this PR solving?

Description of the changes

  • Added goleak checks for to plugin/storage/integration test files

image

How was this change tested?

Checklist

Signed-off-by: Vamshi Maskuri <[email protected]>
@varshith257 varshith257 requested a review from a team as a code owner May 21, 2024 02:46
@varshith257 varshith257 requested a review from albertteoh May 21, 2024 02:46
@varshith257 varshith257 marked this pull request as draft May 21, 2024 02:48
Signed-off-by: Vamshi Maskuri <[email protected]>
@varshith257 varshith257 marked this pull request as ready for review May 21, 2024 03:11
@varshith257
Copy link
Contributor Author

varshith257 commented May 21, 2024

@yurishkuro Found some unexpected goroutines leak from go test ./plugin/storage/integration after adding check for goleak:

goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 6 in state select, with github.com/golang/glog.(*fileSink).flushDaemon on top of the stack:
github.com/golang/glog.(*fileSink).flushDaemon(0x2b76918)
        /home/vamshi/go/pkg/mod/github.com/golang/[email protected]/glog_file.go:351 +0xb9
created by github.com/golang/glog.init.1 in goroutine 1
        /home/vamshi/go/pkg/mod/github.com/golang/[email protected]/glog_file.go:166 +0x126
]
FAIL    github.com/jaegertracing/jaeger/plugin/storage/integration      0.467s

Do this leaks need to be fixed or ignored?

@varshith257
Copy link
Contributor Author

varshith257 commented May 21, 2024

It seems the goroutine leak is created by the glog package. It is part of glog's internal mechanism to periodically flush log data to files. It is in a select state, waiting for a condition to either flush the logs or terminate.

@yurishkuro
Copy link
Member

I suggest you start with getting make lint green

@varshith257
Copy link
Contributor Author

I suggest you start with getting make lint green

Done!

(zulip-py3-venv) vamshi@LAPTOP-L0PR20E0:~/jaeger$ go test ./plugin/storage/integration
ok      github.com/jaegertracing/jaeger/plugin/storage/integration      0.016s

@varshith257
Copy link
Contributor Author

It seems like the test failure is due to the detection of unexpected goroutines, particularly those related to the net/http package and the olivere/elastic package.

Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
// be stopped when the test finishes, leading to a detected but expected leak.
func IgnoreElasticHealthCheckerLeak() goleak.Option {
return goleak.IgnoreTopFunction("github.com/olivere/elastic.(*Client).healthchecker")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we don't want to do that. We ignore the other ones because those libs always leak goroutines with no ability to shut them down. In contrast, this ES client leaking is our fault, the unit test is kinda unstable and doesn't properly close all clients.

Copy link
Contributor Author

@varshith257 varshith257 May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we look at a way of using the defer Close() method?
But I am not sure, there are already cleanup functions.
But yes, the tests look flaky as there are no client close methods used anywhere except the cleanup methods.

@yurishkuro Any suggestions on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we don't want to do that. We ignore the other ones because those libs always leak goroutines with no ability to shut them down. In contrast, this ES client leaking is our fault, the unit test is kinda unstable and doesn't properly close all clients.

@yurishkuro Here's our conversation for it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to track down the leaks and ensure Close are called as needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuriksho The builds are failing due to this:

created by github.com/olivere/elastic.DialContext in goroutine 166
/home/runner/go/pkg/mod/github.com/olivere/[email protected]+incompatible/client.go:404 +0xcd7
 Goroutine 239 in state select, with github.com/olivere/elastic.(*Client).healthchecker on top of the stack:
github.com/olivere/elastic.(*Client).healthchecker(0xc000525a00)


and 

Goroutine 90 in state select, with net/http.(*persistConn).readLoop on top of the stack:
net/http.(*persistConn).readLoop(0xc002b32b40)
	/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:2261 +0x154c
created by net/http.(*Transport).dialConn in goroutine 102
	/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:1799 +0x2774
 Goroutine 26 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
internal/poll.runtime_pollWait(0x7fc30f538b58, 0x72)
	/opt/hostedtoolcache/go/1.22.3/x64/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xc000114520, 0x72, 0x0)
	/opt/hostedtoolcache/go/1.22.3/x64/src/internal/poll/fd_poll_runtime.go:84 +0xb1
internal/poll.(*pollDesc).waitRead(...)
	/opt/hostedtoolcache/go/1.22.3/x64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000114500, {0xc00051c000, 0x1000, 0x1000})
	/opt/hostedtoolcache/go/1.22.3/x64/src/internal/poll/fd_unix.go:164 +0x466
net.(*netFD).Read(0xc000114500, {0xc00051c000, 0x1000, 0x1000})
	/opt/hostedtoolcache/go/1.22.3/x64/src/net/fd_posix.go:55 +0x4b
net.(*conn).Read(0xc00010e138, {0xc00051c000, 0x1000, 0x1000})
	/opt/hostedtoolcache/go/1.22.3/x64/src/net/net.go:179 +0xad
net/http.(*persistConn).Read(0xc000518240, {0xc00051c000, 0x1000, 0x1000})
	/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:1977 +0xf9
bufio.(*Reader).fill(0xc00010baa0)
	/opt/hostedtoolcache/go/1.22.3/x64/src/bufio/bufio.go:110 +0x2b0
bufio.(*Reader).Peek(0xc00010baa0, 0x1)
	/opt/hostedtoolcache/go/1.22.3/x64/src/bufio/bufio.go:148 +0xc7
net/http.(*persistConn).readLoop(0xc000518240)
	/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:2141 +0x354
created by net/http.(*Transport).dialConn in goroutine 24
	/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:1799 +0x2[774](https://github.com/jaegertracing/jaeger/actions/runs/9208742224/job/25331628794?pr=5468#step:9:775)
 Goroutine 27 in state select, with net/http.(*persistConn).writeLoop on top of the stack:
net/http.(*persistConn).writeLoop(0xc000518240)
	/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:2444 +0x1c5
created by net/http.(*Transport).dialConn in goroutine 24
	/opt/hostedtoolcache/go/1.22.3/x64/src/net/http/transport.go:1800 +0x27ff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried different methods of using close methods but they didn't turn around to work on. Any suggestions/alter approaches to try on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried different methods of using close methods

trying random things is not how to solve this problem. We need to identify which specific test is causing the leak, and understand what causes the ES client to leak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debugging

@varshith257
Copy link
Contributor Author

how is it done?
image

The \ has been added unexpectedly 😅

varshith257 and others added 3 commits May 23, 2024 18:30
Signed-off-by: Vamshi Maskuri <[email protected]>
Bumps the otel-collector group with 30 updates:

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
varshith257 and others added 5 commits May 23, 2024 18:31
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
@varshith257
Copy link
Contributor Author

@yurishkuro It's done now!

image

@yurishkuro
Copy link
Member

The CI is still broken

@varshith257
Copy link
Contributor Author

The CI is still broken

#5468 (comment)

I have tried different methods of using close methods but they didn't turn around to work on.

@varshith257 varshith257 marked this pull request as draft May 23, 2024 18:38
@varshith257 varshith257 changed the title Add goleak check to plugin/storage/integration [WIP] : Add goleak check to plugin/storage/integration Jun 5, 2024
@varshith257
Copy link
Contributor Author

@yurishkuro Once have a look at goroutine leaks of Kafka. Multiple parts of Kafka making the leaks. Can we have a close method for each component of it to better overcome this or any alternatives?

@@ -133,5 +133,6 @@ func TestKafkaStorage(t *testing.T) {
SkipUnlessEnv(t, "kafka")
s := &KafkaIntegrationTestSuite{}
s.initialize(t)
defer s.CleanUp(t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

@@ -418,6 +418,10 @@ func testPasswordFromFile(t *testing.T, f *Factory, getClient func() es.Client,
require.NoError(t, os.WriteFile(newPwdFile, []byte(pwd2), 0o600))
require.NoError(t, os.Rename(newPwdFile, pwdFile))

// Close the old client
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not related to the PR title

@varshith257
Copy link
Contributor Author

Holding this PR for now, there are a lot of leaks in storage dir. Will be further investigate and make progress on this

@yurishkuro
Copy link
Member

The tactic I would recommend is:

  • add goleak check to individual tests
  • verify that they are clean, or fix if they are not
  • submit a PR
  • repeat for other tests as independent PRs

Once all tests are clean with individual checks, add a final PR to remove individual checks and only add a package level check (and fix the script to no longer having relaxed requirement for integration pkg)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants