Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flaky] pkg/k8s.TestJobRunnerRun sometimes hangs #68

Closed
cardil opened this issue Oct 15, 2021 · 4 comments · Fixed by #69
Closed

[Flaky] pkg/k8s.TestJobRunnerRun sometimes hangs #68

cardil opened this issue Oct 15, 2021 · 4 comments · Fixed by #69
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@cardil
Copy link
Contributor

cardil commented Oct 15, 2021

Failed run: https://prow.knative.dev/view/gs/knative-prow/pr-logs/pull/knative-sandbox_kn-plugin-event/63/pull-knative-sandbox-kn-plugin-event-unit-tests/1448996998777671680

The same test is far less likely to hang when run with go test, when run via gotestsum fails more often. It fails on main branch, but was detected on #63 where the failures became quite common.

Reproduce from main branch:

$ go test -count=1 -race -timeout=2s -v -run '^TestJobRunnerRun$' ./pkg/k8s
=== RUN   TestJobRunnerRun
panic: test timed out after 2s

goroutine 12 [running]:
testing.(*M).startAlarm.func1()
	/home/ksuszyns/.gvm/gos/go1.16/src/testing/testing.go:1701 +0x11f
created by time.goFunc
	/home/ksuszyns/.gvm/gos/go1.16/src/time/sleep.go:180 +0x52

goroutine 1 [chan receive]:
testing.(*T).Run(0xc0006c2480, 0x293972a, 0x10, 0x29f36c0, 0x1)
	/home/ksuszyns/.gvm/gos/go1.16/src/testing/testing.go:1240 +0x610
testing.runTests.func1(0xc0006c2480)
	/home/ksuszyns/.gvm/gos/go1.16/src/testing/testing.go:1512 +0xa7
testing.tRunner(0xc0006c2480, 0xc0006bfce0)
	/home/ksuszyns/.gvm/gos/go1.16/src/testing/testing.go:1194 +0x203
testing.runTests(0xc0006192c0, 0x395be20, 0x2, 0x2, 0xc05284d8b55713ef, 0x79fa538d, 0x397c4c0, 0xc0001ac1a8)
	/home/ksuszyns/.gvm/gos/go1.16/src/testing/testing.go:1510 +0x613
testing.(*M).Run(0xc000239980, 0x0)
	/home/ksuszyns/.gvm/gos/go1.16/src/testing/testing.go:1418 +0x3b4
main.main()
	_testmain.go:47 +0x237

goroutine 19 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x397caa0)
	/home/ksuszyns/git/kn-event/vendor/k8s.io/klog/v2/klog.go:1164 +0xb1
created by k8s.io/klog/v2.init.0
	/home/ksuszyns/git/kn-event/vendor/k8s.io/klog/v2/klog.go:418 +0x1a7

goroutine 20 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc0003ecd80)
	/home/ksuszyns/git/kn-event/vendor/go.opencensus.io/stats/view/worker.go:276 +0x1c5
created by go.opencensus.io/stats/view.init.0
	/home/ksuszyns/git/kn-event/vendor/go.opencensus.io/stats/view/worker.go:34 +0xb2

goroutine 51 [chan receive]:
knative.dev/pkg/metrics.(*metricsWorker).start(0xc000406060)
	/home/ksuszyns/git/kn-event/vendor/knative.dev/pkg/metrics/metrics_worker.go:99 +0x65
created by knative.dev/pkg/metrics.init.0
	/home/ksuszyns/git/kn-event/vendor/knative.dev/pkg/metrics/exporter.go:39 +0xee

goroutine 8 [chan receive]:
k8s.io/klog.(*loggingT).flushDaemon(0x397c9c0)
	/home/ksuszyns/git/kn-event/vendor/k8s.io/klog/klog.go:1010 +0xb1
created by k8s.io/klog.init.0
	/home/ksuszyns/git/kn-event/vendor/k8s.io/klog/klog.go:411 +0x190

goroutine 10 [semacquire]:
sync.runtime_Semacquire(0xc000047378)
	/home/ksuszyns/.gvm/gos/go1.16/src/runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc000047370)
	/home/ksuszyns/.gvm/gos/go1.16/src/sync/waitgroup.go:130 +0xe5
knative.dev/kn-plugin-event/pkg/k8s_test.TestJobRunnerRun(0xc0006c2600)
	/home/ksuszyns/git/kn-event/pkg/k8s/jobrunner_test.go:35 +0x945
testing.tRunner(0xc0006c2600, 0x29f36c0)
	/home/ksuszyns/.gvm/gos/go1.16/src/testing/testing.go:1194 +0x203
created by testing.(*T).Run
	/home/ksuszyns/.gvm/gos/go1.16/src/testing/testing.go:1239 +0x5d8

goroutine 11 [chan receive]:
knative.dev/kn-plugin-event/pkg/k8s.(*jobRunner).watchJob(0xc0005298f0, 0x2c7ced8, 0xc0001b0d80, 0x29f36a8, 0x0, 0x0)
	/home/ksuszyns/git/kn-event/pkg/k8s/jobrunner.go:78 +0x35b
knative.dev/kn-plugin-event/pkg/k8s.(*jobRunner).Run(0xc0005298f0, 0xc0001b0d80, 0xc000122758, 0x0)
	/home/ksuszyns/git/kn-event/pkg/k8s/jobrunner.go:34 +0x3d2
knative.dev/kn-plugin-event/pkg/k8s_test.TestJobRunnerRun.func1(0xc000047370, 0xc0006c2600, 0x2c17860, 0xc0005298f0, 0xc0001b0d80)
	/home/ksuszyns/git/kn-event/pkg/k8s/jobrunner_test.go:28 +0xa2
created by knative.dev/kn-plugin-event/pkg/k8s_test.TestJobRunnerRun
	/home/ksuszyns/git/kn-event/pkg/k8s/jobrunner_test.go:26 +0x735
FAIL	knative.dev/kn-plugin-event/pkg/k8s	2.066s
FAIL
@cardil cardil changed the title Test pkg/k8s.TestJobRunnerRun sometimes hangs when run bu gotestsum [Flaky] pkg/k8s.TestJobRunnerRun sometimes hangs Oct 15, 2021
@cardil
Copy link
Contributor Author

cardil commented Oct 15, 2021

/kind bug

@knative-prow-robot knative-prow-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 15, 2021
@cardil
Copy link
Contributor Author

cardil commented Oct 15, 2021

@cardil
Copy link
Contributor Author

cardil commented Oct 15, 2021

I think doing what @dsimansk suggested in #45 should solve this issue.

@cardil
Copy link
Contributor Author

cardil commented Oct 18, 2021

I did identify the culprit. It turns out that sometimes test updates the Job to success state

https://github.com/cardil/kn-plugin-event/blob/645aafde5537801831a5e3579ba59366d9c74398/pkg/k8s/jobrunner_test.go#L38-L39

before the production code, start the watch, here:

https://github.com/cardil/kn-plugin-event/blob/645aafde5537801831a5e3579ba59366d9c74398/pkg/k8s/jobrunner.go#L30-L34

Adding any sleep, even 10msec before line 38 resolves the issue. But, maybe a better solution can be implemented in production code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants