O transcode loop never returns if the transcode session is ended via B #2589

Closed
iameli opened this issue Sep 16, 2022 · 2 comments · Fixed by #2591

@iameli
Contributor

iameli commented Sep 16, 2022

This O wasn't doing much, just transcoding the occasional segment. However, its CPU usage (as well as that of the other staging orchestrators in the cluster) was stuck at 1500%. Here's a dump of all the available goroutines. There seem to be lots and lots of goroutines with traces like this:

goroutine 79303 [runnable]:
time.Time.Add({0xc0c127f998d0c387, 0x11b13e8736cf, 0x27c6860}, 0xdf8475800)
	/usr/local/go/src/time/time.go:820 +0x146
context.WithTimeout({0x87d330, 0xc00004a0d0}, 0xdf8475800)
	/usr/local/go/src/context/context.go:507 +0x32
github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop.func1()
	/src/core/orchestrator.go:634 +0x7c
created by github.com/livepeer/go-livepeer/core.(*LivepeerNode).transcodeSegmentLoop
	/src/core/orchestrator.go:631 +0x436
@github-actions github-actions bot added the status: triage this issue has not been evaluated yet label Sep 16, 2022
@yondonfu yondonfu added type: bug Something isn't working and removed status: triage this issue has not been evaluated yet labels Sep 16, 2022
@yondonfu yondonfu self-assigned this Sep 16, 2022
@yondonfu
Member

The cause of this issue is that if B triggers a session teardown on O, O's transcode loop gets stuck in an infinite loop, which prevents the goroutine running the loop from ever returning and being cleaned up.

When B triggers a session teardown on O, LivepeerNode.endTranscodingSession() is called, which closes the segment channel used by the transcode loop. From then on, the loop's select statement always executes the segment channel case, because a receive on a closed channel never blocks and immediately returns a nil value. As a result, each iteration of the loop creates a context with a timeout that is cancelled right after the segment channel case runs, and this repeats forever.

I was able to write a test where the goroutine for the transcode loop never gets cleaned up. I'm working on a fix to return from the transcode loop if the segment channel is closed.

@yondonfu yondonfu changed the title extremely high CPU usage on orchestrator: time.Time.Add()? O transcode loop never returns if the transcode session is ended via B Sep 16, 2022
@yondonfu
Member

Fix implemented in #2591, which just needs review.
