Fix: transcoders wait forever on orchestrator restart #2705

Merged (1 commit) on Jan 6, 2023

@stronk-dev stronk-dev commented Dec 30, 2022

What does this pull request do?

Fixes a bug where transcoders would wait forever after an orchestrator restart if they had a previous transcoding session. The fix moves the wg.Add(1) call so that the wait group can reach 0 again.

Specific updates

I believe the bug was introduced in this commit: a1fb761#diff-0db7e4513a2e3eb16dedb22ed6a0920e6648e62733004c11dc0ed8a19f314464, so 0.5.34 should still be fine and any later version should have this issue.

How did you test each of these updates?

Ran a split O/T setup on my main Orchestrator and rebooted it while the connected transcoders were busy transcoding or had recently completed a transcoding session. Repeated this a couple of times, and my transcoders reconnected successfully every time.

The fix also ran in a production pool environment: all T's were able to restart themselves gracefully when the O was restarted numerous times over the past few days!

Does this pull request close any open issues?

Fixes #2704

@thomshutt thomshutt requested a review from cyberj0g January 1, 2023 21:20

codecov bot commented Jan 3, 2023

Codecov Report

Merging #2705 (0c7e4a3) into master (cf95f00) will decrease coverage by 0.00589%.
The diff coverage is 0.00000%.


@@                 Coverage Diff                 @@
##              master       #2705         +/-   ##
===================================================
- Coverage   56.35194%   56.34605%   -0.00589%     
===================================================
  Files             88          88                 
  Lines          19128       19130          +2     
===================================================
  Hits           10779       10779                 
- Misses          7761        7763          +2     
  Partials         588         588                 
Impacted Files                     Coverage Δ
cmd/livepeer_cli/wizard_stats.go   0.00000% <0.00000%> (ø)
server/ot_rpc.go                   35.34483% <0.00000%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@cyberj0g cyberj0g left a comment

Nice catch! LGTM!

@eliteprox

Looks like this PR still needs to be merged

@cyberj0g cyberj0g merged commit af9de74 into master Jan 6, 2023
@cyberj0g cyberj0g deleted the md/TranscoderWaitFix branch January 6, 2023 05:22
@stronk-dev stronk-dev mentioned this pull request Jan 18, 2023

Successfully merging this pull request may close these issues.

T does not automatically connect to O after restart or crash .35 and .36