Segments fail to upload on commit 73aaca8c3da1f5269fe059dd03b0cbf4d6796a1c #2582

Open
stronk-dev opened this issue Sep 7, 2022 · 14 comments

@stronk-dev
Contributor

All Orchestrator nodes were operating fine on 0.5.34, except for Boston, which was running a special version of 0.5.34

Almost all segments fail to upload, causing test stream scores to plummet. Weirdly enough, it did not seem to affect my actual transcoding work. One other Orchestrator confirmed this issue while running the same version, and also rolled back to 0.5.34 to fix it

I don't have much info to give you, as the code does not print the actual error message when it fails...

E0907 04:44:32.830831 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3934 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:31.397356 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1327 orchSessionID=8210a3e9 clientIP=212.102.58.242 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:30.675425 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3933 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:29.525661 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1326 orchSessionID=8210a3e9 clientIP=212.102.58.242 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:28.652747 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3932 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:27.490905 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1325 orchSessionID=8210a3e9 clientIP=212.102.58.242 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:26.739795 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3931 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:25.228388 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1324 orchSessionID=8210a3e9 clientIP=212.102.58.242 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:24.520305 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3930 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:23.276689 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1323 orchSessionID=8210a3e9 clientIP=212.102.58.242 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:22.670958 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3929 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:21.369977 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1322 orchSessionID=8210a3e9 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=212.102.58.242 Could not upload segment
@github-actions github-actions bot added the status: triage this issue has not been evaluated yet label Sep 7, 2022
@stronk-dev
Contributor Author

It looks like the branch has been removed, but it contained all of the commits that were added to the master branch after v0.5.34. It seems that one of those commits is causing this issue

@stronk-dev
Contributor Author

@Franck-UltimaRatio

Franck-UltimaRatio commented Sep 14, 2022

Two days ago I deployed the Linux 0.5.34-3ff79bd7 version because it fixes the gas estimation error.
Since deploying it, I have noticed some difficulty keeping streams, and also some very bad stream tests.
So I rolled our nodes back to 0.5.34, and all has been fine since.
So I presume something is wrong in one of the merges between 0.5.34 and 0.5.34-3ff79bd7.
I can't be sure this version is the cause and can't share logs about it, but I saw the same error as stronk with the previous fix, and I have had this kind of incident with 3ff79bd.

@thomshutt
Contributor

@oscar-davids @cyberj0g The only commits that came in since 0.5.34 are #2381 and #2568 - I know there's not a lot to go on here, but would appreciate any thoughts on what might be the cause / what info we could ask for to help us debug

@yondonfu yondonfu added type: bug Something isn't working status: core contributors working on it in progress area: transcoding and removed status: triage this issue has not been evaluated yet labels Sep 20, 2022
@oscar-davids
Contributor

oscar-davids commented Sep 21, 2022

The difficulty keeping streams has been fixed by #2586.
Checked manually in our https://livepeer.studio/dashboard.

The segment upload failure has been fixed by #2591.
Checked in the Grafana dashboard. Here is the comparison link: 09-17 vs 09-19

@stronk-dev
Contributor Author

Nice! I'll run the latest master build on my orch nodes and will report if anything breaks

@stronk-dev
Contributor Author

Immediately got 30 Could not upload segment errors on the latest commit in master, so there is still an issue somewhere

@oscar-davids
Contributor

@stronk-dev thank you for testing immediately.
I added error logs for failed segment uploads in the oc/adduploadfaillog branch.
Could you test again with the new branch on your side? I would like to see the exact reason why the upload failed.
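
For anyone following along, here is a minimal sketch of what logging the underlying error next to the existing context fields could look like. This is not the code from the oc/adduploadfaillog branch: `segmentContext`, `formatUploadFailure`, and `errSessionEnded` are hypothetical names made up for illustration, and only the field names are taken from the log lines quoted in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// errSessionEnded is a hypothetical sentinel error used only in this sketch.
var errSessionEnded = errors.New("Session ended")

// segmentContext holds a subset of the fields seen in the quoted log lines.
type segmentContext struct {
	ManifestID    string
	SeqNo         int
	OrchSessionID string
}

// formatUploadFailure builds the log message and appends the underlying
// error, so the reason for the failure is visible in the log.
func formatUploadFailure(ctx segmentContext, err error) string {
	return fmt.Sprintf(
		"manifestID=%s seqNo=%d orchSessionID=%s Could not upload segment err=%q",
		ctx.ManifestID, ctx.SeqNo, ctx.OrchSessionID, err,
	)
}

func main() {
	ctx := segmentContext{ManifestID: "4f51i9fb8iytrkqs", SeqNo: 3, OrchSessionID: "825416ab"}
	fmt.Println(formatUploadFailure(ctx, errSessionEnded))
}
```

The `%q` verb calls `Error()` on the value and quotes the result, which keeps the reason on one line and easy to grep.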

@stronk-dev
Contributor Author

Will take a while since I can't build go-livepeer from source on Arch, will probably give it a go later today

@stronk-dev
Contributor Author

Just switched all of my nodes to commit 6c49c04 (retrieved the binary from discord builds channel). Results:

  • Seeing a whole bunch of EndTranscodingSession called, even if it is not transcoding at that moment. Probably unrelated to this issue, but maybe we can turn the verbosity down on this?
  • Could not upload segment errors seem to be much less frequent! Got only a single one in the past few minutes as:
    E0928 12:44:26.819619 1040347 segment_rpc.go:199] manifestID=4f51i9fb8iytrkqs seqNo=3 orchSessionID=825416ab clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
    
    But I'll keep an eye out and update here if anything changes

@stronk-dev
Contributor Author

image
In the past hour I got 10 Could not upload segment errors. This might be as expected, but we'll get questions about it in the orchestrator-support channel for sure:

E0928 14:06:51.202184 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=16 orchSessionID=a2290bf9 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 14:06:50.658063 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=14 orchSessionID=a2290bf9 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 14:06:50.087492 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=12 orchSessionID=a2290bf9 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=89.187.188.237 Could not upload segment err="Session ended"
E0928 14:06:49.187477 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=10 orchSessionID=a2290bf9 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 14:06:48.354880 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=8 orchSessionID=a2290bf9 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 13:59:37.219060 1040347 segment_rpc.go:199] manifestID=894flijoblp8zoxj seqNo=4 orchSessionID=86effee4 clientIP=84.17.50.98 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 13:49:54.948044 1040347 segment_rpc.go:199] manifestID=b033vps9nhhj1l1e seqNo=135 orchSessionID=0250fc20 clientIP=84.17.50.99 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 13:19:09.182213 1040347 segment_rpc.go:199] manifestID=6851wpfm71k2cuii seqNo=37 orchSessionID=7648e99a clientIP=195.181.174.39 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 13:19:07.552985 1040347 segment_rpc.go:199] manifestID=6851wpfm71k2cuii seqNo=31 orchSessionID=7648e99a sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=195.181.174.39 Could not upload segment err="Session ended"
E0928 13:05:23.174115 1040347 segment_rpc.go:199] manifestID=e111hrtvdqgrvicj seqNo=107 orchSessionID=be5f7e80 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=89.187.188.237 Could not upload segment err="Session ended"
E0928 13:05:20.916489 1040347 segment_rpc.go:199] manifestID=e111hrtvdqgrvicj seqNo=102 orchSessionID=be5f7e80 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 13:05:20.203736 1040347 segment_rpc.go:199] manifestID=e111hrtvdqgrvicj seqNo=100 orchSessionID=be5f7e80 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 13:05:19.545365 1040347 segment_rpc.go:199] manifestID=e111hrtvdqgrvicj seqNo=98 orchSessionID=be5f7e80 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 13:05:18.774337 1040347 segment_rpc.go:199] manifestID=e111hrtvdqgrvicj seqNo=96 orchSessionID=be5f7e80 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
E0928 12:44:26.819619 1040347 segment_rpc.go:199] manifestID=4f51i9fb8iytrkqs seqNo=3 orchSessionID=825416ab clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
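
To check whether every failure really carries the same reason, and which streams are affected most, log lines like the ones above can be tallied with a few lines of Go. This is only a sketch under assumptions: `tallyUploadFailures` and the `tally` type are hypothetical helpers, not part of go-livepeer, and the parsing relies solely on the `manifestID=` and `err="..."` fields as they appear in these lines.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// manifestRe captures the value of the manifestID field.
	manifestRe = regexp.MustCompile(`manifestID=(\S+)`)
	// errRe captures the quoted reason, when the build logs one.
	errRe = regexp.MustCompile(`err="([^"]*)"`)
)

// tally counts upload failures per manifest and per logged reason.
type tally struct {
	ByManifest map[string]int
	ByReason   map[string]int
}

// tallyUploadFailures scans log lines and counts only those that contain
// the "Could not upload segment" message.
func tallyUploadFailures(lines []string) tally {
	t := tally{ByManifest: map[string]int{}, ByReason: map[string]int{}}
	for _, line := range lines {
		if !strings.Contains(line, "Could not upload segment") {
			continue
		}
		if m := manifestRe.FindStringSubmatch(line); m != nil {
			t.ByManifest[m[1]]++
		}
		// Older builds log no reason at all, so fall back to a placeholder.
		reason := "(no reason logged)"
		if m := errRe.FindStringSubmatch(line); m != nil {
			reason = m[1]
		}
		t.ByReason[reason]++
	}
	return t
}

func main() {
	lines := []string{
		`E0928 14:06:51.202184 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=16 orchSessionID=a2290bf9 Could not upload segment err="Session ended"`,
		`E0928 13:59:37.219060 1040347 segment_rpc.go:199] manifestID=894flijoblp8zoxj seqNo=4 orchSessionID=86effee4 Could not upload segment err="Session ended"`,
	}
	result := tallyUploadFailures(lines)
	fmt.Println("by reason:", result.ByReason)
	fmt.Println("by manifest:", result.ByManifest)
}
```

Several failures for the same manifestID within a second or two, as in the lines above, are easier to spot in the per-manifest counts than in the raw log.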

@stronk-dev
Contributor Author

stronk-dev commented Sep 29, 2022

Now that it has been running for a while:
image

Looks like the error happens rather often in US-East, with 39 occurrences of Could not upload segment per hour. Every single one of them ended with the reason err="Session ended"
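
For reference, a per-hour figure like this can also be derived straight from the raw log, without a dashboard. The sketch below assumes the glog-style header seen in the lines in this thread (severity letter, `mmdd`, then `hh:mm:ss.uuuuuu`); `failuresPerHour` is a hypothetical helper and not something go-livepeer ships.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// glogHeaderRe captures the month/day and the hour from a header such as
// "E0928 14:06:51.202184".
var glogHeaderRe = regexp.MustCompile(`[IWEF](\d{4}) (\d{2}):\d{2}:\d{2}\.\d+`)

// failuresPerHour buckets "Could not upload segment" lines by day and hour.
// Keys look like "0928 14h".
func failuresPerHour(lines []string) map[string]int {
	counts := map[string]int{}
	for _, line := range lines {
		if !strings.Contains(line, "Could not upload segment") {
			continue
		}
		m := glogHeaderRe.FindStringSubmatch(line)
		if m == nil {
			continue // no recognizable timestamp, skip the line
		}
		counts[m[1]+" "+m[2]+"h"]++
	}
	return counts
}

func main() {
	lines := []string{
		`E0928 14:06:51.202184 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=16 Could not upload segment err="Session ended"`,
		`E0928 14:06:50.658063 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=14 Could not upload segment err="Session ended"`,
		`E0928 13:59:37.219060 1040347 segment_rpc.go:199] manifestID=894flijoblp8zoxj seqNo=4 Could not upload segment err="Session ended"`,
	}
	counts := failuresPerHour(lines)

	// Sort the bucket keys so the output is stable.
	hours := make([]string, 0, len(counts))
	for h := range counts {
		hours = append(hours, h)
	}
	sort.Strings(hours)
	for _, h := range hours {
		fmt.Printf("%s: %d\n", h, counts[h])
	}
}
```

Running this per node would show whether one region really stands out or whether the errors cluster around a few short-lived sessions.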

Looking at my transcode history, it does look like that specific node has way more trouble getting streams to stick:
image

@oscar-davids
Contributor

@stronk-dev can you give me the dashboard link?

@stronk-dev
Contributor Author

Yea, my dashboard is publicly available at: https://grafana.stronk.tech/d/71b6OZ0Gz/orchestrator-overview

It has all info and error logs pulled from Loki and a counter for specific errors which you can unfold:
image
