Infinite retries even on CRITICAL severity #830
Comments
It should be noted that we will pause at least 4 seconds between these retries, and we only make one request per stream (for a max of three for audio/video/text), so it seems unlikely to cause DDoS problems. Since this should only happen for live streams, we could change the logic to only use this behavior for live streams.
Is it not a violation of abstraction for StreamingEngine to know whether the manifest is live or VOD? For this live content lag scenario you mention, is there a reason why we can't expect the user to feed in appropriate retry parameters to guard against that? Perhaps the fear is that there are too many people expecting the default retry parameters to just work? In that case, how about if the retry-parameters ARE supplied, we honor only those and do not call handleNetworkError_()?
I don't think an abstraction violation is a big problem here. This is only for handling errors, not for normal streaming. One concern is that most apps will use the default values. Detecting whether they were provided or not can be hard, especially since there is more than one parameter, so changing one of them may not be enough. Also, we don't know if the app is considering this case when providing those values. But there are a lot of reasons we may not be able to download segments now but can later. Clock sync is one that the app doesn't know about and can't set retry parameters for. I think the best case would be to only use this extra retry behavior for live content.
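For context, the retry policy under discussion is the RetryParameters object that applications already pass through player.configure(). A sketch of tightening it for segment requests (the values and the element id are illustrative, not the library defaults):

```js
// Illustrative values only; this is how an app passes a finite retry policy
// for segment requests through the normal configuration.
const video = document.getElementById('video');  // hypothetical element id
const player = new shaka.Player(video);

player.configure({
  streaming: {
    retryParameters: {
      maxAttempts: 3,     // total tries per request, including the first
      baseDelay: 1000,    // ms before the first retry
      backoffFactor: 2,   // multiplier applied to the delay after each failure
      fuzzFactor: 0.5,    // +/- 50% jitter applied to each delay
      timeout: 0,         // 0 means a single request never times out
    },
  },
});
```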
Do not infinitely retry on top of the retry-policy already defined by RetryParameters, especially for VOD. For live video the previous retry logic is still useful to be robust against clock-sync or asset availability issues. This commit adds a boolean configuration parameter, infiniteRetriesForLiveStreams, to allow the user to turn this behavior off even for live video. Fixes shaka-project#830 and shaka-project#762
I created a pull request for this. For the config value, I chose the name "infiniteRetriesForLiveStreams". Not sure if you agree with the name, but the suggested "retryForLiveStreams" seemed misleading to me, because even with it false there will still be retries for live streams according to the passed-in RetryParameters. I'm open to suggestions for alternatives.
Cherry-picked to v2.1.3.
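With that change, an application that wants only the RetryParameters-driven behavior, even for live content, would turn the new flag off. A sketch (reusing the player from the earlier snippet, and assuming the flag lives under the streaming section of the configuration as in the pull request):

```js
// Sketch based on the config name from the pull request; assumes the flag
// lives under the streaming section of the player configuration.
player.configure({
  streaming: {
    // Never retry beyond what retryParameters allows, even for live streams.
    infiniteRetriesForLiveStreams: false,
  },
});
```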
Have you read the FAQ and checked for duplicate issues:
Yes, I'm filing a more general issue, which is the root cause of #762.
What version of Shaka Player are you using:
This issue exists in both 2.0.x and 2.1.x, and not before.
Can you reproduce the issue with our latest release version:
Yes
Can you reproduce the issue with the latest code from master:
Yes
Are you using the demo app or your own custom app:
Yes
If custom app, can you reproduce the issue using our demo app:
Yes
What browser and OS are you using:
Chrome, Mac (but should be reproducible in others).
What are the manifest and license server URIs:
(you can send the URIs to [email protected] instead, but please use GitHub and the template for the rest)
Irrelevant. The only thing that matters is that segment requests return a non-2XX HTTP code. Even 401 or 403 reproduces it.
What did you do?
Hosted segments on my local server and programmed the server to return 401/403, or anything non-2XX, for segments (the manifest is still served fine).
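A minimal sketch of such a local server (file names and port are hypothetical): it serves the manifest normally but answers every segment request with 403.

```js
// Minimal repro sketch: manifest is served fine, segments get a non-2XX status.
const fs = require('fs');
const http = require('http');

http.createServer((req, res) => {
  if (req.url.endsWith('.mpd')) {
    // The manifest is still served fine.
    res.writeHead(200, {'Content-Type': 'application/dash+xml'});
    res.end(fs.readFileSync('.' + req.url));
  } else {
    // Segments (and everything else) get a 403.
    res.writeHead(403);
    res.end();
  }
}).listen(8000);
```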
What did you expect to happen?
For 401/403, these are CRITICAL severity since #620, so I expect no retries to occur. For other non-2XX codes, I expect only up to "maxAttempts" retries to occur.
What actually happened?
Retries forever. This is counter-intuitive, since it effectively ignores the maxAttempts in the retryParameters. Not to mention it can lead to unintentionally DDoSing the host...
I've debugged through the code, and I think I understand the issue and possible solutions. #390 was fixed by introducing another layer of retries in StreamingEngine (see 0d77ddf). However, the logic introduced there, by calling handleNetworkError_(), effectively leads to infinite retries. This double layering of retries (in both StreamingEngine and NetworkingEngine) is confusing and seems to defeat the point of having a retry policy with RetryParameters. I could be wrong, but it seems to me that #390 would have been better addressed differently: by telling the user to configure Shaka's RetryParameters appropriately to be robust to network failures, and closing the issue as "by design". Or at a minimum, it seems that 0d77ddf catches too many cases in its if-statement by including BAD_HTTP_STATUS, when #390 was really intended for bad network conditions (which would be HTTP_ERROR or TIMEOUT).
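To make that narrowing concrete, here is a hypothetical helper (the function name and isLive parameter are illustrative, not the actual StreamingEngine code) showing the kind of condition that could gate the extra retry layer:

```js
// Hypothetical sketch, not the actual StreamingEngine source: only fall back to
// the extra retry layer for transient network problems on live content, and let
// everything else surface once NetworkingEngine's RetryParameters are exhausted.
function shouldUseExtraRetryLayer(error, isLive) {
  const Code = shaka.util.Error.Code;
  const transient =
      error.code == Code.HTTP_ERROR || error.code == Code.TIMEOUT;
  return isLive &&
      transient &&
      error.severity != shaka.util.Error.Severity.CRITICAL;
}
```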
Here are the solutions I see for this in order of preference:
Let me know if I'm overlooking something...