
Improve traffic shaping to reduce packet loss and video stuttering #3

Open · danstiner wants to merge 6 commits into master
Conversation

@danstiner (Collaborator) commented Mar 28, 2021

This distills the changes I've been testing that improve how outgoing video packets are paced and also brings in some tweaked values from the battle-tested WebRTC project. This does not fix the root cause of video stuttering, see Glimesh/janus-ftl-plugin#101 for more details on that issue.

Changes:

  1. Increase the limit on video send bitrate to the application-controlled peak_kbps value, as was originally intended in this code
  2. Decrease the send token bucket size to align with WebRTC
  3. Decrease the maximum transmission unit to align with WebRTC

The original goal of this change was to solve issues where some users reported 'stuttering' on streams where on every keyframe the stream would pause for a fraction of a second before skipping forward. In extreme cases the stutter can last for more than a second or video can drop out entirely.

I've since investigated further and confirmed the actual root cause of stutters is due to the behavior of the WebRTC stack on the client side, see Glimesh/janus-ftl-plugin#101 for details. I'd still like to come back to this change once that issue is addressed, I still think we can make some small changes here that reduce packet loss for some users with bad networks while also reducing latency for users with good networks.

See the individual commits for additional detail, but as an overview, this centers on how FTL uses a token bucket to shape outgoing video packets. They call this the transmit_level. It fills at a rate of bytes_per_ms, which is currently based on video->kbps, and the bucket holds up to bytes_per_ms * 100ms worth of tokens. In practice this means for a 5000kbps target video bitrate you can send up to 62.5KB in a 100ms period before the video send thread will start sleeping to smooth out traffic.
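To make the numbers above concrete, here is a hedged sketch of the token-bucket behavior described (the names transmit_level and bytes_per_ms come from the description above; the actual ftl-sdk implementation is in C and differs in detail):

```python
# Sketch of FTL's token-bucket traffic shaping as described in this PR.
# Not the real ftl-sdk code; only an illustration of the arithmetic.

class TokenBucket:
    def __init__(self, kbps, window_ms):
        self.bytes_per_ms = kbps * 1000 / 8 / 1000  # kbps -> bytes per ms
        self.capacity = self.bytes_per_ms * window_ms
        self.transmit_level = self.capacity  # start with a full bucket

    def send(self, pkt_bytes, elapsed_ms):
        """Refill for elapsed_ms, then spend pkt_bytes.
        Returns how many ms the send thread must sleep first."""
        self.transmit_level = min(
            self.capacity,
            self.transmit_level + elapsed_ms * self.bytes_per_ms)
        if self.transmit_level >= pkt_bytes:
            self.transmit_level -= pkt_bytes
            return 0.0
        deficit = pkt_bytes - self.transmit_level
        self.transmit_level = 0.0
        return deficit / self.bytes_per_ms

bucket = TokenBucket(kbps=5000, window_ms=100)
print(bucket.capacity)         # 62500.0 bytes, i.e. 62.5 KB per 100 ms
print(bucket.send(62_500, 0))  # 0.0 -- an initial burst fits the full bucket
print(bucket.send(187_500, 0)) # 300.0 ms sleep to trickle the rest of a 250 KB keyframe
```

So for a 250KB keyframe, only the first 62.5KB goes out immediately; the remaining 187.5KB is metered out at 625 bytes/ms, stretching delivery of a single frame across hundreds of milliseconds.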

The main issue with this is that keyframes can be much larger than 62.5KB, depending on your encoder settings and the type of content you are streaming. I've observed keyframes of up to 250KB at a 5000kbps bitrate, which with the current smoothing will take roughly 400ms to fully send. That is a huge jump in latency for a streaming protocol like FTL that aims for sub-second latency.

In theory this should just show up as significant delay, since buffering is done on the viewer side to handle this large packet delivery jitter. However we've seen, especially on mostly static streams that only have one small part changing (such as a countdown timer), that the video will pause or "stutter" a very noticeable amount on nearly every keyframe. I've confirmed what is happening is the user's WebRTC stack is not expanding the jitter buffer enough to handle the keyframe delay, see Glimesh/janus-ftl-plugin#101 for details.

As a final note, this codebase is honestly not of the best quality, and I think OBS is right to try to drop support for it as soon as an alternative low-latency protocol exists. In my view the best longer-term option remains getting WebRTC or another modern low-latency protocol supported in OBS.

@danstiner danstiner requested review from clone1018 and haydenmc March 28, 2021 20:54
@clone1018 (Member)

Are these changes from the newest build, or are they even more recent changes?

@haydenmc (Member)

I know you've described it before, but could you explain the "magic numbers" a bit in the PR description or comments? :)

@danstiner (Collaborator, Author) commented Mar 29, 2021

Sorry, I shouldn't have added y'all before writing a proper description; I've added one now. Each commit also has a detailed description.

I anticipate tightening up the language and adding examples of both the issue and bandwidth graphs to the PR when I open it against OBS

This is based on WebRTC, which chose an MTU of 1200.

This should improve streaming over tunnels that encapsulate
packets and push them past the size routers can handle (often 1500).

https://groups.google.com/g/discuss-webrtc/c/gH5ysR3SoZI/m/zrnVHqtUAwAJ
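A rough back-of-the-envelope for why 1200 leaves useful headroom, assuming a common 1500-byte Ethernet path MTU and IPv4/UDP headers (the specific tunnel overheads below are illustrative examples, not values from this PR):

```python
# Headroom left by a 1200-byte MTU vs. filling a 1500-byte path MTU.
# Assumed values: 1500-byte path MTU, IPv4 (20 B) and UDP (8 B) headers.
PATH_MTU = 1500
IPV4, UDP = 20, 8

payload_no_tunnel = PATH_MTU - IPV4 - UDP  # 1472 bytes usable without tunneling
print(payload_no_tunnel)                   # 1472

# Encapsulation (PPPoE, GRE, VPNs, ...) shrinks that further; example overheads:
for name, overhead in [("PPPoE", 8), ("GRE", 24)]:
    print(name, payload_no_tunnel - overhead)

# An RTP packet sized against a 1200-byte MTU keeps a comfortable margin,
# so it still fits after typical encapsulation instead of fragmenting.
print(payload_no_tunnel - 1200)            # 272 bytes of headroom
```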

This reduces the number of bytes that can fit in the NACK buffer by
about 14%. To account for this and keep behavior the same, the NACK
buffer has been increased to the next larger size, doubling it.

Keyframes cause a burst of traffic that may exceed the streamer's connection
speed and thus cause excessive buffering by their router or another hop
along the path to their streaming service.

To counteract this FTL does traffic shaping/smoothing for video packets.

However the implementation is a bit broken. It allows a peak kbps rate
to be set but entirely ignores it, limiting outgoing bandwidth to
exactly the video bitrate, calculated over a running 100ms window.

This has very poor behavior on mostly static streams with large
keyframes. For example, take a 5000kbps stream that has 200KB keyframes
every two seconds. Because the send rate is limited to 62.5KB every
100ms window, it will take 320ms to send each keyframe. But because the
stream is mostly static, non-keyframes will be very small and send
without being delayed.

That is a huge amount of jitter for the low-latency streaming FTL is
trying to support, and it happens even if the user's connection can
support sending at a faster rate. Using the measured speed solves this.

Looking at the speedtest and peak bandwidth calculation code, I can only
conclude that using the peak kbps value is what the FTL devs meant to do.

This matches WebRTC, which uses a 5ms bucket as described in:
https://tools.ietf.org/html/draft-ietf-rmcat-gcc-02#section-5

Now instead of sending a large burst and then waiting 100ms before
starting to trickle out packets, we will send a very small burst and
start to trickle packets after 5ms. This may result in medium size
video frames taking longer to send but overall traffic will be smoother
and raising the allowed peak send bandwidth should more than make up
for the smoothing delay on networks that can handle it.

This new behavior seems to perform better in all situations. If you have
a fast connection your peak kbps will be much higher than the video kbps
and only minimal smoothing will be applied. If you are streaming at
close to the speed of your connection then significant smoothing will be
applied to keep within your network's capability. Doing this smoothing
at the application level is preferred with RTP over trusting the user's
router and connection to handle large bursts of packets.
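To illustrate the difference between the two configurations (video-rate fill with a 100ms bucket vs. peak-rate fill with a 5ms bucket), here is a hedged sketch; the peak_kbps value of 10000 is an assumed measured connection speed, not a number from this PR:

```python
# Compare keyframe smoothing delay under the old and new shaping setups.
# Assumed: 250 KB keyframe, 5000 kbps video bitrate, 10000 kbps measured peak.

def trickle_ms(total_bytes, kbps, window_ms):
    """ms spent draining total_bytes through a full token bucket that
    holds window_ms worth of tokens and refills at kbps."""
    bytes_per_ms = kbps * 1000 / 8 / 1000
    burst = bytes_per_ms * window_ms       # sent immediately from the full bucket
    remainder = max(0.0, total_bytes - burst)
    return remainder / bytes_per_ms

keyframe = 250_000  # bytes

# Old: fill at the video bitrate, 100 ms bucket.
print(trickle_ms(keyframe, kbps=5000, window_ms=100))  # 300.0 ms

# New: fill at the measured peak bitrate, 5 ms bucket.
print(trickle_ms(keyframe, kbps=10000, window_ms=5))   # 195.0 ms
```

The new setup sends a much smaller initial burst, but because the fill rate is tied to the connection's measured peak rather than the video bitrate, the keyframe still finishes sooner overall, and the traffic is smoother throughout.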
@danstiner danstiner changed the title Improve traffic shaping to eliminate video stuttering Improve traffic shaping to reduce packet loss and video stuttering Apr 16, 2021