
Improve traffic shaping to reduce packet loss and video stuttering #3

Open · danstiner wants to merge 6 commits into master
Conversation

@danstiner (Collaborator) commented Mar 28, 2021

This distills the changes I've been testing that improve how outgoing video packets are paced and also brings in some tweaked values from the battle-tested WebRTC project. This does not fix the root cause of video stuttering, see Glimesh/janus-ftl-plugin#101 for more details on that issue.

Changes:

  1. Increase the limit on video send bitrate to the application-controlled peak_kbps value, as was originally intended in this code
  2. Decrease the send token bucket size to align with WebRTC
  3. Decrease the maximum transmission unit to align with WebRTC

The original goal of this change was to solve issues where some users reported 'stuttering' on streams where on every keyframe the stream would pause for a fraction of a second before skipping forward. In extreme cases the stutter can last for more than a second or video can drop out entirely.

I've since investigated further and confirmed the actual root cause of stutters is due to the behavior of the WebRTC stack on the client side, see Glimesh/janus-ftl-plugin#101 for details. I'd still like to come back to this change once that issue is addressed, I still think we can make some small changes here that reduce packet loss for some users with bad networks while also reducing latency for users with good networks.

See the individual commits for additional detail, but as an overview, this centers on how FTL uses a token bucket to shape outgoing video packets. They call this the transmit_level. It fills at a rate of bytes_per_ms, which is currently based on video->kbps, and the bucket holds up to bytes_per_ms * 100ms worth of tokens. In practice this means for a 5000kbps target video bitrate you can send up to 62.5KB in a 100ms period before the video send thread will start sleeping to smooth out traffic.
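To make the numbers above concrete, here is a hedged sketch of the token-bucket behavior described (the names transmit_level and bytes_per_ms come from the description above; the actual ftl-sdk implementation is in C and differs in detail):

```python
# Sketch of FTL's token-bucket traffic shaping as described in this PR.
# Not the real ftl-sdk code; only an illustration of the arithmetic.

class TokenBucket:
    def __init__(self, kbps, window_ms):
        self.bytes_per_ms = kbps * 1000 / 8 / 1000  # kbps -> bytes per ms
        self.capacity = self.bytes_per_ms * window_ms
        self.transmit_level = self.capacity  # start with a full bucket

    def send(self, pkt_bytes, elapsed_ms):
        """Refill for elapsed_ms, then spend pkt_bytes.
        Returns how many ms the send thread must sleep first."""
        self.transmit_level = min(
            self.capacity,
            self.transmit_level + elapsed_ms * self.bytes_per_ms)
        if self.transmit_level >= pkt_bytes:
            self.transmit_level -= pkt_bytes
            return 0.0
        deficit = pkt_bytes - self.transmit_level
        self.transmit_level = 0.0
        return deficit / self.bytes_per_ms

bucket = TokenBucket(kbps=5000, window_ms=100)
print(bucket.capacity)         # 62500.0 bytes, i.e. 62.5 KB per 100 ms
print(bucket.send(62_500, 0))  # 0.0 -- an initial burst fits the full bucket
print(bucket.send(187_500, 0)) # 300.0 ms sleep to trickle the rest of a 250 KB keyframe
```

So for a 250KB keyframe, only the first 62.5KB goes out immediately; the remaining 187.5KB is metered out at 625 bytes/ms, stretching delivery of a single frame across hundreds of milliseconds.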

The main issue with this is that keyframes can be much larger than 62.5KB, depending on your encoder settings and the type of content you are streaming. I've observed keyframes of up to 250KB at a 5000kbps bitrate, which with the current smoothing will take roughly 400ms to fully send. That is a huge jump in latency for a streaming protocol like FTL that aims for sub-second latency.

In theory this should just show up as significant delay, since buffering is done on the viewer side to handle this large packet delivery jitter. However we've seen, especially on mostly static streams that only have one small part changing (such as a countdown timer), that the video will pause or "stutter" a very noticeable amount on nearly every keyframe. I've confirmed what is happening is the user's WebRTC stack is not expanding the jitter buffer enough to handle the keyframe delay, see Glimesh/janus-ftl-plugin#101 for details.

As a final note, this codebase is honestly not of the best quality, and I think OBS is right to try to drop support for it as soon as an alternative low-latency protocol exists. In my view the best longer-term option remains getting WebRTC or another modern low-latency protocol supported in OBS.

@danstiner danstiner requested review from clone1018 and haydenmc March 28, 2021 20:54
@clone1018 (Member)

Are these changes from the newest build, or are they even more recent changes?

@haydenmc (Member)

I know you've described it before, but could you explain the "magic numbers" a bit in the PR description or comments? :)

@danstiner (Collaborator, Author) commented Mar 29, 2021

Sorry, I shouldn't have added y'all before writing a proper description; I've added one now. Each commit also has a detailed description.

I anticipate tightening up the language and adding examples of both the issue and bandwidth graphs to the PR when I open it against OBS

This is based on WebRTC, which chose an MTU of 1200.

This should improve streaming over tunnels that encapsulate
packets and push them past the size routers can handle (often 1500).

https://groups.google.com/g/discuss-webrtc/c/gH5ysR3SoZI/m/zrnVHqtUAwAJ
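A rough back-of-the-envelope for why 1200 leaves useful headroom, assuming a common 1500-byte Ethernet path MTU and IPv4/UDP headers (the specific tunnel overheads below are illustrative examples, not values from this PR):

```python
# Headroom left by a 1200-byte MTU vs. filling a 1500-byte path MTU.
# Assumed values: 1500-byte path MTU, IPv4 (20 B) and UDP (8 B) headers.
PATH_MTU = 1500
IPV4, UDP = 20, 8

payload_no_tunnel = PATH_MTU - IPV4 - UDP  # 1472 bytes usable without tunneling
print(payload_no_tunnel)                   # 1472

# Encapsulation (PPPoE, GRE, VPNs, ...) shrinks that further; example overheads:
for name, overhead in [("PPPoE", 8), ("GRE", 24)]:
    print(name, payload_no_tunnel - overhead)

# An RTP packet sized against a 1200-byte MTU keeps a comfortable margin,
# so it still fits after typical encapsulation instead of fragmenting.
print(payload_no_tunnel - 1200)            # 272 bytes of headroom
```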

This reduces the number of bytes that can fit in the NACK buffer by
about 14%. To account for this and keep behavior the same, the NACK
buffer has been increased to the next larger size, doubling it.

Keyframes cause a burst of traffic that may exceed the streamer's connection
speed and thus cause excessive buffering by their router or another hop
along the path to their streaming service.

To counteract this FTL does traffic shaping/smoothing for video packets.

However the implementation is a bit broken. It allows a peak kbps rate
to be set but entirely ignores it, limiting outgoing bandwidth to
exactly the video bitrate, calculated over a running 100ms window.

This has very poor behavior on mostly static streams with large
keyframes. For example, take a 5000kbps stream that has 200KB keyframes
every two seconds. Because the send rate is limited to 62.5KB every
100ms window, it will take 320ms to send each keyframe. But because the
stream is mostly static, non-keyframes will be very small and send
without being delayed.

That is a huge amount of jitter for the low-latency streaming FTL is
trying to support, and it happens even if the user's connection can
support sending at a faster rate. Using the measured speed solves this.

Looking at the speedtest and peak bandwidth calculation code, I can only
conclude that using the peak kbps value is what the FTL devs meant to do.

This matches WebRTC, which uses a 5ms bucket as described in:
https://tools.ietf.org/html/draft-ietf-rmcat-gcc-02#section-5

Now instead of sending a large burst and then waiting 100ms before
starting to trickle out packets, we will send a very small burst and
start to trickle packets after 5ms. This may result in medium size
video frames taking longer to send but overall traffic will be smoother
and raising the allowed peak send bandwidth should more than make up
for the smoothing delay on networks that can handle it.

This new behavior seems to perform better in all situations. If you have
a fast connection your peak kbps will be much higher than the video kbps
and only minimal smoothing will be applied. If you are streaming at
close to the speed of your connection then significant smoothing will be
applied to keep within your network's capability. Doing this smoothing
at the application level is preferred with RTP over trusting the user's
router and connection to handle large bursts of packets.
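To illustrate the difference between the two configurations (video-rate fill with a 100ms bucket vs. peak-rate fill with a 5ms bucket), here is a hedged sketch; the peak_kbps value of 10000 is an assumed measured connection speed, not a number from this PR:

```python
# Compare keyframe smoothing delay under the old and new shaping setups.
# Assumed: 250 KB keyframe, 5000 kbps video bitrate, 10000 kbps measured peak.

def trickle_ms(total_bytes, kbps, window_ms):
    """ms spent draining total_bytes through a full token bucket that
    holds window_ms worth of tokens and refills at kbps."""
    bytes_per_ms = kbps * 1000 / 8 / 1000
    burst = bytes_per_ms * window_ms       # sent immediately from the full bucket
    remainder = max(0.0, total_bytes - burst)
    return remainder / bytes_per_ms

keyframe = 250_000  # bytes

# Old: fill at the video bitrate, 100 ms bucket.
print(trickle_ms(keyframe, kbps=5000, window_ms=100))  # 300.0 ms

# New: fill at the measured peak bitrate, 5 ms bucket.
print(trickle_ms(keyframe, kbps=10000, window_ms=5))   # 195.0 ms
```

The new setup sends a much smaller initial burst, but because the fill rate is tied to the connection's measured peak rather than the video bitrate, the keyframe still finishes sooner overall, and the traffic is smoother throughout.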
@danstiner danstiner changed the title Improve traffic shaping to eliminate video stuttering Improve traffic shaping to reduce packet loss and video stuttering Apr 16, 2021