Retransmissions and send order #523

Closed
vasilvv opened this issue Jun 21, 2023 · 11 comments · Fixed by #595

Comments

@vasilvv
Contributor

vasilvv commented Jun 21, 2023

RFC 9000, Section 13.3 says "Endpoints SHOULD prioritize retransmission of data over sending new data, unless priorities specified by the application indicate otherwise; see Section 2.3." Since we are defining an API for specifying relative priorities of streams, we may want to define some behavior that developers could rely on.

I believe the main question here would be "should new data on higher sendOrder stream preempt retransmissions of data lost on a lower sendOrder stream?". I believe that the answer should be "yes", and at least one of the proposed MoQ priority schemes relies on that to be efficient (@kixelated should correct me if I am wrong). That said, as far as I understand, this is not what either Chrome or Firefox QUIC stack currently does.

(from my personal perspective, being able to control retransmission order is the main point of sendOrder API, since otherwise the API client can just order the writes itself)
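To make the question concrete, here is a minimal sketch of the two candidate policies as sort keys. The `Chunk` type and both function names are hypothetical and not taken from any browser or QUIC stack; the sketch assumes a larger `send_order` value is sent first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    stream: str
    send_order: int          # assumption: larger value is sent first
    is_retransmission: bool


def retransmissions_first(chunk: Chunk):
    # All retransmissions go before any new data; send order only breaks ties.
    return (not chunk.is_retransmission, -chunk.send_order)


def send_order_first(chunk: Chunk):
    # Send order decides; within one send order, retransmissions go first.
    return (-chunk.send_order, not chunk.is_retransmission)


pending = [
    Chunk("high", send_order=10, is_retransmission=False),
    Chunk("low", send_order=1, is_retransmission=True),
]

print([c.stream for c in sorted(pending, key=retransmissions_first)])  # ['low', 'high']
print([c.stream for c in sorted(pending, key=send_order_first)])       # ['high', 'low']
```

Under the first policy the lost data on the low sendOrder stream goes out before the new high sendOrder data; under the second, the new data preempts it.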

@LPardue

LPardue commented Jun 21, 2023

I agree that sendOrder should be considered.

This spec probably wants to think about probe packets too, i.e. from RFC 9218 we said

Section 6.2.4 of [QUIC-RECOVERY] also highlights considerations regarding application priorities when sending probe packets after Probe Timeout timer expiration. A QUIC implementation supporting application-indicated priorities might use the relative priority of streams when choosing probe data.

Retransmissions also pose some interesting questions for fairness of bandwidth allocation between streams and datagrams, but maybe that's worth a separate issue.

@aboba
Collaborator

aboba commented Jun 21, 2023

Pre-empting retransmissions can be problematic:

  • If it causes discardable frames to pre-empt non-discardable frames. Since discardable frames depend on non-discardable frames, pre-empting retransmission of non-discardable frames magnifies loss. For example, I-frames are typically 10+ times larger than P-frames, so that retransmission of I-frames is much more likely than P-frames. As a result, pre-empting retransmission of I-frames dramatically increases I-frame loss probabilities. For example, with a packet loss probability of 1 percent, a 25-packet I-frame will have a loss probability of more than 22 percent if re-transmission is pre-empted.
  • If it complicates implementation of partial reliability. By setting a timer, it is currently possible to limit the total transmission time for frames. The maximum transmission time may be set higher for non-discardable frames than for discardable frames. Pre-empting the retransmission of non-discardable frames makes it very difficult to set an appropriate timer.
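The 22 percent figure can be checked with a short calculation. This sketch assumes packet losses are independent and that a frame is lost if any one of its packets is lost and never retransmitted; the function name and the 2-packet P-frame are illustrative choices, not numbers from the thread.

```python
def frame_loss_probability(packet_loss: float, packets: int) -> float:
    """Probability that at least one of `packets` packets is lost,
    assuming independent losses and no retransmission."""
    return 1.0 - (1.0 - packet_loss) ** packets


print(f"{frame_loss_probability(0.01, 25):.4f}")  # 0.2222 (25-packet I-frame)
print(f"{frame_loss_probability(0.01, 2):.4f}")   # 0.0199 (hypothetical 2-packet P-frame)
```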

@wilaw wilaw added the Discuss at next meeting Flags an issue to be discussed at the next WG working label Jun 21, 2023
@kixelated

kixelated commented Jun 21, 2023

So I implemented this at Twitch. I went with retransmissions according to send order because it was easier to implement, but I could have gone either way.

Deprioritize retransmissions

Flow control can be tricky when you deprioritize retransmissions. Any gap in a stream means that the tail is not flushed to the application but counts towards flow control. With a high send order, a stream may be starved and stay in this state indefinitely.
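A rough way to picture the gap problem, as a sketch only: QUIC flow control counts the highest offset received on a stream, while the application can only read the contiguous prefix. The helper names below are hypothetical, and ranges are half-open `(start, stop)` byte offsets.

```python
def contiguous_prefix(ranges):
    """Bytes deliverable to the application: the contiguous run from offset 0."""
    end = 0
    for start, stop in sorted(ranges):
        if start > end:
            break  # gap: everything after it waits for a retransmission
        end = max(end, stop)
    return end


def stuck_credit(ranges):
    """Flow control credit consumed by offsets that cannot be delivered yet."""
    highest = max((stop for _, stop in ranges), default=0)
    return highest - contiguous_prefix(ranges)


received = [(0, 1000), (2000, 5000)]  # bytes 1000..1999 were lost
print(contiguous_prefix(received))  # 1000
print(stuck_credit(received))       # 4000
```

Until the missing 1000 bytes are retransmitted, the 4000 bytes of credit stay consumed, and the receiver typically cannot release them because the application has not read the data.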

Eventually you might hit the MAX_DATA limit, in which case the endpoint must go back and retransmit these old streams exclusively for the purpose of freeing up flow control (if they haven't been reset by now). However you can't transmit new data while at the MAX_DATA limit, so you won't be able to send a full flight and throughput will suffer.

So I would say that if you're going to deprioritize retransmissions, you should only do so while there's a full flight available in MAX_DATA, otherwise you risk a momentary stall. I feel like something is wrong if you've hit this point though, and it's jarring to switch from transmitting new frames to retransmitting old frames once you hit some invisible line.

Prioritize retransmissions

Prioritizing retransmissions independent of send order could degrade the user experience. At the extreme, if you lose an entire flight of packets (e.g. a network outage), then you have no choice but to retransmit that same flight immediately after recovery. The congestion window may be significantly smaller, so you're spending at least a full RTT on lower priority streams.

This might be mostly an academic concern. If the network outage is short, then the send order for streams has likely not changed dramatically by the time the ACK arrives. If the network outage is long, then I'm not sure the user will care about "poor" prioritization shortly after recovery. But I do think that it kind of sucks to receive data from 5s ago after emerging from a tunnel rather than new data, even if it's only for the first few RTTs.

@kixelated

  • If it causes discardable frames to pre-empt non-discardable frames.  Since discardable frames depend on non-discardable frames, pre-empting retransmission of non-discardable frames magnifies loss.  For example, I-frames are typically 10+ times larger than P-frames, so that retransmission of I-frames is much more likely than P-frames.  As a result, pre-empting retransmission of I-frames dramatically increases I-frame loss probabilities.  For example, with a packet loss probability of 1 percent, a 25-packet I-frame will have a loss probability of more than 22 percent if re-transmission is pre-empted.

That doesn't happen when send order matches dependencies. The I-frame MUST have a lower send order than P-frames that depend on it. Otherwise, with even the slightest congestion (independent of packet loss), the I-frame could be starved by frames that depend on it (tail-of-line blocking).

Unless you're talking about an I-frame from the previous GoP versus a P-frame from a future GoP. It's debatable if you want to transmit the most recent P-frame or (re)transmit part of an I-frame from X seconds ago. In my opinion, the send order is the order that the application WANTS the frames to arrive in, independent of any network conditions, so the relay should transmit accordingly.

If it complicates implementation of partial reliability. By setting a timer, it is currently possible to limit the total transmission time for frames. The maximum transmission time may be set higher for non-discardable frames than for discardable frames. Pre-empting the retransmission of non-discardable frames makes it very difficult to set an appropriate timer.

That would happen with any prioritization scheme. Even without packet loss, the congestion window can prevent the full stream from being transmitted, and by the time more window is available, higher priority streams are available that take precedence regardless of the expiration timer. But I agree that prioritizing retransmissions based on send order would amplify the effect.

But that sort of partial reliability scheme wouldn't use send order at all. You would send every stream with the same priority (equal bandwidth) and reset them after x ms. Throwing send order into the mix is extremely confusing, and I'm not even sure how both send order and deadlines can work in tandem for congestion response.

@martinthomson
Member

I agree with @kixelated's comments on the substance of the challenges with retransmissions. My view is that a general implementation should prioritize retransmissions to avoid the flow control deadlock problem that doing otherwise might create. Offering a way to abandon (or set time limits) on retransmissions for lower priority (or lower send order) data would allow the stack to avoid the priority inversion that retransmissions might cause. Then, you can make the lifetime of a stream appropriate. For the video example, you might imagine key frames having a lifetime equal to the expected key frame interval, whereas other frames would have a lifetime closer to the frame rate.
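As a sketch of the video example only: the function and parameter names are hypothetical, and "closer to the frame rate" is read here as one frame interval, which is an assumption.

```python
def stream_lifetime_s(is_key_frame: bool,
                      key_frame_interval_s: float,
                      frame_rate_fps: float) -> float:
    """Suggested lifetime for the stream carrying one frame, in seconds."""
    if is_key_frame:
        # Key frames stay useful until the next key frame arrives.
        return key_frame_interval_s
    # Other frames are only useful for about one frame interval (assumption).
    return 1.0 / frame_rate_fps


print(stream_lifetime_s(True, 2.0, 25.0))   # 2.0
print(stream_lifetime_s(False, 2.0, 25.0))  # 0.04
```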

@jan-ivar
Member

Meeting:

  • Ideas: Wherever we set sendOrder, add a boolean toggle (prioritizeRetransmits) or a timeWindow for that? dunno
  • @vasilvv: might want to shelve this for now until people who do servers tell us if this matters or not

@kixelated

kixelated commented Aug 4, 2023

I think I'm wrong about a few things.

However you can't transmit new data while at the MAX_DATA limit, so you won't be able to send a full flight and throughput will suffer.

Actually no, because retransmissions aren't subject to the MAX_DATA limit. You won't have a full flight of prioritized data, but you will still have a full flight of data.

So I would say that if you're going to deprioritize retransmissions, you should only do so while there's a full flight available in MAX_DATA, otherwise you risk a momentary stall.

So there still will be a sudden shift from the application's point of view. The receiver will be in a steady state where only high priority data is being received and then suddenly it receives old and low priority data instead for an RTT. The retransmissions free up flow control so you're back in the steady state until sufficient packet loss occurs again.

However my recommendation doesn't avoid that; it just pushes forward the timeline by an RTT. If this stall is really an issue, then you could amortize it by retransmitting a % based on how close you are to the MAX_DATA limit... but that seems even more complicated to implement.
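One possible shape for that amortization, as a sketch rather than a tested design: spend a growing share of each flight on retransmissions as connection flow control fills up. The function name, the linear ramp, and the `start_at` threshold are all assumptions made for illustration.

```python
def retransmit_share(bytes_used: int, max_data: int, start_at: float = 0.5) -> float:
    """Fraction of each flight to spend on retransmissions.

    Grows linearly from 0 at `start_at` utilization to 1 at the MAX_DATA limit.
    """
    if max_data <= 0:
        raise ValueError("max_data must be positive")
    if not 0.0 <= start_at < 1.0:
        raise ValueError("start_at must be in [0, 1)")
    utilization = min(bytes_used / max_data, 1.0)
    if utilization <= start_at:
        return 0.0
    return (utilization - start_at) / (1.0 - start_at)


print(retransmit_share(500, 1000))   # 0.0
print(retransmit_share(750, 1000))   # 0.5
print(retransmit_share(1000, 1000))  # 1.0
```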

Ideas: Wherever we set sendOrder, add a boolean toggle (prioritizeRetransmits) or a timeWindow for that? dunno

The application can implement the timeWindow itself and that's actually how my client works. I try to transmit media for up to 10s, after which I close the stream with a non-fatal error code because it was starved. This is also necessary if there's any local send buffer limit such as a maximum amount of data that can be queued in WebTransport.
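A minimal sketch of an application-level time window. `FakeStream`, `send_with_deadline`, and `STARVED_ERROR_CODE` are hypothetical stand-ins, not the WebTransport API or the client described above; the point is only the pattern of giving up on a write after a deadline and resetting the stream with a non-fatal code.

```python
import asyncio

STARVED_ERROR_CODE = 1  # hypothetical application-defined, non-fatal code


class FakeStream:
    """Stand-in for a send stream; not a real WebTransport or QUIC API."""

    def __init__(self, write_delay_s: float):
        self.write_delay_s = write_delay_s
        self.reset_code = None

    async def write(self, data: bytes) -> None:
        await asyncio.sleep(self.write_delay_s)  # simulates a starved stream

    def reset(self, code: int) -> None:
        self.reset_code = code


async def send_with_deadline(stream, data: bytes, deadline_s: float) -> bool:
    """Try to write `data`; reset the stream if it is not accepted in time."""
    try:
        await asyncio.wait_for(stream.write(data), timeout=deadline_s)
        return True
    except asyncio.TimeoutError:
        stream.reset(STARVED_ERROR_CODE)
        return False


starved = FakeStream(write_delay_s=1.0)
print(asyncio.run(send_with_deadline(starved, b"frame", deadline_s=0.01)))  # False
print(starved.reset_code)  # 1
```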

A boolean for prioritizeRetransmits seems premature. I think we should experiment on the server side first to see if it has any impact. If it does have an impact, I still don't think exposing that knob to the application makes sense. The more I think about it, the more I think the browser could transparently deprioritize retransmissions.

@wilaw
Contributor

wilaw commented Aug 14, 2023

Feedback from IETF #117 in San Francisco
Notes here: https://notes.ietf.org/notes-ietf-117-webtrans

Cullen Jennings: If we just did this by default, it would not be great. Lossy networks would have priority inversions from other networks. If you say that the application can set different send orders for the main packets and then a separate one for if it’s retransmitted, you have to let the application control it. .. If you want to separate send orders, you should have two, both set by the application.

Luke Curley: I’ve done two, one with and one without, didn’t see any difference really. There are some flow control issues, unless your packet loss rate is really high, you don’t really see any difference. Always retransmit first, otherwise flow control is hell, you have gaps in streams that still consume flow control.

Mo: I like Cullen’s idea of having specific retransmission order. If I can use it as a hack to get datagram semantics with streams, set it to the lowest possible.

@jan-ivar
Member

Meeting:

  • Might be too early to see
  • Chrome: retransmissions always go first
  • Firefox: pretty sure the same
  • As a feature this seems like something that could be added later. tempting to defer
  • Argument against: Such a feature might merely allow folks to over-commit more
  • A thing that only happens when you're not having a great day already. You only benefit from this if you're expecting things out of order to help your performance

@jan-ivar
Member

Can we mark this ready for PR with a note, e.g.

  • Ordering of retransmissions is at the discretion of user agents, but retransmissions are encouraged to happen at high priority.

@wilaw wilaw removed the Discuss at next meeting Flags an issue to be discussed at the next WG working label Sep 20, 2023
@wilaw wilaw added this to the Candidate Recommendation milestone Dec 20, 2023
@suhasHere

I gave a talk on simulcast video delivery over MoQ and shared some learnings with respect to retransmissions and priorities here:

https://datatracker.ietf.org/meeting/119/materials/slides-119-moq-simulcast-video-delivery-learnings-01

The gist, as shared by Christian Huitema for the picoquic implementation, is:

Retransmit is done at the same priority as the original stream. So if we have seen packet losses on both a low-def high-priority stream and a high-def low-priority stream, the order will be: retransmit of the high-priority stream, then new data on the high-priority stream, then retransmit for the low-priority stream, then new data for the low-priority stream.
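The ordering described above can be reproduced with a two-part key in a priority queue. This is only an illustrative sketch, not picoquic code; it assumes a lower priority number is served first and encodes retransmissions as kind 0 and new data as kind 1.

```python
import heapq

# Entries are (priority, kind, label).
# Assumption for this sketch: lower priority number is served first.
# kind 0 = retransmission, kind 1 = new data.
queue = []
heapq.heappush(queue, (1, 1, "new data, low priority stream"))
heapq.heappush(queue, (0, 1, "new data, high priority stream"))
heapq.heappush(queue, (1, 0, "retransmit, low priority stream"))
heapq.heappush(queue, (0, 0, "retransmit, high priority stream"))

order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
for label in order:
    print(label)
# retransmit, high priority stream
# new data, high priority stream
# retransmit, low priority stream
# new data, low priority stream
```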

8 participants