Byte stream formats #38
Comments
Discussed with @sandersdan, we lean toward a won't fix here. Bytestreams still have discrete frame boundaries such that "chunks" can be identified and provided to the codec. Alternatives seem to make the API much more complicated for little benefit. |
AV1 also uses a byte stream format. Are you saying that WebCodecs won't be able to support AV1?? |
A different way to say what @chcunningham is saying: if it can be packaged in an MP4, then we do not need a ReadableByteStreamController. The terms are different between streams and media codecs:
There are unpacketized bitstreams (eg. H.264 Annex B), but I am unaware of any that don't also have standard packetizations. There is a spectrum of possible implementations in WebCodecs:
I prefer the last one because it allows us to accept frame metadata (such as timestamp) alongside the bytestream chunks, but it's conceivable that there exists (or will exist) a format for which this doesn't make sense. |
The AV1 bitstream is packetized as specified in the AV1 RTP payload specification. The AV1 bitstream format uses OBUs (similar to H.264 NAL units), including Temporal Delimiter (TD), Sequence Header (SH), MetaData (MD), Tile Group (TG) and Frame Header (FH) OBUs. As an example, the following bitstream:
would typically be packetized as follows: [ This seems like it might qualify as "arbitrary chunks" or "meaningful chunks", but probably not "chunks that are exactly one sample". |
AV1 is also packetized in chunks that are exactly one sample in the ISO BMFF binding. |
Also worth noting that a decision here could affect #13, and theoretical future video formats that support progressive decoding. My gut instinct is that progressive decoding is for still images and should be a separate API, but I'd like to understand that design space better. |
And one more note: for low-latency streams, it may be beneficial to submit slices/tiles individually as they arrive from the network, and the opposite for encoding. (So 'meaningful chunks'.) If we support that, it's important to make sure we don't also make muxing harder for less latency-sensitive cases. A 'partial' flag for input and output chunks may be enough (and could be added in a v2). |
I would be against byte streaming progressive decoding, that is, feeding the decoder a byte stream without explicit boundaries (they may be inline, as in h264 with the nal start header sequence "001") and letting the decoder decide where the relevant start/end bytes of each decodable chunk are. I think the real question is whether we serialize the encoding units that the encoder produces (the group of nals in h264, obus in av1, or partitions in vp8) into a byte array (i.e. the byte stream format), or just output an array of chunks so the app can packetize them at will. Note that encoders typically provide the latter; for example, in vp8 you encode the frame and then return each partition:
x264 does the same, providing the array of nals as output. There are pros and cons to doing it this way (which would also affect what we accept as input in the decoder). The good part is that, given the individual encoding units (nals/obus/partitions), it is easier to convert them to any frame-based stream format (for example, h264 annex b format) and easier to do rtp packetization (otherwise you would typically have to parse the byte stream to find the nals/obus and apply packetization afterward). The bad part is that the serialization has to be done on the app side before passing the data to the appropriate transport (webrtc could be different, as the packetization should be done inside it). |
Also, as a side note, SVC codecs (like vp9) produce several "frames" per input video frame, so it would not be easy to produce a single chunk from the encoder. |
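To illustrate the "easier to convert" point above, here is a minimal sketch of app-side serialization, assuming the encoder has already handed back an array of NAL units (each with emulation-prevention bytes in place). The function names `toAnnexB` and `toLengthPrefixed` are hypothetical helpers, not part of WebCodecs or x264.

```typescript
// Hypothetical helper: serialize NAL units into an Annex B byte stream
// by prefixing each unit with a 4-byte start code (00 00 00 01).
function toAnnexB(nals: Uint8Array[]): Uint8Array {
  const startCode = [0, 0, 0, 1];
  const total = nals.reduce((n, nal) => n + startCode.length + nal.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const nal of nals) {
    out.set(startCode, offset);
    offset += startCode.length;
    out.set(nal, offset);
    offset += nal.length;
  }
  return out;
}

// Hypothetical helper: serialize the same units with a 4-byte big-endian
// length prefix instead, as used by MP4-style (length-prefixed) framing.
function toLengthPrefixed(nals: Uint8Array[]): Uint8Array {
  const total = nals.reduce((n, nal) => n + 4 + nal.length, 0);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  let offset = 0;
  for (const nal of nals) {
    view.setUint32(offset, nal.length); // DataView writes big-endian by default
    out.set(nal, offset + 4);
    offset += 4 + nal.length;
  }
  return out;
}
```

Both functions take the same input, which is the argument for handing the app individual units: either framing is a simple loop, with no parsing of the payload required.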
Hello! For h264, does it mean we have to group NAL units by ourselves before creating |
That's correct. If your source is not framed then you will need to identify access unit boundaries. If your source includes AUD (Access Unit Delimiter) units then that's quite easy (break right before each AUD). It's also relatively easy if you know there is only one slice per frame and no redundant or auxiliary slices (break after each slice). Beyond that you'll probably want to read the H.264 spec. |
Note for VP9 spatial SVC: My current understanding is that the several frames should in fact be separate frames, but they have the same timestamp. There is an asymmetry here; for encoding you should only be passing in the highest-resolution version of each frame. I expect our encoders will output multiple chunks (one for each resolution) but they will have the same timestamp. I still need to do some research to figure out if it's technically valid to bundle them into a single chunk. (Presumably libvpx is/would already be bundling them like that if it's valid.) |
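The easy case described above (break right before each AUD) could be sketched roughly as follows. This assumes an H.264 Annex B buffer in which every access unit begins with an AUD (nal_unit_type 9); `NalUnit`, `splitAnnexB`, and `groupAccessUnits` are hypothetical names for illustration, and streams without AUDs need the fuller boundary rules from the H.264 spec.

```typescript
// Hypothetical types and helpers; not part of WebCodecs.
interface NalUnit {
  type: number;     // nal_unit_type: low 5 bits of the first NAL byte
  data: Uint8Array; // NAL unit bytes without the start code
}

// Split an Annex B buffer into NAL units by scanning for 00 00 01 start codes.
function splitAnnexB(buf: Uint8Array): NalUnit[] {
  const starts: number[] = []; // index of the first byte after each start code
  for (let i = 0; i + 2 < buf.length; i++) {
    if (buf[i] === 0 && buf[i + 1] === 0 && buf[i + 2] === 1) {
      starts.push(i + 3);
      i += 2; // skip the rest of this start code
    }
  }
  const nals: NalUnit[] = [];
  for (let k = 0; k < starts.length; k++) {
    const begin = starts[k];
    let end = k + 1 < starts.length ? starts[k + 1] - 3 : buf.length;
    // Trim trailing zero bytes, including the extra leading zero of a
    // following 4-byte start code; a NAL unit never ends in 0x00.
    while (end > begin && buf[end - 1] === 0) end--;
    if (end > begin) {
      nals.push({ type: buf[begin] & 0x1f, data: buf.subarray(begin, end) });
    }
  }
  return nals;
}

// Group NAL units into access units, starting a new group at each AUD.
function groupAccessUnits(nals: NalUnit[]): NalUnit[][] {
  const AUD = 9;
  const units: NalUnit[][] = [];
  let current: NalUnit[] = [];
  for (const nal of nals) {
    if (nal.type === AUD && current.length > 0) {
      units.push(current);
      current = [];
    }
    current.push(nal);
  }
  if (current.length > 0) units.push(current);
  return units;
}
```

Each resulting group would then be serialized into the payload of one encoded chunk, in whichever framing the decoder configuration expects.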
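If the encoder does emit one chunk per resolution with a shared timestamp, an app that wants them together (for muxing, say) could regroup them itself. A minimal sketch, assuming chunks arrive in order and using a hypothetical `ChunkLike` shape rather than the real encoded chunk interface:

```typescript
// Hypothetical stand-in for an encoded chunk; only the fields used here.
interface ChunkLike {
  timestamp: number;
  data: Uint8Array;
}

// Group consecutive chunks that share a timestamp (e.g. spatial layers
// produced from the same input frame). Assumes in-order arrival.
function groupByTimestamp(chunks: ChunkLike[]): ChunkLike[][] {
  const groups: ChunkLike[][] = [];
  let current: ChunkLike[] = [];
  let currentTs: number | null = null;
  for (const chunk of chunks) {
    if (currentTs !== null && chunk.timestamp !== currentTs) {
      groups.push(current);
      current = [];
    }
    current.push(chunk);
    currentTs = chunk.timestamp;
  }
  if (current.length > 0) groups.push(current);
  return groups;
}
```

Whether such a group may legally be concatenated into a single chunk is exactly the open question in the comment above; the sketch only keeps the layers together.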
To the core issue of slices/tiles vs 'meaningful' chunks, Chrome's longstanding behavior has been 'meaningful' chunks and this has been demonstrated to work great for a variety of use cases (RTC, low latency streaming, video editing, etc.). If slices/tiles is later desired, we should do this without breaking the API (e.g. specified as an option in VideoDecoderConfig, for which the default is 'meaningful' chunks). Hence I've marked the issue as 'extension'. Having said that, we've had no real demand for this from users and I vote to just close the issue until demand arrives. @sandersdan WDYT?
The codec registry should document this. Work tracked in #155.
We discussed this more w/ SVC folks and learned separate chunks is how it's done. |
Closing is acceptable to me. Even if there is demand, breaking a stream into chunks may fit better in a containers API anyway. |
As currently defined, WebCodecs supports packetized codecs, where we expect one decoded frame per encoded chunk. For some codecs (eg. H.264 in Annex B format), it makes sense to use a byte stream instead.
This changes the interface of an encoder or decoder, so it's not a trivial change. It doesn't seem to be compatible with our flush or configure model unless streams gain support for flush.