Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebCodecs (again!) #612

Closed
1 task done
chcunningham opened this issue Feb 18, 2021 · 27 comments
Closed
1 task done

WebCodecs (again!) #612

chcunningham opened this issue Feb 18, 2021 · 27 comments
Assignees
Labels
Resolution: satisfied The TAG is satisfied with this design Topic: media Venue: WICG

Comments

@chcunningham
Copy link

Ya ya yawm TAG!

I'm requesting a TAG review of WebCodecs.

An early review was conducted in #433. The API has changed a lot since then and we now have a draft specification. Please see discussion of deadlines (relatively soon) and and "you should also know that" below (upcoming work, open questions).

Thank you for reviewing!

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: Chrome would like to ship this API version 91, which is anticipated to release May 25, 2021. The code freeze for this release will occur around April 14, 2021.
  • The group where the work on this specification is currently being done: WICG
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Migrating to Media WG
  • Major unresolved issues with or opposition to this specification: There are no major objections. Our Github tracks a number of open design questions. The most pressing are tagged with milestones.
  • This work is being funded by: Google/Mozilla (as employers of it's editors/implementers).

You should also know that...

We'd prefer the TAG provide feedback as (please delete all but the desired option):
🐛 open issues in our GitHub repo for each point of feedback

@cynthia cynthia self-assigned this Feb 19, 2021
@kenchris kenchris self-assigned this Feb 19, 2021
@padenot
Copy link

padenot commented Feb 19, 2021

* The code freeze for this release will occur around April 14, 2021.

This seems way too early (about two month) considering the rather fundamental discussions happening around memory allocations and time units, among others (there is a large number of unsolved issues, and some groups of users haven't given feedback yet). Getting those two wrong (and again, there is more to discuss), or not perfectly right will mean the API cannot be used for some of its stated use-cases, or will have performance implications that mean that some use-cases won't be possible on particular devices (notably and predictably, non-high-end devices).

Considering this is a low-level API that is supposed to be the bottom-most layer of the entire media decoding and encoding stack of the web platform, I would be sad to not get this right, especially considering the expectations of developers in various domains.

@chcunningham
Copy link
Author

@padenot, apologies for the surprise - I thought we were more aligned on timing. I agree we have a number of issues to resolve before shipping, but I think 2 months sounds workable. Its not my intention to rush this out the door. Lets discuss more in our upcoming call and we can revise the timeline as needed.

Timing aside, this is still a great time to gather TAG feedback. Reviewers, please proceed :)

@cynthia
Copy link
Member

cynthia commented Feb 20, 2021

Strange! I swear I wrote early feedback on this yesterday, but apparently I was imagining I did or I got distracted and forgot to press the green button. Abridged summary of what I wanted to write:

  • Explainer and spec disagree on AudioFrame vs AudioPacket?
  • VideoFrame has destroy/clone but AudioFrame|Packet does not - why?
  • In VideoFrame, wild guess is that plane is intended to give the original framebuffer (e.g. in YCbCr, as multiple grayscale images?) while the image bitmap method is expected to return something interoperable with the rest of the platform?

Still need to spend more time with this.

@chcunningham
Copy link
Author

Thanks so much @cynthia

Explainer and spec disagree on AudioFrame vs AudioPacket?

Just updated to use AudioFrame.

VideoFrame has destroy/clone but AudioFrame|Packet does not - why?

Sorry, this part needs work. The plan is for both interfaces to have matching clone() and close() methods, removing destroy(). The semantics of clone() and close() will be akin to incrementing/decrementing a ref count on the underlying media data. I should send a PR by end of week. I'll update this thread when its ready.

In VideoFrame, wild guess is that plane is intended to give the original framebuffer (e.g. in YCbCr, as multiple grayscale images?)

Your idea is roughly correct. VideoFrame gives access to the raw pixel data via its Planes interface. For YCbCr you would get a plane for each of Y, Cb, and Cr. For the I420 pixel format, the Cb and Cr planes are subsampled. The spec will soon mention additional pixel formats. For ex: RGB (1 plane) and NV12 (Y plane + interleaved CbCr plane).

while the image bitmap method is expected to return something interoperable with the rest of the platform?

Yes, though we have some pending updates to mention here as well. We initially envisioned VideoFrame -> ImageBitmap -> Canvas as a primary means of painting VideoFrames. More recently, we intend to support painting of VideoFrames directly via WebGL tex(Sub)Image(2D|3D)() and Canvas drawImage(). The Chrome work for this is tracked here. We'll file PRs/bugs on the relevant specs shortly.

We still plan to support VideoFrame -> ImageBitmap for other interop, but VideoFrame.createImageBitmap() will be removed from the spec in favor of window.createImageBitmap(videoFrame). The former was just easier for us to prototype.

@chcunningham
Copy link
Author

chcunningham commented Feb 23, 2021

Also, I to highlight the time units discussion @padenot mentioned.
w3c/webcodecs#122

This and other top priority issues are all assigned the 2021 Q1 milestone.
https://github.com/WICG/web-codecs/milestone/2

Input welcome!

@chcunningham
Copy link
Author

The previous tag review mentioned ImageDecoding. I will send a PR shortly to add this to the spec: https://github.com/dalecurtis/image-decoder-api

This PR is now out here: w3c/webcodecs#152

A preview of the Image Decoding sections can be viewed here: https://pr-preview.s3.amazonaws.com/WICG/web-codecs/pull/152.html#image-decoding

@chcunningham
Copy link
Author

@cynthia @kenchris, just checking in. How is the review going? Have you had a chance to discuss w3c/webcodecs#104 and w3c/webcodecs#122?

I know its a big spec and the ImageDecoder portion is new as of last week. We really appreciate your work!

@chcunningham
Copy link
Author

@cynthia @kenchris, friendly ping. I'm hoping to hear from you soon so we can respond to your feedback before we ship. I've delayed the Chrome timeline by one milestone (now with code freeze around May 20), but we'll want to make any substantial changes well in advance.

@chcunningham
Copy link
Author

Also, I to highlight the time units discussion @padenot mentioned.
w3c/webcodecs#122

@padenot @aboba @sandersdan and I discussed this offline and resolved to use integer microseconds. Please see a summary of how that decision was made here: w3c/webcodecs#122 (comment)

@chcunningham
Copy link
Author

@kenchris @cynthia - here's another issue that TAG could help us resolve

Should WebCodecs be exposed in Window environments?
w3c/webcodecs#211

@kenchris
Copy link

kenchris commented May 7, 2021

Generally, I think the points in w3c/webcodecs#122 (comment) make a lot of sense.

As I understand we could always add an alternative attribute with values represented by DOMHighResTimestamp. With that in mind, I support you using a different type (here unsigned microseconds).

@dbaron also made the point (in email thread) that the timestamps you are defining are within the time of the video or audio stream/file, and is independent of the playback rate as that doesn't change the timestamp but just causes it to advance at a different rate

Both David and I agree that if that is understood correctly, it is pretty reasonable that it is a different type, differently-defined.

@kenchris
Copy link

Regarding exposing to Window environments, we are very sympathetic to @youennf's comment

It seems safer though to gradually build the API. Exposing to Worker at first covers major known usecases and allows early adopter through workers or JS shims to start doing work in Window environment. This might help validating the model is right also for window environments.

@cynthia
Copy link
Member

cynthia commented May 11, 2021

@chcunningham @padenot, Sorry this took so long!

@kenchris @rhiaro and I discussed this in our F2F. Here is a summary of our discussion.

  1. Temporal representation, we've discussed this and an integer representation seems to be the most adequate given the tradeoffs and risks that are associated with using a real representation.
  2. Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.
  3. Transfer / detach? We read the discussion and making the transfer opt-in seems reasonable - although we did lean more towards implicit transfer and make it so that opt-out is explicit. (Basically, zero garbage unless requested) But there is the risk of this behavior being inconsistent with the rest of the platform.

ImageDecoder was an interesting review. API-design-wise, this felt a lot more like a video decoder API that "supports" images (not surprising given that this is exactly what it is), but we did have some mixed feelings about the ergonomics of the API. What mechanism would allow the user to say, decode an image and put it into an HTMLImageElement? (This seemed like something that people might do)

@dalecurtis
Copy link

ImageDecoder was an interesting review. API-design-wise, this felt a lot more like a video decoder API that "supports" images (not surprising given that this is exactly what it is), but we did have some mixed feelings about the ergonomics of the API. What mechanism would allow the user to say, decode an image and put it into an HTMLImageElement? (This seemed like something that people might do)

If you didn't see it, https://github.com/dalecurtis/image-decoder-api/blob/master/explainer.md#providing-image-decoders-through-the-videodecoder-api talks a bit about the reasons for why we didn't just add 'image codecs' to VideoDecoder.

I think it's unlikely someone would want to interoperate with the image element in that way -- we certainly haven't had any requests for it. Canvas is instead the natural rendering point for authored content like this and folks reimplementing aspects of HTMLImageElement with Canvas+ImageDecoder seems more likely. What we have also had requests for is something like a 'image-decoder-worklet' where folks can bring their own image decoders - but that's a bit orthogonal to what this API is trying to do and again is likely served more simply by canvas.

@chcunningham
Copy link
Author

@kenchris @cynthia thank you so much for the review

  1. Temporal representation, we've discussed this and an integer representation seems to be the most adequate given the tradeoffs and risks that are associated with using a real representation.

SG, I'll update the spec shortly.

  1. Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.

In the WG call Tuesday it was suggested that we could check with participants in the Chrome origin trial to see about use cases for window-exposed interfaces. Should take about a week. I'll provide updates on the issue. If the WG cannot come to consensus by Chrome's v1 launch, we can fallback to worker-only exposure while the discussion continues.

  1. Transfer / detach? We read the discussion and making the transfer opt-in seems reasonable - although we did lean more towards implicit transfer and make it so that opt-out is explicit. (Basically, zero garbage unless requested) But there is the risk of this behavior being inconsistent with the rest of the platform.

We considered opt-out for the same reasons and were sensitive to the consistency risk. We also concluded that transfer is often undesirable/infeasable in a few common scenarios like passing ArrayBufferView's (particularly those that are backed by the WASM heap). We felt this made automatic transfer even riskier, as the behavior would vary based on the type of the provided BufferSource.

@chcunningham
Copy link
Author

@cynthia @kenchris

Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.

In the WG call Tuesday it was suggested that we could check with participants in the Chrome origin trial to see about use cases for window-exposed interfaces.

The replies from our user survey are now summarized in this comment. Please let me know if this changes your recommendation.

Also, for other points mentioned above (ImageDecoder, temporal representation, transfer / detach), please let me know if the replies above have adequately addressed the concern. On temporal representation, the spec is now updated as described.

@cynthia
Copy link
Member

cynthia commented Jun 7, 2021

@chcunningham

The replies from our user survey are now summarized in this comment. Please let me know if this changes your recommendation.

Since the users have spoken, it seems like it would make sense to expose this to Window - unless there is strong opposition from the group. (Even if there is opposition, there seems to be a clear preference here so I think it's worth negotiating towards the direction of Window exposure.)

Also, for other points mentioned above (ImageDecoder, temporal representation, transfer / detach), please let me know if the replies above have adequately addressed the concern. On temporal representation, the spec is now updated as described.

I personally believe so, we'll discuss this in a plenary call and close this if everyone agrees. Thanks a lot for being patient during the long process!

@hober
Copy link
Contributor

hober commented Jun 10, 2021

Since the users have spoken, it seems like it would make sense to expose this to Window - unless there is strong opposition from the group.

The WG met the other day, and it was clear (at least to me) that there isn't consensus in the group on whether or not to expose this to Window.

(Even if there is opposition, there seems to be a clear preference here so I think it's worth negotiating towards the direction of Window exposure.)

I like the idea of starting with just exposing it on Worker. It's always easier to add additional exposure in the future than it is to remove it. Best to start small and iterate over time than to start out with too wide exposure & find you regret it in the future.

@dalecurtis
Copy link

During the WG call today, @youennf expressed concern that he's unsure of the official TAG position on window vs worker. Presumably because @hober seems to have contradicted @cynthia's (presumed official TAG) position above. Can y'all affirm the official TAG position?

@hober
Copy link
Contributor

hober commented Jun 15, 2021

During the WG call today, @youennf expressed concern that he's unsure of the official TAG position on window vs worker. Presumably because @hober seems to have contradicted @cynthia's (presumed official TAG) position above. Can y'all affirm the official TAG position?

Hi,

Neither @cynthia nor my comment is "the TAG position", we each just expressed our personal opinions.

@dalecurtis
Copy link

Thanks for the clarity, that's a bit confusing from our perspective then. We'll keep that in mind for future comments. Should we expect an official TAG position here (presumably labeled as such)?

@cynthia
Copy link
Member

cynthia commented Jun 16, 2021

We can do that if that helps - generally we tend to lean towards what users demand/prefer, in this particular case it's also probably worth clarifying if @hober's position is with or without the implementor hat on.

(I'm not an implementor, so I'm fairly neutral on this)

@jan-ivar
Copy link

... the users have spoken ...

A survey of participants who jumped on and invested resources in an experimental Chrome origin trial, is perhaps a skewed sample of users to rely on for a question of whether to exercise caution when it comes to exposing APIs prematurely.

@chcunningham
Copy link
Author

@cynthia, having a clear TAG position here would be helpful to us in resolving the discussion. We are quite stuck.

A survey of participants who jumped on and invested resources in an experimental Chrome origin trial, is perhaps a skewed sample of users to rely on for a question of whether to exercise caution when it comes to exposing APIs prematurely.

We polled these users at the suggestion of the WG chair. In our view it seems correct that this savvy group of innovators and early adopters should have a strong voice in shaping the API.

@jan-ivar
Copy link

jan-ivar commented Jun 18, 2021

We are not "stuck". We have broad consensus to expose this API in workers immediately. There still exists concerns, trepidation, and questions around the urgency of additionally exposing this API on Window at this time.

We generally reserve "users" to describe end-users, not web developers. End-users deserve well-designed applications that work without jank across devices of all abilities, and regardless of user agent. I think the WG with its current stance (which it is in the process of attempting to assert through CfC) is being cautious with the tools it exposes to web developers in different environments, and when, to help ensure this outcome for end-users.

@chcunningham
Copy link
Author

We generally reserve "users" to describe end-users, not web developers. End-users deserve well-designed applications that work without jank across devices of all abilities, and regardless of user agent.

I respect that. I think we should also consider web developers as advocates for their users.

I think the WG with its current stance (which it is in the process of attempting to assert through CfC) is being cautious with the tools it exposes to web developers in different environments, and when, to help ensure this outcome for end-users.

At this time the WG does not have a stance. The members of the WG are in disagreement.

@torgo torgo added the Progress: propose closing we think it should be closed but are waiting on some feedback or consensus label Aug 26, 2021
@cynthia
Copy link
Member

cynthia commented Sep 14, 2021

Thank you all for your patience, we hope the feedback we provided on the issue was useful at reaching a resolution. We believe that the feature itself would be a useful addition to the platform and would like to see this move forward. Thanks again for the patience and the apologies for the mixed messages. We'll discuss this in the rollups today during our VF2F and close.

@cynthia cynthia closed this as completed Sep 14, 2021
@torgo torgo added Resolution: satisfied The TAG is satisfied with this design and removed Progress: in progress Progress: propose closing we think it should be closed but are waiting on some feedback or consensus labels Sep 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: satisfied The TAG is satisfied with this design Topic: media Venue: WICG
Projects
None yet
Development

No branches or pull requests

9 participants