Investigate the possibility to transfer MediaStreamTrack #16
Comments
@youennf, as per your request, some of the challenges Chrome would have in implementing this:
For my use case (w3c/webrtc-extensions#64 (comment)), just handling mic & camera would itself be useful -- really I just need a parent iframe to be able to mediate access to these from a child, though it would be nice to be able to supply apps with other sources at the users' discretion; with other resources apps can request, the platform lets users supply any API-compatible object (which can be supplied by Sandstorm itself, or by other apps). It would be nice to have that property extend to browser-provided objects too. But perhaps if the general case is a much bigger engineering effort, we should settle for just mic & camera in the short term. I suppose my need may theoretically apply to other things that generate browser permission prompts as well, though for stuff like location data it's not prohibitive to just proxy, which we obviously can't do with mic & camera; I'm not sure if there's anything else where we'd really need the browser's help.
To triage discussion, this is https://github.com/w3c/mediacapture-main/issues/529.
I don't think there's a general case. Camera capture is already over IPC in most implementations, so transferring a What's true though is the Transferring a I think the main challenge is figuring out concurrent access to the underlying source, since tracks can be cloned (normally, when an object is transferred, it's a hand-off, but |
We would need to decide whether the transferred MediaStreamTrack would get ended if its origin document is being destroyed. Transferred Readable/WritableStreams work this way for instance.
Agreed. Some 'transfer' operations for some MediaStreamTrack might have a very limited cost. But others might have a greater cost. |
Yes. On one end, there's precedent with same-origin popups (which can be handed A driving factor behind that behavior may be A) accidental, and B) all the privacy indicators are still in the original document. On the other hand, being able to temporarily hand off That said, extending camera/mic capture past tab close I think would be scary and surprising. 😨
Those in contrast seem designed specifically for cross-realm piping, so they inherently need to work that way. But if I postMessage(track.clone()) ...then that's different from that I think. It's more like passing a resource handle around. |
It seems to me like (1) mic/camera and (2) screen-capture are special, in that the real source there is the browser itself. These contrast with cases where the application is the source of the data; the examples I can cite include (a) Canvas, (b) PeerConnection, (c) Breakout Box, (d) Web Audio and (e) HTMLMediaElement. Four out of these five are in the slide you've referenced. I am not sure if this list is exhaustive or not. By "general" I mean to say {1, 2, a, b, c, d, e, ...}. Of these, we agree that 1 and 2 can be done without an additional IPC. But all other cases would, AFAICT, require additional IPCs. What is suggested? To support transferability of only 1 and 2, and make transferring fail for the other cases? Or to support all cases, but have it be cheap in some cases and expensive in others? |
That seems about right.
Well, there are potentially same-process use cases like posting to a worker that would still work for all. For getTabMedia, my preference would be to assign it to an RTCRtpSender directly, instead of postMessaging to a different iframe just to do the same thing. |
For most of these existing cases, there needs to be a good way to share audio and video across processes anyway.
I would first define what we want out of MediaStreamTrack transfer, and how we want to define it. I would hope we can add support for these cases in a reasonably cheap way, and have it almost free in a few known cases (in-process transfer for instance). |
References to MediaStreamTrack "types" in this comment will refer to the enumeration from a previous comment.
There are probably simple cases, but I think a definition will be necessary for all cases. It is not clear to me how the suggestion discussed here intends to handle cross-origin transfer of MediaStreamTracks of types a-e. If the definition that ends up being chosen specifies that transferability only applies to MediaStreamTracks of type 1-2, and that attempting to transfer tracks of types a-e would fail, then I might be able to incorporate implementing this into my work plan. (My work plan currently centers on cropping MediaStreamTracks of type-2.) But if the specification that is chosen allows the transfer of MediaStreamTracks of any arbitrary type, then the required engineering effort would be quite substantial, and I am not aware of anyone working on Chromium that is currently interested in the engineering investment that would be required for implementation.
Transfer between different types of processes comes at different engineering and CPU costs. It is not immediately obvious to me that Chromium has an easy-to-implement, efficient way to transfer video frames from one render-process to another, when these frames originate in the render process itself (i.e. MediaStreamTrack types a-e). This of course does not mean that making MediaStreamTracks transferable is not desirable. It does sound like an overall good thing to me. I just don't think that the investment necessary to implement transferring of arbitrary tracks would be within scope for me atm. And I am not sure who else might need it enough to implement it in the near future. Perhaps @alvestrand knows. |
I have some interest in b (peer connection) as well as 1 and 2. The reason for the latter is that I also want to be able to block the inner iframe from establishing webrtc network connections itself (which requires changes to CSP, see w3c/webappsec-csp#457), which would mean that the networking bits would also need to be mediated by the parent frame, and I'd need some way to pass network-obtained streams into the inner iframe. My gut tells me that that should be much easier than e.g. canvas, but I'm not familiar with the relevant internals. |
I think we'd first have to decide whether using this to circumvent CSP is a desirable or a concerning property. |
For my purposes it definitely falls into "desirable", but I can see the behavior might be surprising to someone who set afaik, csp currently doesn't govern anything where it has to interact with transferability (correct me if I'm wrong), so perhaps that conversation is broader than just webrtc: it applies to anything that csp touches that might be transferable. |
@annevk, interested in your thoughts on the CSP interaction here as well. (It's a bit awkward trying to coordinate issues across several repos that are interrelated like this, sorry for the disorganization...)
I think it depends on where CSP is enforced whether it would work or not (e.g., if it was enforced in the constructor of transferred things this would not work), but it seems okay for me that this bypasses CSP. |
For my part, my preference would be to keep it simple and just allow this, though it would be sad if later somebody came along with a use case where they wanted to block even postMessage transfers, and for compatibility's sake we again had to build another corner case where just setting
If you want to be that restrictive you really ought to disable postMessage() though as all kinds of data can flow through there if you allow it.
Quoting Anne van Kesteren (2021-02-04 01:17:14)
> If you want to be that restrictive you really ought to disable
> postMessage() though as all kinds of data can flow through there if you
> allow it.

Then perhaps for the purposes of webrtc's CSP policy we shouldn't try to block postMessage(), and if someone wants to do that they can raise the issue of disabling postMessage() somehow separately.
Started a PR at defining the transferring steps. The basic principles are:
Doesn't that allow observing GC? Does that work across agent clusters? |
Commentary inline.
This is good, and reflects that the relationship between track-in-R1 and track-in-R2 is exactly the same as if you call track-in-R2 = track-in-R1.clone(); track-in-R1.stop().
This I'm nervous about. I'd rather phrase it as "If the source of the track created in R1 and transferred to R2 goes away, the track in R2 gets ended".
Agreed; again, capture indicators belong to sources, not tracks directly.
If we move to the definition above, this happens by default. |
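The clone-then-stop reading of transfer can be sketched with a toy model. `ToySource`, `ToyTrack` and `toyTransfer` are hypothetical stand-ins, not browser APIs, and the sketch assumes that tracks ref-count their source and that a source stops once its last live track ends. Realms and process boundaries are ignored entirely.

```javascript
// Toy model only: ToySource, ToyTrack and toyTransfer are hypothetical names.
class ToySource {
  constructor() {
    this.tracks = new Set(); // live tracks currently attached to this source
    this.stopped = false;
  }
}

class ToyTrack {
  constructor(source) {
    this.source = source;
    this.readyState = "live";
    source.tracks.add(this);
  }

  clone() {
    const copy = new ToyTrack(this.source);
    // A clone of an ended track is itself ended.
    if (this.readyState === "ended") copy.stop();
    return copy;
  }

  stop() {
    if (this.readyState === "ended") return;
    this.readyState = "ended";
    this.source.tracks.delete(this);
    // Assumption: track ended -> source stopped, once no live track remains.
    if (this.source.tracks.size === 0) this.source.stopped = true;
  }
}

// Assumed model of transfer: clone for the receiving realm, then stop the original.
function toyTransfer(track) {
  const received = track.clone();
  track.stop();
  return received;
}

const source = new ToySource();
const trackInR1 = new ToyTrack(source);
const trackInR2 = toyTransfer(trackInR1);

console.log(trackInR1.readyState); // "ended"
console.log(trackInR2.readyState); // "live"
console.log(source.stopped);       // false

trackInR2.stop();
console.log(source.stopped);       // true
```

Because the clone is attached before the original is stopped, the source never drops to zero live tracks during the hand-off, so in this model a capture indicator tied to the source would not flicker.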
We can make sure to keep a strong reference like done for message ports. |
In the current spec, we have track ended -> source stopped. I haven't seen source stopped -> track ended, hence the current phrasing. |
See https://html.spec.whatwg.org/#agents-and-agent-clusters. They're the conceptual process boundary, if you will. |
Are there already web objects for which transfer works in process but not over process boundary, as a design decision? I think there are use cases for transferring MediaStreamTrack over process boundary. In practice, the capture is already living more and more out-of-process so User Agents have ways to transfer media over processes efficiently. Implementations might start with a limited subset though, workers for instance. |
The most likely case of a transferable source is probably a canvas capture (https://www.w3.org/TR/mediacapture-fromelement/). OffscreenCanvas is already defined as Transferable, so if you generate a MediaStreamTrack from an OffscreenCanvas (using HTMLCanvasElement.captureStream()), and then transfer the OffscreenCanvas elsewhere, it doesn't seem reasonable to stop the track just because the original context goes away. I'm sure there are dragons here somewhere, though.
I see, so there might be a future API that allows generating MediaStreamTrack from an OffscreenCanvas. |
Question about what model to choose for this operation: Should we make MediaStreamTrack Serializable instead of Transferable? We have precedent (RTCCertificate, for instance) for objects that are Serializable without their innards being observable; that example is also an example of an object that can be transferred between origins, but only used in its original origin, so we can build in the restrictions we want to have on the object. If we define it in such a way that Serialize / Deserialize is the exact equivalent of Clone, except that the two may happen in different contexts, I think a lot of our definitional issues go away, and the PR defining behavior can be a lot shorter. (call out to @dogben for coming up with the idea)
I am not sure what we would gain in terms of simplification by using serializable. My understanding is that RTCCertificate is serializable so that it can be stored in say IDB. I do not think we want to store MediaStreamTrack in IDB so would keep using transferable. |
@youennf you can use forStorage for that when defining the serializable steps, no? The key question around transferable is whether you need detach semantics. |
Maybe, by updating the algorithm then? Focusing on semantics, I am unclear what would be the relationship between the original track and the serialized track. My understanding is we want them to be fully independent, stopping one would not stop the other for instance. We can achieve this by using transferable. |
Conceptually, a certificate is a dead object, a read-only written record/contract. Serializing it into a storable form (e.g. as bytes + maybe encrypted bytes on a disk) makes sense, because imagining it in a stored form makes sense. A MediaStreamTrack is a live object often representing a realtime source that's been negotiated with the user right now, maybe unplugged tomorrow, and whose state may have live impact on things like hardware camera lights and browser privacy indicators. Having such a handle object exist in a serializable/storable form seems like a tenuous concept, as it would appear to challenge whether this handle represents a legitimate reason to keep the device open (storing a handle that keeps devices open to disk seems like a bad idea) and for how long. It also seems harder to track all open references. But maybe I missed the problem being solved?
Perhaps we should rename [Serializable] since it essentially comes down to a copy (and the object that is copied still functions; there's the separate aspect of storing that copy which is something each object can decide for itself through forStorage). [Transferable] is essentially a move (and the object that is moved becomes detached and is no longer usable). cc @domenic |
So by detach semantics (if I understand correctly), we mean transfer of ownership, which I think is the most natural and conservative semantic here, since tracks explicitly ref-count their (often hardware) sources. We already have copy-semantics of handles with Copy semantics would require JS to remember to Move semantics would require JS only to remember to The former seems easy to forget and hard to detect that it was forgotten, while the latter seems hard to ignore and easy to detect one's mistake (the non-performing/dead track being quite obvious). |
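The two failure modes can be illustrated with a toy model. This is a sketch only: `makeSource`, `openTrack`, `sendCopy` and `sendMove` are hypothetical names, not browser APIs, and it reads the calls JS has to remember as `stop()` for copy semantics and `clone()` for move semantics, which is an assumption based on the reply that follows.

```javascript
// Toy model only: every name below is hypothetical, not a browser API.
function makeSource() {
  return { openHandles: 0 }; // number of live tracks holding the source open
}

function openTrack(source) {
  source.openHandles += 1;
  const track = {
    source,
    ended: false,
    stop() {
      if (track.ended) return;
      track.ended = true;
      source.openHandles -= 1;
    },
  };
  return track;
}

// Copy semantics: the sender keeps a working track next to the receiver's.
function sendCopy(track) {
  return openTrack(track.source);
}

// Move semantics: the sender's track is ended as part of the send.
function sendMove(track) {
  const received = openTrack(track.source);
  track.stop();
  return received;
}

// Copy: the sender forgets stop(); the receiver finishes; the source is still held.
const camA = makeSource();
const senderA = openTrack(camA);
const receiverA = sendCopy(senderA);
receiverA.stop();
console.log(camA.openHandles); // 1 (held by the sender, and nothing flags it)

// Move: the sender's track is visibly dead; the receiver's stop() frees the source.
const camB = makeSource();
const senderB = openTrack(camB);
const receiverB = sendMove(senderB);
console.log(senderB.ended);    // true
receiverB.stop();
console.log(camB.openHandles); // 0
```

In the copy case the forgotten handle is silent, while in the move case a sender that still wanted the track sees an ended track immediately.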
Also, every track object carries its own set of constraints on the source, which means copies get out of sync over time with |
Re #16 (comment) - the argument that "anyone who wants copy semantics can do track.clone" is completely analogous to "anyone who wants move semantics can do track.stop". If, for instance, an application that uses a custom codec in a worker for encoding (a use case considered many times) also does a self-view, the naturally desired semantics of a message operation is the copy semantic, not the move semantic.
It's not completely analogous, right? Because the latter would take up twice the resources. |
When we already support copy semantics (by .clone), supporting move semantics through a different mechanism is more conceptual clutter. I come down to this: Both cases can be supported by either Serializable or Transferable, but the definition of Transferable requires three things to happen at once (destruction, moving and recreation), while Serializable provides you with the toolbox to build the operations you need out of conceptually simpler pieces. Small, sharp tools. |
I would mostly look from a web developer point of view. |
If you need both, you can also use both Serializable and Transferable. They are not mutually exclusive. (See ArrayBuffer for instance.) And they have a different invocation, so you can also add one of them later. (Given that you have |
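For reference, here is how the two invocations differ for ArrayBuffer. A minimal sketch, assuming a runtime with a global `structuredClone` (current browsers, Node 17+); it only shows the copy versus move behaviour and says nothing about how MediaStreamTrack itself would be specified.

```javascript
// ArrayBuffer is both serializable (copy) and transferable (move).
const original = new ArrayBuffer(8);
new Uint8Array(original)[0] = 42;

// Serialize: the copy is independent and the original keeps working.
const copy = structuredClone(original);
new Uint8Array(copy)[0] = 7;
console.log(original.byteLength);         // 8
console.log(new Uint8Array(original)[0]); // 42

// Transfer: the contents move and the original becomes detached.
const moved = structuredClone(original, { transfer: [original] });
console.log(original.byteLength);         // 0
console.log(new Uint8Array(moved)[0]);    // 42
```

The same split applies to `postMessage(value, [transferList])`: listing an object in the transfer list asks for the move, leaving it out asks for the copy.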
Excessive copies are potentially harmful, so transferable seems preferable. We don't need serialization since we have |
Fixed by #24 |
This might be useful in case identified in w3c/mediacapture-screen-share#158.
If we go with media capture insertable streams, JavaScript could potentially shim such a postMessage by getting access to individual frames and sending them through postMessage to recreate a MediaStreamTrack.
Implementing transfer by the user agent could make it easier for developers and potentially more efficient.