Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move "global" internal slots to MediaDevices, and fix "relevant object" links #939

Merged
merged 8 commits into from
Mar 23, 2023

Conversation

jan-ivar
Copy link
Member

@jan-ivar jan-ivar commented Mar 6, 2023

Fixes #938. Using hide whitespace when reviewing is recommended.


Preview | Diff

@jan-ivar jan-ivar self-assigned this Mar 6, 2023
@jan-ivar
Copy link
Member Author

jan-ivar commented Mar 7, 2023

In cleaning this up, I came across a couple of realities that I've tried to handle better with this PR.

  1. There's more than one mediaDevices object to consider. E.g. in iframes.
  2. A source only stops when ALL its tracks across the entire user agent have ended, so stop all sources on ONE global object using [[mediaStreamTrackSources]] doesn't work. I reworked it to lean on existing ends algorithms instead, which seems to work well even for transfered tracks.
  3. Some of the internal slots were (and still are) accessed off main thread
  4. For Privacy Indicator Requirements I added a note saying dealing with multiple iframes is an exercise left to UAs.

@youennf
Copy link
Contributor

youennf commented Mar 7, 2023

2. A source only stops when ALL its tracks across the entire user agent have ended

A UA can stop a source and all tracks will get ended.
This happens when for instance the camera is unplugged.

, so stop all sources on ONE global object using [[mediaStreamTrackSources]] doesn't work.

I do not see why it would not work, getUserMedia creates a source that gets associated to ONE global object.
It is true that in the current spec, ending the tracks is needed to not trigger needless ended events at unload time.
For transferred tracks, stopping the source will trigger the ended event, which does not seem bad.

@jan-ivar
Copy link
Member Author

jan-ivar commented Mar 7, 2023

I do not see why it would not work, getUserMedia creates a source that gets associated to ONE global object.

My understanding is a source "can be a physical webcam", making it a singleton in the entire user agent.

gUM doesn't "create" it, as much as it: "Let[s] grantedDevice be finalCandidate's source device. ... Let track be the result of creating a MediaStreamTrack with grantedDevice", which initializes track.[[Source]] to grantedDevice.

E.g. in this iframe example (Once I hit Allow and the Camera button in the iframe) two video element self-views will play, but there is ONE source (the singular physical camera) associated with TWO global objects (two navigator.mediaDevices objects with distinct relevant global objects, one per JS realm).

For transferred tracks, stopping the source will trigger the ended event

But stop all sources isn't handling that correctly, as written. It says: "For each MediaStreamTrack object track whose relevant global object is globalObject, set track's [[ReadyState]] to "ended". ... For each source in globalObject's [[mediaStreamTrackSources]], stop source."

This fails to:

  1. queue a task
  2. consider the same source may be tied to multiple global objects, and thus shouldn't be stopped
  3. include tracks that have been transferred away to other global objects that now should end (since track source lifetime management remains tied to the context that created it)
  4. consider tracks that have been transferred here from other global objects (which get included here), but if this was the last reference, it needs to end that reference and stop the source in those other global objects)

In the PR this is fixed like this.

@jan-ivar
Copy link
Member Author

jan-ivar commented Mar 7, 2023

A UA can stop a source and all tracks will get ended.
This happens when for instance the camera is unplugged.

That sounds like ends for any reason other than the stop() method being invoked, not stop all sources of a global object. Unplugging a camera should stop ONE source, ending all tracks of that source regardless of how many global objects it is associated with.

@youennf
Copy link
Contributor

youennf commented Mar 8, 2023

My understanding is a source "can be a physical webcam", making it a singleton in the entire user agent.

Ah, this is where we are not aligned.
I do not think the spec works well with source as a singleton.
For instance, it is useful to be able to have all tracks of a document tracks being muted while tracks of another document tracks are not muted, even if they both are using the same camera. Safari is using this for instance to have only one tab capturing at a time. If the source was a singleton, it would not be possible.

AIUI, the main (only) requirement is that cloned tracks have to share the same source.
This is the minimum granularity that all UAs adhere to and this is what we can use as a basis.

This fails to:

  1. queue a task

We should dig into this, stopping the sources happen when the document gets unloaded, it is not clear whether queueing a task at this time will ensure the task will be executed.

  1. consider the same source may be tied to multiple global objects

It works as follows:

  • the source has tracks in different global objects
  • only one global object will trigger stopping the source when global object gets unloaded
  • all tracks that are not in the source global object will get ended after the source is stopped (with an ended event).
  1. include tracks that have been transferred away to other global objects that now should end (since track source lifetime management remains tied to the context that created it)

See above, this is handled.

  1. consider tracks that have been transferred here from other global objects (which get included here), but if this was the last reference, it needs to end that reference and stop the source in those other global objects)

As per above, I am not sure why this would be needed.

@jan-ivar
Copy link
Member Author

jan-ivar commented Mar 8, 2023

I do not think the spec works well with source as a singleton.
For instance, it is useful to be able to have all tracks of a document tracks being muted while tracks of another document tracks are not muted, even if they both are using the same camera. Safari is using this for instance to have only one tab capturing at a time. If the source was a singleton, it would not be possible.

As you say, tracks are muted, not sources, so I don't immediately see what's not possible. But let's dig in:

§ 4.3.1.2 Media Flow begins with "Muted refers to the input to the MediaStreamTrack.", BUT this ultimately just describes the track's muted state, and when the UA should update it, which it accomplishes (almost!) without mentioning "source".

The section has just two mentions of "source" near the end:

  1. "media from the source only flows when a MediaStreamTrack object is both unmuted and enabled.", which is fine
  2. "For a newly created MediaStreamTrack object, the following applies: the track is always enabled unless stated otherwise (for example when cloned) and the muted state reflects the state of the source at the time the track is created."

The second one does mention "the muted state reflects the state of the source", which seems a tacit admission that the source has some (unnamed) state. But I think s/source/input/ here would have been more self-consistent with the rest of this section.

I think what happened here was that all the user agent examples we could think of at the time were global to the user agent ("the user pushing a physical mute button on the microphone, the user closing a laptop lid with an embedded camera, the user toggling a control in the operating system"), and we wanted to express the invariant that creation of new tracks shouldn't sidestep those user-directed privacy measures, so we had no choice but to tacitly refer to some state outside of the track.¹

I think the challenge here is how to express this invariant while at the same time allowing a user agent to mute all tracks in some contexts and not others. — But I don't think defining multiple instances of source is the answer, because having an instance live inside a single (per realm) global object seems too limiting, because in my view, the idea that gUM for the same camera from a page and from an iframe in that page (see my example above) refers to distinct source objects seems confusing for little benefit. At least in Firefox, privacy indicators act as if this is the same source (yes I gave up fixing that section in the spec).

AIUI, the main (only) requirement is that cloned tracks have to share the same source. This is the minimum granularity that all UAs adhere to and this is what we can use as a basis.

Transferred tracks also share the same [[Source]] of their detached counterpart. Such tracks may be transferred not just to workers but to other documents, in which case their relevant global object (which will be in the destination realm) won't have it in its [[mediaStreamTrackSources]].

This, iframes, and the way privacy indicators have been implemented in all browsers, I think suggest the minimal granularity is a page. E.g. if an iframe calls gUM, transfers the resulting video track up to the document containing it, and then the iframe navigates, does the the video track end? If so, why should it? The permissions were delegated to the iframe in the first place, so why can capture not continue? So perhaps the source should live in the top-level document? Or maybe we can write algorithms that don't depend on where sources live?

This fails to:

  1. queue a task

We should dig into this, stopping the sources happen when the document gets unloaded, it is not clear whether queueing a task at this time will ensure the task will be executed.

Good point. I can remove that.

It works as follows:

* the source has tracks in different global objects

* only one global object will trigger stopping the source when global object gets unloaded

* all tracks that are not in the source global object will get ended after the source is stopped (with an ended event).

My problem with this and the current text is it only works correctly if we ASSUME sources are different per global object, which at best isn't obvious and at worst seems to lack support in the spec, which means when I read this (thinking sources are singletons) it reads to me as doing the WRONG thing (stopping other concurrent sources). In contrast, I hope my PR does not rely on that assumption (since it was written without it).


1. There's also the more recent: "[[Muted]], initialized to true if source is muted", which I think erroneously refers to the track muted state.

@youennf
Copy link
Contributor

youennf commented Mar 9, 2023

if an iframe calls gUM, transfers the resulting video track up to the document containing it, and then the iframe navigates, does the the video track end?

Yes, this is the current expected behavior, which aligns with what JS can do by transferring manually VideoFrames.
We can of course relax this rule but we should open a new issue to discuss this.

@youennf
Copy link
Contributor

youennf commented Mar 9, 2023

My problem with this and the current text is it only works correctly if we ASSUME sources are different per global object, which at best isn't obvious

I think the intent of the spec is clear, as we discussed this when designing transferred streams.
We can always clarify this in the spec.

@jan-ivar
Copy link
Member Author

jan-ivar commented Mar 9, 2023

I see my mistake now. I misread what the original text was trying to do (I read it as inadvertently stopping all tracks of a physical source in the user agent). But in fixing it, the PR fails to preserve the original intent, which I failed to interpret and we wish to to preserve, which is effectively to end all tracks "tied" to the mediaDevices object they came from (through mediaDevices.getUserMedia()), so that even tracks transferred elsewhere end when this document unloads. Or as mediacapture-extensions says: "When originalContext goes away, trackSource gets ended, thus transferredTrack gets ended."

I'll try to update my PR to not break that. To be fair, I believe the existing language only accomplishes that under ONE interpretation of the scope of source (one per global object), NOT the other (source is a singleton of the user agent), but I believe we can express this while being agnostic of source scope.

I think the intent of the spec is clear, as we discussed this when designing transferred streams.
We can always clarify this in the spec.

This is all I'm trying to do. :)

@youennf
Copy link
Contributor

youennf commented Mar 9, 2023

To be fair, I believe the existing language only accomplishes that under ONE interpretation of the scope of source (one per global object), NOT the other (source is a singleton of the user agent)

The assumption is more one source per getUserMedia call (well per track created by getUserMedia).
This mirrors well the principle of one source/capture controller per getDisplayMedia call.
Could we get a consensus on this model from an editorial point of view?

@jan-ivar
Copy link
Member Author

jan-ivar commented Mar 9, 2023

The assumption is more one source per getUserMedia call (well per track created by getUserMedia).

Where in the spec is such an assumption relied on?

Rather, the source stopped algorithm relies on a 1-1 between a source and device:

  • "When all tracks using a source have been stopped or ended by some other means, the source is stopped. If the source is a device exposed by getUserMedia(), then when the source is stopped, the UA MUST run the following steps:
  1. Let deviceId be the device's deviceId.
  2. Set [[devicesLiveMap]][deviceId] to false."

If JS calls gUM twice and get two video tracks with the same deviceId, and then I end one of them, a per-track source interpretation would turn off the privacy indicator for this realm even though the other video track is still live.

The [[devicesLiveMap]] is on the global object (which is 1-1 with navigator.mediaDevices),

That seems like one too many 1-to-many relationships.

@jan-ivar
Copy link
Member Author

I've simplified this PR to focus on the move of internal slots, which should hopefully be editorial now. @youennf PTAL.

@@ -693,7 +692,9 @@ <h2>{{MediaStreamTrack}}</h2>
<a data-cite="!HTML/#concept-relevant-global">relevant global object</a> is <var>globalObject</var>,
set <var>track</var>'s {{MediaStreamTrack/[[ReadyState]]}} to
{{MediaStreamTrackState/"ended"}}.</p></li>
<li><p>For each <var>source</var> in <var>globalObject</var>'s {{MediaDevices/[[mediaStreamTrackSources]]}},
<li><p>If <var>globalObject</var> is a {{Window}}, then for each <var>source</var> in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could replace the {{Window}} check here by a check on whether there is an associated MediaDevices object, which could be stored in a specific internals lot.
That way, we do not need to come back here if we ever expose things in workers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think we should follow HTML here, which has 15 hits for "is a Window", e.g. "If global is a Window object, then return global's associated Document." — I'd similarly avoid "If X has internal slot Y".

Preserving this invariant allows readers to infer types of passed-in arguments in algorithms (for which there often are no other nearby clues) from their associated properties and internal slots.

@@ -2658,6 +2657,9 @@ <h2>Enumerating Local Media Devices</h2>
camera or a headset).</p>
<section>
<h3>`Navigator` Interface Extensions</h3>
<p>Each {{Window}} has an <dfn>associated `MediaDevices`</dfn>, which is a {{MediaDevices}} object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use an internal slot?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTML seems to prefer this format to describe inherent (constant) 1-1 relationships, which I cribbed from:

Each Window has an associated UserActivation, which is a UserActivation object. Upon creation of the Window object, its associated UserActivation must be set to a new UserActivation object created in the Window object's relevant realm.

...
partial interface Navigator {
  [SameObject] readonly attribute UserActivation userActivation;
};

While HTML does use internal slots in places, its use seems limited to variables repeatedly modified by algorithms (beyond initialization). E.g. I don't see "set X's associated Y to newValue" where it replaces an old value.¹

Since we never change the mediaDevices object on navigator, I think the "associated" language gives readers one less variable to worry about (like const over let in JS), so I think I prefer that.


1. This doesn't appear to be a 100% rule, as I do see things like: "each Window W has a last activation timestamp", where it seems to rely on language to differentiate whether something can change or not (e.g. last vs. associated). But this is tipping in the other direction, away from using the [[]] format.

to the [=User Agent=].</p>
</li>
<li>
<p>Create one internal slot: <dfn data-dfn-for="MediaDevices">[[\canExposeCameraInfo]]</dfn>, initialized
<p><dfn data-dfn-for="MediaDevices">[[\canExposeCameraInfo]]</dfn>, initialized
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct but I am not sure we have tests for this.
Say one same origin iframe has access to camera and microphones, what about its same origin parent for instance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR merely moves internal slots from window (one per realm) to window.navigator.mediaDevices (one per realm) which should be behavior neutral in all JS-observable ways, so blocking on tests to merge this cleanup would be unfortunate I think.

That said, more tests in this area would be great! Happy to try to add some later, maybe in bug 1528042.

Say one same origin iframe has access to camera and microphones, what about its same origin parent for instance?

Two same-origin iframes/parents are different realms. So both before and after this PR, enumerateDevices exposure in the parent would not be enabled by getUserMedia calls in iframes, or vice versa, as the spec is written.

If this is something we'd want to enable, we should open a new issue to discuss. On the one hand, it seems like an oversight in that it's stricter than gUM permissions which are inherited (in some browsers). OTOH, I'm not super excited to enable something for consistency whose primary use case would likely be tracking.

@jan-ivar
Copy link
Member Author

jan-ivar commented Mar 23, 2023

The PR now should be true to its title and only move internal slots that were on the global object window to the window.navigator.mediaDevices object. There should be no behavioral difference from this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Global object relevant to what?
3 participants