Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track transferability requirement adds complexity to some common use cases #116

Open
guidou opened this issue Nov 19, 2024 · 9 comments
Open

Comments

@guidou
Copy link
Contributor

guidou commented Nov 19, 2024

In many use cases it is better to keep all tracks on Window and use the worker just for the processing.

Currently, mediacapture-transform requires transferring the track to be processed to the worker, which adds complexity to these use cases.

Consider the use case of a before/after preview window where the original camera track is shown together with the processed track. With the spec as currently written, something like this is needed (omitting some variable declarations, for simplicity):

// main.html
function setUpWorker() {
  worker = new Worker("worker.js");
  worker.onmessage = async event => {
    if (event.data.command == "processed-track") {
      let stream = new MediaStream([event.data.param]);
      processedVideo.srcObject = stream;
      await processedVideo.play()
    } else if (event.data.command == "stopped-track") {
      track.stop();
    }
  }
}

async function startCapture() {
    let stream = await navigator.mediaDevices.getUserMedia({video:true});
    originalVideo.srcObject = stream;
    await originalVideo.play()
    track = stream.getVideoTracks()[0];
    let cloneTrack = track.clone();
    worker.postMessage({command: "process-track", param: cloneTrack}, [cloneTrack]);
}

function stopCapture() {
  worker.postMessage({command: "stop-track"}, [track]);
}

// worker.js
onmessage = event => {
    if (event.data.command == "process-track") {
        track = event.data.param;
        let processor = new MediaStreamTrackProcessor({track});
        let generator = new VideoTrackGenerator();
        processor.readable.pipeThrough(GetTransform()).pipeTo(generator.writable);
        postMessage({command: "processed-track", param: generator.track}, [generator.track]);
    } else if (event.data.command == "stop-track") {
        track.stop();
        postMessage({command: "stopped-track"});
    }
}

Compare to how it would be without forcing track transferability:

// main.js
function setUpWorker() {
  worker = new Worker("worker.js");
}

async function startCapture() {
    let stream = await navigator.mediaDevices.getUserMedia({video:true});
    originalVideo.srcObject = stream;
    await originalVideo.play();
    track = stream.getVideoTracks()[0];
    processedTrack = MediaDevices.createVideoTrackGeneratorAndProcessor(worker, track);
    processedVideo.srcObject = new MediaStream([processedTrack]);
    await processedVideo.play();
}

function stopCapture() {
  track.stop();
  processedTrack.stop();
}

// worker.js
oncapturetransform = event => {
    event.processor.readable.pipeThrough(GetTransform()).pipeTo(event.generator.writable);
}

Even though it is a minimal demo, the difference in complexity is considerable. Forcing track transferability requires cloning the camera track and a communication protocol between Worker and Window to keep the state consistent between them, all of which is unnecessary if the tracks can be kept on Window.

Note that this issue is not about advocating for removing track transferability. Use cases where transferring tracks to a worker makes sense can continue to be supported without problem even if this issue is addressed.

A runnable implementation of this demo (spec version) is available at https://guidou.github.io/mct/spec-main.html

Source code for the spec and proposed variants is available at https://github.com/guidou/guidou.github.io/tree/master/mct

@guidou
Copy link
Contributor Author

guidou commented Nov 19, 2024

@youennf
Copy link
Contributor

youennf commented Nov 19, 2024

How does it relate to #113?
Should we close one or the other?

@youennf
Copy link
Contributor

youennf commented Nov 19, 2024

Some quick notes:

  • The difference in code complexity is not that big in these two examples.
  • For stopping track, I would think the first example to be easier to handle: stopping the post transform track will stop the writable, which the worker is notified and would stop the original track.

@jan-ivar
Copy link
Member

Forcing track transferability requires cloning the camera track and a communication protocol between Worker and Window to keep the state consistent between them, all of which is unnecessary if the tracks can be kept on Window.

Transfer does not require cloning. It's the before/after use case showing the original camera track that needs it. Without that, no clone is needed (and one less track to stop).

How does it relate to #113?

This issue focuses on the web developer aspect only, phrased as a problem with the existing spec. Thanks @guidou!

  • The difference in code complexity is not that big in these two examples.

Agree. 25 vs 40 lines doesn't seem enough to justify a bespoke API.

  • For stopping track, I would think the first example to be easier to handle: stopping the post transform track will stop the writable, which the worker is notified and would stop the original track.

@youennf's tip in #117 (comment) lets us simplify further (32 lines):

// main.html
function setUpWorker() {
  const worker = new Worker("worker.js");
  worker.onmessage = async ({data}) => {
    processedVideo.srcObject = new MediaStream([data.track]);
    await processedVideo.play();
  }
}

async function startCapture() {
  const stream = await navigator.mediaDevices.getUserMedia({video:true});
  originalVideo.srcObject = stream.clone();
  await originalVideo.play();
  const [track] = stream.getVideoTracks();
  worker.postMessage({track}, [track]);
}

function stopCapture() {
  [originalVideo, processedVideo].forEach(v => v.srcObject.getTracks()[0].stop());
}

// worker.js
onmessage = async ({data: {track}}) => {
  const processor = new MediaStreamTrackProcessor({track});
  const generator = new VideoTrackGenerator();
  postMessage({track: generator.track}, [generator.track]);
  try {
    await processor.readable.pipeThrough(GetTransform()).pipeTo(generator.writable);
  } finally {
    track.stop();
  }
}

@guidou
Copy link
Contributor Author

guidou commented Nov 27, 2024

Transfer does not require cloning.
No one is saying that Transfer, which is a separate feature, requires cloning.
The argument is that use cases that require or prefer the tracks on Window (which are arguably the majority) and just want the worker for media processing, need to transfer the track because the API requires it, even though media processing does not need the track. In some cases, like before/after a clone is an additional requirement.
If a the use case requires just media processing on worker, there should not be an additional requirement to have da track in the worker.

It's the before/after use case showing the original camera track that needs it. Without that, no clone is needed (and one less track to stop).

This is a common use case, and there are many others that require or prefer the track on window and don't need or want a track in the worker. The current API shape forces a clone in such cases, which shouldn't be necessary.

  • The difference in code complexity is not that big in these two examples.

Agree. 25 vs 40 lines doesn't seem enough to justify a bespoke API.

You're including code that is common to both cases in your counts, which makes the relative difference look smaller.
If you count just the code related to track management, the relative difference is bigger.

Also, the difference in complexity is not just line counts, but also the complexity of the abstractions involved.
With the track only on Window, operations on the track are simple. You have the track, and you operate on it directly. No further abstractions are required.

With the track split on two execution contexts, what used to be just a track for the application has to be modeled differently. Now you have to think in terms of an abstract track of which there are two replicas running on two different execution contexts, so you need to keep track of all those objects and define a consistency protocol.
In a complex real-world application, this complexity compounds.
If you want to use multiple tracks, and do processing on different workers, it becomes even more complex.

Some use cases where this additional complexity is necessary might exist, but today there are a lot use cases where this extra complexity isn't needed and it just makes life difficult for developers.

@jan-ivar
Copy link
Member

jan-ivar commented Dec 2, 2024

This is a common use case,

Seems more common not to have a "before" preview: None of the big sites with video effects have it.

... and there are many others that require or prefer the track on window and don't need or want a track in the worker.

Which websites require the "before" video on window? How do you know what they "prefer"?

As #117 clarified, having a track tucked away in the worker using it is neither a burden nor a liability. It's quite clean: it lets the worker stop the source once piping ends. This cleanly reacts to downstream JS closing the "after" track, regardless of what thread it's on.

The current API shape forces a clone in such cases, which shouldn't be necessary.

A "before" preview element is a separate sink with potentially separate concerns. Using clone to separate those concerns should be encouraged. For instance, tying their lifespans together seems like a mistake.

In contrast, your alternative API seems flawed:

  1. poor encapsulation: the worker has no way to stop the gUM track it's consuming
  2. redundancy: the window has to manage two tracks, even when no "before" preview is needed
  3. tight coupling: if the website forgets to clone its "before" track then stopping it stops the "after" track as well

IOW leaving the window to manage the source discourages encapsulation and encourages repurposing without cloning, creating tight coupling that may surprise developers.

@jan-ivar
Copy link
Member

jan-ivar commented Dec 2, 2024

Let's compare without a "before" preview element, which seems more realistic. Current spec (22 lines):

start.onclick = async () => {
  const stream = await navigator.mediaDevices.getUserMedia({video: true});
  const [before] = stream.getVideoTracks();
  const worker = new Worker(`data:text/javascript,(${work.toString()})()`);
  worker.postMessage({before}, [before]);
  const {data: {after}} = await new Promise(r => worker.onmessage = r);
  video.srcObject = new MediaStream([after]);
  stop.onclick = () => after.stop(); 
};

function work() {
  onmessage = async ({data: {before}}) => {
    const {writable, track} = new VideoTrackGenerator();
    self.postMessage({after: track}, [track]);
    const {readable} = new MediaStreamTrackProcessor({track: before});
    try {
      await readable.pipeThrough(new TransformStream({transform})).pipeTo(writable);
    } finally {
      before.stop();
    }
  };
}

vs. bespoke API (17 lines):

start.onclick = async () => {
  const stream = await navigator.mediaDevices.getUserMedia({video: true});
  const [before] = stream.getVideoTracks();
  const worker = new Worker(`data:text/javascript,(${work.toString()})()`);
  const after = navigator.mediaDevices.createVideoTrackGeneratorAndProcessor(worker, before);
  video.srcObject = new MediaStream([after]);
  stop.onclick = () => {
    before.stop();
    after.stop();
  }
};

function work() {
  oncapturetransform = async ({processor, generator}) => {
    await processor.readable.pipeThrough({transform}).pipeTo(generator.writable);
  }
}

Two new redundant APIs (navigator.mediaDevices.createVideoTrackGeneratorAndProcessor and oncapturetransform) to trade 5 lines of code for an encapsulation problem does not seem like a win.

@guidou
Copy link
Contributor Author

guidou commented Dec 11, 2024

This is a common use case,

Seems more common not to have a "before" preview: None of the big sites with video effects have it.

Are you insinuating this use case shouldn't be supported?

... and there are many others that require or prefer the track on window and don't need or want a track in the worker.

Which websites require the "before" video on window? How do you know what they "prefer"?

All VC sites I know of allow you to switch between the processed and unprocessed track when configuring which effects you want to apply. They don't necessarily show both at the same time, but they allow you to switch between effects/no effects interactively and both tracks are on Window at the same time. Demos and tools for developing effects are more likely to show the before and after views simultaneously.

Also, all Web sites that do video processing for MediaStreamTracks do it with the track on Window. They keep both the processed and unprocessed tracks on Window, regardless of whether they show the unprocessed one or not.

Can you mention one that is doing it with the track on the worker, without at least having a clone on Window?
Or at least one that has expressed their preference to do it that way?

As #117 clarified, having a track tucked away in the worker using it is neither a burden nor a liability. It's quite clean: it lets the worker stop the source once piping ends. This cleanly reacts to downstream JS closing the "after" track, regardless of what thread it's on.

The current API shape forces a clone in such cases, which shouldn't be necessary.

A "before" preview element is a separate sink with potentially separate concerns. Using clone to separate those concerns should be encouraged. For instance, tying their lifespans together seems like a mistake.

That's at best an opinion unsupported by any evidence.

In contrast, your alternative API seems flawed:

  1. poor encapsulation: the worker has no way to stop the gUM track it's consuming

For the use case when the application wants to stop the gUM track from the Window, this is a better design. This covers all applications I'm aware of. Also, the proposal does not prevent a hypothetical application that wants to stop the gUM track from the Worker from transferring the track to the Worker and managing its lifetime here. I am just not aware of any such application.

  1. redundancy: the window has to manage two tracks, even when no "before" preview is needed

Why is managing one track on Window and one on Worker better, especially if all the application logic is on Window?

  1. tight coupling: if the website forgets to clone its "before" track then stopping it stops the "after" track as well

If that's what the application wants I don't see why it should be forbidden. My proposal does not prevent cloning and transferring. It just doesn't force it. Your argument seems to be that forcing it is a good idea.

IOW leaving the window to manage the source discourages encapsulation and encourages repurposing without cloning, creating tight coupling that may surprise developers.

That contradicts real world applications, where all the track lifetime management and other logic is on Window and the worker is used only for processing logic.

@jan-ivar
Copy link
Member

jan-ivar commented Jan 7, 2025

Are you insinuating this use case shouldn't be supported?

No, I'm saying it's one of several use cases, is already supported, and therefore doesn't need its own redundant API.

The argument for a dedicated API seems to rest solely on the existing API creating significant hardship. I believe I've shown with multiple working demos and shims in this issue and #113 that the difference in behavior is quite small and manageable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants