how would this work on non-video-mixed devices? #1
Correct - for the initial prototype, we wanted to explore the best way to surface the required information across processes internally, to see if we run into any issues and to check what constraints we have. The API shape is still very much TBD (there are other issues with it as well, related to the texture's lifetime). I took a look at the article, and I really like the idea of using an Anchor to work around the asynchronous access issue. Let me see how to best incorporate it in the proposal.
HoloLens 2's native support for this kind of thing is generally done with the developer starting from the media API to capture photos/videos, and we then annotate individual frames with the view/projection matrices. The front-facing photo/video camera on HoloLens 2 happens to have the same framerate as the app's render rate, though there is no promise that camera frames arrive synchronized with the time at which we wake up the app. In fact, because of forward head prediction, the target pose of the views in any given frame could never match the pose at which a camera frame is being captured, since the head is not there yet! We have an interesting design problem here to align an API across two scenarios: […]
For the latter, I would expect to augment `XRSession` with something like:

```webidl
partial interface XRSession {
  XRCameraPose? getCameraPose(ImageBitmap bitmap);
};

[SecureContext, Exposed=Window]
interface XRCameraPose {
  readonly attribute Float32Array projectionMatrix;
  [SameObject] readonly attribute XRRigidTransform transform;
};
```

We would likely need an additional mechanism when the app spins up that […]
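To make the media-based flow concrete, here is a minimal usage sketch of the `getCameraPose(ImageBitmap)` shape above. It assumes the app obtains camera frames via `getUserMedia()` and `ImageCapture.grabFrame()`; `getCameraPose()`, `XRCameraPose`, and `runComputerVision()` are the proposed/placeholder pieces, not shipped API.

```js
// Hypothetical usage of the proposed XRSession.getCameraPose(ImageBitmap) shape.
// getUserMedia() and ImageCapture are existing APIs; getCameraPose() and
// runComputerVision() (an app-defined placeholder) are assumptions for illustration.
async function startCameraProcessing(xrSession) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();
  const imageCapture = new ImageCapture(track);

  while (xrSession.visibilityState !== 'hidden') {
    // Grab the most recent camera frame as an ImageBitmap (asynchronous).
    const bitmap = await imageCapture.grabFrame();

    // Ask the XR system where the camera was when this frame was captured;
    // null if no pose is known for this particular bitmap.
    const cameraPose = xrSession.getCameraPose(bitmap);
    if (cameraPose) {
      // Hand the pixels plus the camera's projection and rigid transform to CV code.
      runComputerVision(bitmap, cameraPose.projectionMatrix, cameraPose.transform.matrix);
    }
    bitmap.close();
  }
}
```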
Do we think it would be possible to come up with an API shape that caters well to both the synchronous and asynchronous scenarios? It seems to me that the synchronous scenario could fit into the asynchronous model, but it would be weakened by it (in the sync model we deliver a frame within the rAFcb, and the app could use it when rendering the scene since the frame would be animated). In general, it seems that we have 3 cases regarding frame rate: […]
We also have to consider the time delta between the camera frame and the animation frame for which that camera frame is relevant: can that delta ever be negative, i.e. is it ever possible for the animation frame to be delivered before we would be able to deliver the camera image to the app? If "no", then converting from the asynchronous model to a synchronous API shape seems possible - we would always deliver the most recent camera image that we have to the rAFcb. For case 2) it means we're dropping some camera frames, and for case 3) it means some rAFcbs won't get a camera image. We will also need to somehow communicate the […]
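As a rough sketch of that "always hand the most recent camera image to the rAFcb" idea (illustration only - `onCameraImage`, `drawSceneWithCameraImage` and `drawScene` stand in for whatever delivery and rendering mechanisms the UA and app actually use):

```js
// Bridging an asynchronous camera feed into the synchronous rAFcb model.
let latestCameraImage = null;

onCameraImage((image) => {
  // Case 2 (camera faster than rendering): overwriting here drops camera frames.
  latestCameraImage = image;
});

function onXRFrame(time, frame) {
  frame.session.requestAnimationFrame(onXRFrame);

  if (latestCameraImage) {
    // Use the most recent image we have; it may be slightly stale.
    drawSceneWithCameraImage(frame, latestCameraImage);
    latestCameraImage = null;
  } else {
    // Case 3 (camera slower than rendering): some rAFcbs get no camera image.
    drawScene(frame);
  }
}
```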
Based on today's IW call, it looks like that's actually a "yes", and I had not understood your comment about forward head prediction.
Pinging @Manishearth, @blairmacintyre and @cabanier, who may have thoughts here... There are two key kinds of scenarios I know of here for these images on AR devices: […]
In practice, apps doing rendering effects really need to ensure they condition those effects to only kick in for views with an environment blend mode of […]. Given that, perhaps it's not so bad for rendering effects and computer vision to be handled in two different ways in the API?
Each scenario is then consistent across the devices where it applies. Thoughts?
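For the rendering-effect scenario, that kind of conditioning might look like the following sketch, using the existing `XRSession.environmentBlendMode` attribute (`enableCameraImageEffects()` / `disableCameraImageEffects()` are app-defined placeholders):

```js
// Only enable camera-image-based rendering effects when the camera image is
// actually what the user sees behind the content, i.e. video-passthrough AR.
// On see-through headsets ("additive") or opaque VR displays the effect would
// be meaningless or wrong.
function configureEffects(xrSession) {
  if (xrSession.environmentBlendMode === 'alpha-blend') {
    enableCameraImageEffects();   // app-defined placeholder
  } else {
    disableCameraImageEffects();  // app-defined placeholder
  }
}
```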
Is it fair to say that the main problem with the currently proposed API is that it may force some implementations to deliver camera images that are slightly out of date, but since they are available on […]?

```webidl
partial interface XRFrame {
  // Contains all the cameras, including the ones that are already
  // exposed on XRViews (those would align exactly with the XRViews).
  readonly attribute FrozenArray<XRCamera> cameras;

  XRCameraPose getCameraPose(XRCamera camera, XRReferenceSpace space);
};

interface XRCameraPose : XRPose {
  // Inherits all the goodness from XRPose, including velocities.
  // For cameras exposed on the views, the pose relative to the viewer space
  // would be identity, and `time` matches the XRFrame time.
  readonly attribute DOMHighResTimeStamp time;
};
```

This would be a non-breaking change - the existing users of the raw camera access API would use the XRView variant that offers stronger guarantees about timings (the […])
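For illustration, consuming this sketch could look roughly like the following; `frame.cameras`, `XRCamera`, `XRCameraPose` and `frame.getCameraPose()` are the hypothetical shapes above, and `xrRefSpace` and `handleCamera()` are assumed to be set up by the app:

```js
// Sketch of enumerating all cameras known to an XRFrame and checking how
// stale each camera image is relative to the animation frame.
function onXRFrame(time, frame) {
  frame.session.requestAnimationFrame(onXRFrame);

  for (const camera of frame.cameras) {
    const cameraPose = frame.getCameraPose(camera, xrRefSpace);

    // For cameras that are also exposed on an XRView, cameraPose.time equals
    // the XRFrame time and the pose matches the view; for other cameras the
    // image may correspond to a slightly earlier moment.
    const ageMs = time - cameraPose.time;

    handleCamera(camera, cameraPose.transform, ageMs); // app-defined placeholder
  }
}
```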
On headsets, it would be wrong to use the primary view's view and projection matrix - the app must use not just the camera-specific view pose, but also a camera-specific projection matrix. For example, HoloLens 2 has a wildly different photo/video camera FOV vs. the primary view FOV. If the app does any rendering or CV on the camera image using the primary view's FOV, it will be completely wrong, and so the app would also need to grab a projection matrix from […]. For devices where the system's XR render cadence and the camera's capture cadence accidentally align, the approach you suggest above could work if we add […].
One interesting thing to note about your latest proposal here is that it still offers apps two alternate paths to the […]:

```webidl
//////
// For temporally and spatially view-aligned camera images:
//////

partial interface XRView {
  // Non-null iff there exists an associated camera that perfectly aligns with the view:
  [SameObject] readonly attribute XRCamera? camera;
};

interface XRCamera {
  // Dimensions of the camera image:
  readonly attribute long width;
  readonly attribute long height;
};

partial interface XRWebGLBinding {
  // Access to the camera texture itself:
  WebGLTexture? getCameraImage(XRCamera camera);
};

// TBD mechanism for app to opt out of automatic environment underlay
// if the app is rendering a full-screen effect already.

//////
// For asynchronous media-based camera images:
//////

// Mechanism for app to opt into XR hologram overlay and/or XR poses
// for a given XRSession during MediaDevices.getUserMedia(constraints):
dictionary XRMediaTrackConstraints : XRMediaTrackConstraintSet {
  sequence<XRMediaTrackConstraintSet> advanced;
};

dictionary XRMediaTrackConstraintSet {
  // Enable secondary view for XRSession to enable system-composited hologram overlay:
  ConstrainBoolean xrSecondaryViewOverlay;
  // Enable getting per-frame camera view/projection matrices:
  ConstrainBoolean xrPoses;
};

partial interface XRSession {
  XRCameraPose? getCameraPose(ImageBitmap bitmap);
};

interface XRCameraPose {
  readonly attribute Float32Array projectionMatrix;
  [SameObject] readonly attribute XRRigidTransform transform;
};
```

With either your proposal or this proposal, an app that just powers ahead to fetch […]. With your proposal, we could encourage engines to generally ignore the per-view approach and enumerate […]
Generally, I'm a huge fan of us fully unifying a given WebXR scenario across phones, tablets and headsets to enable maximum content compatibility! However, across the three scenarios above, only scenario 1 benefits from the per-view approach.
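For completeness, here is roughly what the per-view path sketched above would look like from the app's side (a sketch only; `refSpace`, `glBinding`, `drawCameraBackground()` and `drawView()` are assumed to be set up or defined elsewhere by the app):

```js
// Per-view, view-aligned path: only views with an attached camera expose one,
// and the camera texture is fetched through the WebGL binding.
function onXRFrame(time, frame) {
  frame.session.requestAnimationFrame(onXRFrame);
  const viewerPose = frame.getViewerPose(refSpace);
  if (!viewerPose) return;

  for (const view of viewerPose.views) {
    if (view.camera) {
      // The camera image is guaranteed to align with this view, both in time
      // and in pose/projection, so the view's matrices can be reused directly.
      const cameraTexture = glBinding.getCameraImage(view.camera);
      drawCameraBackground(cameraTexture, view.camera.width, view.camera.height, view);
    }
    drawView(view); // app-defined placeholder for regular scene rendering
  }
}
```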
At one point in the distant past, we talked about having an async API for delivering frames, but explicitly guaranteeing that if the frames were synchronous with the view (smartphones), they would be delivered before the rAFcb... each frame would have "all the info" (timestamp, view, projection, the ability to request a gl texture or bytes). There would likely need to be a way for the app to determine this was happening (e.g., a capability or property?). It has its downsides, but it might simplify things.
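Purely as an illustration of that idea - every name below is hypothetical, not a real or proposed API surface - such a shape might combine a self-contained frame delivery with a capability the app can query:

```js
// Hypothetical shape only: requestCameraFrame(), cameraFramesAlignWithAnimationFrames,
// the fields on `cameraFrame`, and handleCameraFrame() are illustrative placeholders.
async function pumpCameraFrames(xrSession) {
  // A capability telling the app whether camera frames are guaranteed to be
  // delivered before the rAFcb they correspond to (e.g. on smartphones).
  const synchronous = xrSession.cameraFramesAlignWithAnimationFrames;

  while (xrSession.visibilityState !== 'hidden') {
    const cameraFrame = await xrSession.requestCameraFrame();
    // Each frame carries "all the info": timestamp, view, projection, pixels.
    handleCameraFrame(
      cameraFrame.timestamp,
      cameraFrame.viewMatrix,
      cameraFrame.projectionMatrix,
      cameraFrame.texture, // or bytes, depending on what the app asked for
      synchronous);
  }
}
```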
Given how many variables the UA and the author need to account for (framerate, FOV, camera framerate, camera characteristics, etc.), wouldn't it be better to not spend too much time on raw camera access, but instead build out native APIs for computer vision? In addition, the number one complaint about WebXR is performance. If we push CV and sync compositing to JavaScript, I doubt that authors can build good experiences on immersive devices that are built on mobile technologies.
Based on the discussion, I think the best way to proceed would be to plan on having 2 distinct API shapes that are split based on the scenarios they are solving. In Chrome for Android, we will pursue the per-view approach. […]
I think this was largely covered during the last IW call - the need for a way of accessing the camera is there, and the consumers of the API seem to be willing to pay the performance cost for it. That is not to say that we can't work on native APIs for CV, but the problem with that approach is that without the camera access API, we're preventing people from innovating and trying various things that we may not even think about now (I seem to recall "shoe detection" being brought up? 😃). As a bonus, we could influence what we're working on based on what solutions crop up in the wild - if it turns out some scenarios / algorithms are very common, we can try to make them a part of the platform.
I see. So is this just an API for experimentation and not meant to ship out to the general public?
It is meant to be shipped out to the general public, who will then amaze us with what they are able to come up with, and hopefully will give us a chance to learn from those experiments and influence what CV algorithms we can then attempt to standardize. At this point, based on what @nbutko shared during the last IW call, I'm worried that we're hurting developers by not giving them access to the camera pixels in a way that could be decorated with information coming from WebXR. And, given that we prioritized and launched the privacy-preserving APIs first, developers should be able to pick those, and only choose the raw camera access API once they are forced to.
Now that the discussion has slowed down a bit and the dust has settled, I think this will be a good moment to archive this repository and move future discussions to the https://github.com/immersive-web/raw-camera-access/ repo.
Devices that do not have a 1:1 mapping of camera frames to XR frames cannot implement this synchronous API approach, so it's only implementable on handheld AR (e.g., phones, tablets).
VR and AR HMDs that have cameras do not run the camera at the same frame rate as the XR API.
See, for example, the discussion of the implementation we did 2 years ago of an async API in the WebXR Viewer (https://blog.mozvr.com/experimenting-with-computer-vision-in-webxr/).
Also, see the discussion of why asynchronous access is needed in the CV repo you link to in the explainer. There is nothing to prevent promises from resolving immediately on a handheld platform, but any WebXR camera API needs to support all devices.