
Genlock returning previous frame instead of latest frame #9528

Closed
xc-racer99 opened this issue Jul 29, 2021 · 34 comments

@xc-racer99

Required Info
Camera Model D435
Firmware Version 05.12.14.50
Operating System & Version Windows 10
Platform PC
SDK Version 2.47.0
Language C
Segment Robot

Issue Description

While attempting to use genlock to trigger frames, we are finding that a frame gets stuck in the pipeline somewhere, i.e. we get the previous frame instead of the current one. Reproducible test case, with inter-cam sync mode set to 4:

  1. Send rising edge to camera with no hand in front of camera
  2. Put hand in front of camera and send rising edge - displayed frame does not show hand (i.e. it is the previous frame, from step 1)
  3. Remove hand from in front of camera and send rising edge - displayed frame shows hand (i.e. it is the previous frame, from step 2)

This has been reproduced on several cameras with librealsense versions 2.36 and 2.47 with various firmware versions in both our custom software (using rs::pipeline) as well as in the Realsense Viewer with inter-cam sync mode set to 4.

Is there some internal buffering done in librealsense that can be adjusted to prevent this from happening?

@MartyG-RealSense
Collaborator

Hi @xc-racer99 As a starting point in investigating your case, could you provide information about the questions below, please?

  • How are you generating the trigger signal, please - with a Master camera whose Inter Cam Sync Mode is set to '1' or with a signal generator device?

  • When you refer to sending the rising edge to camera, do you mean transmitting the trigger pulse to the slave camera?

  • In genlock mode 4, the slave camera waits indefinitely for a trigger pulse until it receives one and then captures upon receiving the trigger. Are you holding your hand up in front of the camera and then initiating a trigger pulse with the expectation that it will result in an image of the currently held-up hand being captured?

  • What FPS speed do you have the slave camera set to?

@xc-racer99
Author

  • How are you generating the trigger signal, please - with a Master camera whose Inter Cam Sync Mode is set to '1' or with a signal generator device?

With a signal generator device, basically a 1.8V signal into pin 5.

  • When you refer to sending the rising edge to camera, do you mean transmitting the trigger pulse to the slave camera?

That's correct.

  • In genlock mode 4, the slave camera waits indefinitely for a trigger pulse until it receives one and then captures upon receiving the trigger. Are you holding your hand up in front of the camera and then initiating a trigger pulse with the expectation that it will result in an image of the currently held-up hand being captured?

Yes, that's the expectation. The result is that we get an old frame displayed, i.e. the one that was previously captured. If you set the genlock mode to 5, then the second image that gets pushed through has the hand (i.e. the first one won't, but the second one of the set of two does). It looks as if there is a buffer of one frame somewhere.

  • What FPS speed do you have the slave camera set to?

90fps at 848x480. It happens whether we have the RGB stream running or not.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 31, 2021

I recall the case #8313 in which a RealSense user had frame skips when using genlock mode '4'. They experienced it with both a master camera ('1') and with an external microcontroller trigger device. They tried it with different cameras and found that the problem only occurred with one particular camera, leading them to believe that the problem might be with the camera rather than the sync setup. They also set their camera's depth stream to 848x480 at 90 FPS in their scripting.

If your external trigger is configured for 30 FPS, do you experience any improvement in results with mode '4' if you use 60 FPS for the slave instead of 90?

It is also worth mentioning that the External Synchronization (Genlock) system is considered 'unvalidated' by Intel, meaning that whilst you are free to experiment with it, it is not officially 'validated and mature' like the original multi-camera hardware sync system (modes 1 and 2 only) described in the link below.

https://dev.intelrealsense.com/docs/multiple-depth-cameras-configuration

@agrunnet
Contributor

agrunnet commented Aug 1, 2021

@xc-racer99 also please check your LibRS queue size setting. The chip itself has no buffer so captures and reads after trigger. But on SW side there could be a buffer. So if you stream only depth then queue should be 1. If you stream depth and color you can set it to 2. Then you should get immediately the latest.

@xc-racer99
Author

@xc-racer99 also please check your LibRS queue size setting. The chip itself has no buffer so captures and reads after trigger. But on SW side there could be a buffer. So if you stream only depth then queue should be 1. If you stream depth and color you can set it to 2. Then you should get immediately the latest.

@agrunnet Are you referring to RS2_OPTION_FRAMES_QUEUE_SIZE? I will test this explicitly in our custom app; right now we are leaving the default (1, I believe?). Testing with 2 (while the colour stream is running) or 1 (without the colour stream running) in our app shows the same behaviour.
Since we're using rs::pipeline it should have a queue size of 1, assuming https://github.com/IntelRealSense/librealsense/wiki/Frame-Buffering-Management-in-RealSense-SDK-2.0 is up to date. I assume the RealSense Viewer also uses pipeline internally?

I recall the case #8313 in which a RealSense user had frame skips when using genlock mode '4'. They experienced it with both a master camera ('1') and with an external microcontroller trigger device. They tried it with different cameras and found that the problem only occurred with one particular camera, leading them to believe that the problem might be with the camera rather than the sync setup. They also set their camera's depth stream to 848x480 at 90 FPS in their scripting.

This has been seen on all the cameras (~5) that we have tested, so it's not just one camera.

If your external trigger is configured for 30 FPS, do you experience any improvement in results with mode '4' if you use 60 FPS for the slave instead of 90?

The external trigger isn't configured for any particular frame rate, we're trying to use it on-demand while making sure we don't trigger too often. Setting the genlocked camera to 30fps instead of 90fps also shows the same issue.

Note that we also have the IR stream enabled, it shows the same problem as the depth stream. Is there anything internal to rs2::pipeline that could be delaying frames which normally doesn't matter/isn't noticeable when free-running?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Aug 4, 2021

My understanding is that if you are using an Inter Cam Sync Mode value of 4 or higher then the slave camera will be in Genlock Slave mode, where depth is hardware synced but RGB is not (as RGB sync is part of sync mode 3, Full Slave).

It is also my understanding that Infrared should not be affected by hardware sync, as sync usually only involves depth streams (and additionally RGB in mode 3).

In general, you may experience an FPS drop if you are streaming both depth and RGB at the same time with Auto-Exposure enabled and the separate option Auto-Exposure Priority also enabled. So you could test disabling Auto-Exposure Priority under the 'Controls' sub-section of the Viewer's RGB side-panel controls whilst leaving Auto-Exposure enabled and see if it has any effect on performance. Auto-Exposure Priority is disabled when you left-click on the blue box beside it (indicating an On state) to change the box color to black (Off).


@xc-racer99
Author

My understanding is that if you are using an Inter Cam Sync Mode value of 4 or higher then the slave camera will be in Genlock Slave mode, where depth is hardware synced but RGB is not (as RGB sync is part of sync mode 3, Full Slave).

My understanding too :) This is confirmed by the fact that, with the RGB stream enabled, wait_for_frames() always delivers a fresh supply of RGB frames while the depth frame keeps the same frame number until triggered, as checked using the metadata.

It is also my understanding that Infrared should not be affected by hardware sync, as sync usually only involves depth streams (and additionally RGB in mode 3).

This is not my understanding, nor what I have seen in practice. In practice, I'm seeing the frame number/frame change for the infrared stream only when triggered, the same as the depth stream. Is the depth stream not created from the infrared streams?

In general, you may experience an FPS drop if you are streaming both depth and RGB at the same time with Auto-Exposure enabled and the separate option Auto-Exposure Priority also enabled. So you could test disabling Auto-Exposure Priority under the 'Controls' sub-section of the Viewer's RGB side-panel controls whilst leaving Auto-Exposure enabled and see if it has any effect on performance. Auto-Exposure Priority is disabled when you left-click on the blue box beside it (indicating an On state) to change the box color to black (Off).

We don't have Auto-Exposure enabled.

Maybe the attached images will show the issue better. These are three consecutive triggers, with the RGB image being saved whenever the depth frame number increased. The RGB shows what was actually in front of the camera, while the depth and infrared appear to be delayed by a single frame. Note that the RGB imagery is also put through rs::align (hence why we have both aligned and non-aligned versions), but I have verified that the issue is present even when not using the frame aligner.

Genlock-Frame-Delay.zip

@MartyG-RealSense
Collaborator

The depth frame is generated by the camera hardware using the raw left and right infrared frames captured by the camera before data is transmitted along the USB cable to the computer / computing device.

However, the depth stream in the 'High Level API' used by librealsense programs is not reliant on the 'Infrared' and 'Infrared 2' streams being enabled at the same time. This is demonstrated by how depth can be streamed in tools such as the RealSense Viewer even if all other stream types are not enabled.

The difference between low-level (hardware level) and high-level API operations is described in the SDK's API Architecture documentation linked to below.

https://dev.intelrealsense.com/docs/api-architecture

In regard to your second question about the three consecutive triggers, @agrunnet will be able to answer that question better than myself.

@xc-racer99
Author

So I've done some experiments, and I am not seeing the same issue on Ubuntu 20.04 - here the latest image is in fact the latest image, and colour lines up with depth. Attached is a simple reproduction case in C++.

rs-save-to-disk.zip

@MartyG-RealSense
Collaborator

So the original problem with the current frame not being updated was on Windows, but you found that the same problem did not occur in Ubuntu 20.04.

Assuming that the Windows version of the SDK was installed with the installer file from the SDK Releases page, which installation method did you use to install the SDK on Ubuntu 20.04, please?

@xc-racer99
Author

So the original problem with the current frame not being updated was on Windows, but you found that the same problem did not occur in Ubuntu 20.04.

Yep, that's correct.

Assuming that the Windows version of the SDK was installed with the installer file from the SDK Releases page, which installation method did you use to install Ubuntu 20.04 please?

I followed the directions from https://github.com/IntelRealSense/librealsense/blob/master/doc/distribution_linux.md - it was a fresh install of Ubuntu 20.04.2; I installed a supported kernel (v5.4.x IIRC) so the DKMS patches worked and we had metadata support, and then built my test program. The Windows version was indeed from the SDK Releases page, v2.47.

@MartyG-RealSense
Collaborator

It may therefore be worth checking if you have metadata support enabled in the Windows version. You can do this on Windows through the RealSense Viewer tool. If you launch the Viewer with a camera plugged in then a pop-up box may automatically appear in the top corner stating that frame metadata is disabled and asking if you want to enable it. You can then click the Enable button on the pop-up to enable metadata support for that camera.



The other method of enabling metadata support on Windows is to use the instructions in the link below to build the Windows version of the SDK from source code and then edit the Windows registry.

https://github.com/IntelRealSense/librealsense/blob/master/doc/installation_windows.md

The registry editing method of adding metadata support predates the introduction of the easier method of enabling it in the Viewer. Editing the Windows registry should also only be done if you are confident in registry editing due to the risk of breaking Windows if a mistake is made.

@xc-racer99
Author

Yep, metadata is definitely enabled on Windows and on Linux - the sample reproduction code posted above will crash if metadata isn't enabled (the colour stream's metadata is used to save frames only when the depth frame counter changes, rather than each time a new colour frame arrives).

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Aug 6, 2021

Does the Windows version still not work if you disable the counter-increase check and let it save frames every time a new color frame arrives?

My thinking behind this is that there are events that can cause the frame counter to reset to zero, meaning that a frame-save could potentially be missed when that occurs if an increase in the count value is required to trigger the mechanism. The link below describes some of the events that can cause a frame counter reset.

#7819 (comment)

Another cause of frame counter reset can be in a multi-camera hardware sync setup. If the sync cables are longer than 3 meters and do not have electrostatic discharge (ESD) protection components built into the cable then an ESD discharge (such as static electricity) could reset the counter. Though ESD and ESD protection components are described in the External Synchronization white-paper document, the original (mode 1 and 2) hardware sync white paper provides additional commentary about this phenomenon:

"It is very important to note that if proper care is not taken, it has been observed that ESD / EMI events (example: static electricity) can cause the frame counters to reset even though streaming will continue. For syncing of depth only, this counter reset can mostly be ignored. However, in some cases the third color imager (the RGB camera) has been observed to freeze up during these ESD events. The D435 has been observed to be more robust to this issue, than the D415 under the same conditions".

@xc-racer99
Author

Does the Windows version still not work if you disable the counter-increase check and let it save frames every time a new color frame arrives?

My thinking behind this is that there are events that can cause the frame counter to reset to zero, meaning that a frame-save could potentially be missed when that occurs if an increase in the count value is required to trigger the mechanism. The link below describes some of the events that can cause a frame counter reset.

I don't think this would change anything, as we're not explicitly looking for an increase in the frame number, but rather for a change. It successfully detects the frame counter resets, as we've previously had issues with the frame counter resetting :) But that's distinct from this issue and was related to our electrical setup.

Additionally, I've now tested saving the frameset that arrives due to the new colour frame after the frame number has changed, but this also gives the same erroneous result.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Aug 7, 2021

I reviewed the case from the beginning again and recalled a phenomenon known as 'stale frames', where the timestamp of the acquired frame can lag behind the time of the wait_for_frames() call.

#7837 (comment)

Another factor that could be considered is the difference between the backends in Windows and in Linux. By default, the Windows version of the SDK uses a backend based on Windows Media Foundation, whilst Linux is based on the V4L2 backend.

#6300 (comment)

#6300 (comment)

Changing the backend of the Windows version of the SDK from Windows Media Foundation to a UVC-compatible one may involve building it from source code instead of using the automated installer, and using the CMake build flag FORCE_RSUSB_BACKEND

https://dev.intelrealsense.com/docs/build-configuration
https://github.com/IntelRealSense/librealsense/blob/master/doc/installation_windows.md


@xc-racer99
Author

Another factor that could be considered is the difference between the backends in Windows and in Linux. By default, the Windows version of the SDK uses a backend based on Windows Media Foundation, whilst Linux is based on the V4L2 backend.

Yes! This is great - I had no idea this other option existed. It does work as expected (i.e. the same as Linux's V4L2 backend); the frames line up properly. I did have to make a change so that the frame counter didn't reset all of the time due to timeouts when using genlock mode.

So I guess that means this is a bug in the Media Foundation backend?

diff --git a/src/uvc/uvc-streamer.cpp b/src/uvc/uvc-streamer.cpp
index 678acbc..c428f53 100644
--- a/src/uvc/uvc-streamer.cpp
+++ b/src/uvc/uvc-streamer.cpp
@@ -100,7 +100,7 @@ namespace librealsense
              {
                  _action_dispatcher.invoke([this](dispatcher::cancellable_timer c)
                    {
-                       if(!_running || !_frame_arrived)
+                       if(!_running || !_frame_arrived || true)
                            return;
 
                        LOG_ERROR("uvc streamer watchdog triggered on endpoint: " << (int)_read_endpoint->get_address());

@MartyG-RealSense
Collaborator

Hi @xc-racer99 Do you require further assistance with this case, please? Thanks!

@xc-racer99
Author

Hi @xc-racer99 Do you require further assistance with this case, please? Thanks!

Yes please; due to the issues surrounding multicam with the RSUSB backend, we'd like to get this working with the main Media Foundation backend.

@MartyG-RealSense
Collaborator

If your concern is about previous mentions that RSUSB is suited to single-camera applications, this situation was changed at the start of 2020 in SDK version 2.35.2 when improvements to multicam were incorporated into the SDK.

#6467

@xc-racer99
Author

If your concern is about previous mentions that RSUSB is suited to single-camera applications, this situation was changed at the start of 2020 in SDK version 2.35.2 when improvements to multicam were incorporated into the SDK.

#6467

Yes, I noticed that, but there are still (or again?) some difficulties, e.g. with librealsense 2.47 the rs-multicam example will crash out of the box. Applying #8046 prevents these crashes, but there are still issues such as #6921, where the connected/disconnected events aren't always sent (decreasing the timeout makes this less likely, but there's still the possibility of missing an event). There's also the difficulty of having to install the Win7 drivers.

@MartyG-RealSense
Collaborator

I conducted extensive further research about your question above but could not find much that I could suggest that would improve your current situation described above, unfortunately.

Of the information that I did find, a RealSense team member provides advice in the link below that adds to the knowledge that you gained from the links that you have already quoted above.

#9157 (comment)

@xc-racer99
Author

So I started diving into the Media Foundation backend, as the instability of the RSUSB backend was simply too much. I'm thinking that it is Windows' UVC implementation that is buffering the frames, not librealsense. This is based on the fact that the OnReadSample (https://github.com/IntelRealSense/librealsense/blob/master/src/mf/mf-uvc.cpp#L167) callback isn't being called when the first set of frames should be arriving. Only after the second trigger does it get called (and this time with the first set of frames). It's not missing calls to ReadSample - I tried spawning a thread that basically spammed calls to ReadSample, with no result. I also tried using ReadSample directly without setting the callback (i.e. not setting MF_SOURCE_READER_ASYNC_CALLBACK) as described in https://docs.microsoft.com/en-us/windows/win32/api/mfreadwrite/nf-mfreadwrite-imfsourcereader-readsample but it appeared that there was buffering on the synchronous path as well.

At this point I'm rather out of ideas on what to try; it's clearly an issue with Media Foundation (or some place where the docs for it are inaccurate, leading to a subtle bug in librealsense), but I haven't been able to find any tunable or anything to change.

@MartyG-RealSense
Collaborator

Hi @agrunnet Do you have any thoughts about the statement of @xc-racer99 above regarding the Media Foundation backend and use of it with genlock, please?

@ev-mp
Collaborator

ev-mp commented Aug 24, 2021

@xc-racer99 , this is a known bug in MediaFoundation (MF): it starts passing frames to the application only after the 2nd frame arrives.
And your finding refines it - MF actually buffers the frames internally and "ejects" them one by one with one frame of latency, e.g. when frame N arrives, MF sends you frame N-1, and so on (a kind of double-buffer).
Obviously, this is hard to notice when you get a continuous 30fps stream, but with "frame on-demand" generation this would definitely pose an issue.

This is also consistent with the USB-Host latency we profiled with a dedicated SDK tool: with Linux we measure ~80msec for Device->USB->Kernel->Application data propagation pipe, while with Windows (MF) it is about 110msec when streaming at 30 fps.

One way I may suggest to circumvent it is by "shooting twice" - e.g. send two GENLOCK events to generate a pair of frames, then read both of them but use only the second one. It may provide a sufficient workaround for your use-case.
The other alternatives are RSUSB or switching to Linux

@xc-racer99
Author

@xc-racer99 , this is a known bug in MediaFoundation (MF): it starts passing frames to the application only after the 2nd frame arrives.
And your finding refines it - MF actually buffers the frames internally and "ejects" them one by one with one frame of latency, e.g. when frame N arrives, MF sends you frame N-1, and so on (a kind of double-buffer).
Obviously, this is hard to notice when you get a continuous 30fps stream, but with "frame on-demand" generation this would definitely pose an issue.

Yep, exactly what's being seen here. Kind of glad it's not genlock-specific, and kind of not :)
I've also come across EnableDependentStillPinCapture and UvcFlags (as described at https://docs.microsoft.com/en-us/windows-hardware/drivers/stream/providing-a-uvc-inf-file) for enabling "Method 2 or Method 3 still capture support"; UvcFlags is a bitmask for various quirks. I'm guessing that, based on how librealsense is written, the alternate still capture support wouldn't be used even if the hardware supported it (do you know if it does)? I tried adding these registry entries in the same place as MetadataBufferSizeInKB - but I'm not sure if this is correct. It made no difference.

One way I may suggest to circumvent it is by "shooting twice" - e.g. send two GENLOCK events to generate a pair of frames, then read both of them but use only the second one. It may provide a sufficient workaround for your use-case.

We've been trying a variant of this (genlock mode = 5, so two frames are captured per input pulse), but it's difficult to line up both the IR and depth frames reliably - we periodically get one from the first trigger / a stale frame in our pipeline. We've been attempting to use the frame counter metadata, but it seems that depth lags more than IR, so we often end up with no valid matching frames while using an rs::pipeline. Additionally, it seems that, frame-counter-wise, the left IR is always one ahead of the depth (depth frame N matches with IR frame N + 1). This did not appear to be the case when we were single-triggering (genlock mode = 4). Sending two triggers might be a possibility and alleviate some of these issues; thanks for the idea.

The other alternatives are RSUSB or switching to Linux

Sadly, we're way too far into this to switch the OS, and RSUSB is too unstable with multiple cameras/disconnection events to be reliable.

@MartyG-RealSense
Collaborator

Hi @xc-racer99 Do you have an update that you can provide about this case, please? Thanks!

@xc-racer99
Author

No, no real update at this point. Frames appear to line up properly when using a single trigger twice, but we haven't been able to integrate it into our system for complete testing yet.

@MartyG-RealSense
Collaborator

Okay, thanks very much @xc-racer99

@xc-racer99
Author

xc-racer99 commented Sep 10, 2021

So here's my update (note: where not stated, these refer to depth frame numbers):

Genlock=4

  • First trigger - nothing happens
  • Second trigger - nothing happens
  • Third trigger - first frame arrives
  • Fourth and later triggers - the frame captured two triggers earlier arrives

Genlock=5

  • Depth frame 1 is actually the second frame from the first trigger set
  • This is matched with IR frame 2 (usually)
  • Frame counters usually then stay off by one for the same timestamp, but occasionally more depth frames are missed (i.e. the first frame isn't there, but the second one is)
  • Depth frame 2 is in fact the first of the pair of frames captured by the second trigger (i.e. depth frame 1 was the second frame from the first trigger)

Genlock=6

  • First trigger - depth frame 1 and IR frame 3 (this is a matched TS pair)
  • Second trigger - depth frame 4 and IR frame (also a matched pair)
  • etc

So with genlock=5 we can more or less get what we want. However, accounting for the random times that the first frame fails to capture is difficult.

I have also come across genlock = 259 and 260, where one of the frames has the emitter on and the other off. However, I've been unable to make this work in practice; reading RS2_FRAME_METADATA_FRAME_EMITTER_MODE suggests it is rather random whether the emitter is actually on or not. Is there any documentation on what other settings need to be configured for this to work properly?

@MartyG-RealSense
Collaborator

You could control the alternating emitter on-off function outside of the genlock system by using scripting to set the option RS2_OPTION_EMITTER_ON_OFF to '1' (enabled).

#9450

This mode though can also be prone to erratic results.

@MartyG-RealSense
Collaborator

Hi @xc-racer99 Do you require further assistance with this case, please? Thanks!

@xc-racer99
Author

We appear to have found a workaround for our specific use case, where we set the camera to master (i.e. outputting a signal) and use that for synchronization. It means we can't get images exactly when we want them, but it's good enough and much more reliable.

The root issue here is still present, but I'll leave it up to you if you want to close this or not.

@MartyG-RealSense
Collaborator

Hi @xc-racer99 It's great to hear that you found a workaround! As suggested, I will close the issue. Thanks again!
