rs2::pipeline::stop freezes/hangs/deadlocks sometimes #9184
Hi @svebert, ideally a modern firmware would be used with SDK 2.42.0, but I understand the challenge of updating hundreds or thousands of cameras. As a starting point for investigating your case, could you follow the steps below, please?
STEP ONE: If only one camera is plugged in, there should be only two drivers under the Cameras category of the Device Manager: RGB and Depth. However, the View option may reveal a multitude of hidden copies of RealSense drivers.
STEP TWO:
STEP THREE: When all RealSense drivers have been removed, unplug the camera from the USB port, wait a couple of seconds and plug it back in. Windows should automatically reinstall the RGB and Depth drivers, and the Device Manager should then correctly show only one pair of drivers.
If the View option revealed multiple RealSense drivers, does removing them improve the stability of your Windows test computer?
Hi MartyG-RealSense! I ran through all the steps. Unfortunately, this did not change the behaviour. After approximately six restarts, the program hangs in oPipeline.stop().
I ran some further tests using the Device Manager. I found that the drivers disappear from the Device Manager if a hardware reset of the camera is performed in the RealSense Viewer. They should then return automatically to the Cameras category of the Device Manager after the reset. A reset of the camera has the same effect as unplugging it from the USB port and re-inserting it. However, I found that with one of the USB ports, if the Device Manager was open during the reset, the drivers would not reappear and the camera could not complete its reset in the Viewer: it would disappear from the options side panel but not return after the reset. It would only reset successfully if the Device Manager was closed during the reset. The other USB 3 port on the same computer had no such problems, with the camera returning in both the Device Manager and the Viewer side panel.
Hi Marty, to state it clearly: my problem is not that the cameras appear or disappear in the Device Manager. The major problem is that, when this happens, the call to rs2::pipeline::stop can enter a deadlock. This is a state I cannot recover from; in the end, the whole application has to be killed and restarted.
My understanding is that if the camera disconnects during an active stream, the pipeline can automatically recover and continue, without having to be restarted, if reconnection occurs within 5 seconds. I have seen some past cases where an application freezes when the pipeline is closed. A solution that worked in some cases was to start the pipeline and stop the sensor instead of stopping the pipeline. I see that you have already tried this, though. Could you test whether it makes a difference if you use a pipeline.Close() instruction after pipeline.stop(), please?
In the C++ API, the pipeline object has no Close() member.
In March 2021 a fix was added to the SDK for an infinite-freeze-after-close problem with the T265 that was very similar to yours. The fix was to perform a short sleep before stopping the pipeline.
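For illustration, a minimal sketch of the "sleep, then stop" idea. It is written as a template so it could be instantiated with rs2::pipeline (assuming only its stop() member is needed). The delay is left to the caller because the duration used by the SDK fix is not given here, and FakeStopPipeline is a hypothetical stand-in for exercising it without a camera.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Sleep briefly, then stop. Pipeline is any type with a stop() member,
// e.g. rs2::pipeline (assumption). The caller chooses the delay; the value
// used by the SDK fix is not stated in this thread.
template <typename Pipeline>
void stop_after_delay(Pipeline& pipe, std::chrono::milliseconds delay)
{
    std::this_thread::sleep_for(delay);  // short pause before stopping
    pipe.stop();
}

// Hypothetical stand-in so the helper can be exercised without a camera.
struct FakeStopPipeline
{
    bool stopped = false;
    void stop() { stopped = true; }
};
```

With a real camera the call would be something like stop_after_delay(pipe, std::chrono::milliseconds(500)), where 500 ms is an arbitrary example value.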
Hi Marty, I just tried what you suggested above. Unfortunately, the fourth restart did result in the described deadlock.
There have also been reports this year of a memory leak each time the pipeline is stopped. Conceivably, this could lead to a freeze of the program after multiple stops if the computer's available memory is used up. This issue seemed more likely to occur on camera models with an IMU, though (D435i / D455). Would it be possible to monitor memory usage in the Windows Task Manager (under its Performance tab) and see whether available memory drops significantly after each close?
Hi Marty-G, I can't observe any memory leak. Occupied RAM is steady.
I have seen some cases where a program fails if a break is included but works fine without it. Could you test whether it makes a difference if the break in your script is removed?
Hi Marty!
Here I could not observe the deadlock problem. But it is not a solution for me, because I want to restart the pipe if an error occurred. I changed my example a little bit and did not break the loop on the first error. You mentioned a "hardware reset" above. How could I trigger a hardware reset before the call to oPipeline.stop()?
The Python discussion in the link below explores testing the connection and initiating a hardware reset if a "Frame didn't arrive within 5000" error occurs because of a freeze.
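Since that discussion is in Python, here is a rough C++ sketch of the same idea. It assumes the librealsense2 C++ calls wait_for_frames(), get_active_profile().get_device().hardware_reset(), stop() and start(cfg) behave as their names suggest, and it relies on rs2::error deriving from std::runtime_error. The template could be instantiated with rs2::pipeline and rs2::config; the Reset* types are hypothetical stand-ins that only count calls.

```cpp
#include <cassert>
#include <stdexcept>

// Wait for frames; on a timeout error, hardware-reset the camera and
// restart the pipeline. The member calls mirror the librealsense2 C++ API
// (assumption). rs2::error derives from std::runtime_error, so this catch
// also covers it.
template <typename Pipeline, typename Config>
bool wait_or_reset(Pipeline& pipe, const Config& cfg, unsigned int timeout_ms)
{
    try
    {
        pipe.wait_for_frames(timeout_ms);
        return true;  // frames arrived in time
    }
    catch (const std::runtime_error&)
    {
        // Reset before stop(), which is the ordering asked about above.
        pipe.get_active_profile().get_device().hardware_reset();
        pipe.stop();
        pipe.start(cfg);
        return false;
    }
}

// Hypothetical stand-ins that count calls instead of touching hardware.
struct ResetCounters { int resets = 0, stops = 0, starts = 0; };
struct ResetDevice
{
    ResetCounters* c;
    void hardware_reset() { ++c->resets; }
};
struct ResetProfile
{
    ResetCounters* c;
    ResetDevice get_device() const { return ResetDevice{c}; }
};
struct ResetConfig {};
struct ResetPipeline
{
    ResetCounters counters;
    bool time_out_next = false;
    void wait_for_frames(unsigned int)
    {
        if (time_out_next)
        {
            time_out_next = false;
            throw std::runtime_error("Frame didn't arrive within 5000");
        }
    }
    ResetProfile get_active_profile() { return ResetProfile{&counters}; }
    void stop() { ++counters.stops; }
    void start(const ResetConfig&) { ++counters.starts; }
};
```

After hardware_reset() the camera re-enumerates, so with real hardware start(cfg) may fail until the device reappears; a short wait-and-retry would probably be needed and is not shown.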
Hi Marty, I did further testing: If I call the following block
before the … I am still testing what happens if I put this block after the oPipeline.stop() call.
Thanks very much @svebert - I look forward to hearing your test results from putting the block after oPipeline.stop().
The test ran overnight. Unfortunately, it froze again in the oPipeline.stop() function, so this does not help.
I looked through your script again. I am not aware of a past situation in which a letter was placed inside the wait_for_frames brackets (5000U). What is the reason for the 'U', please, and what happens if the U is removed?
The U stands for unsigned integer. No "U" is passed into this function at runtime; it is simply how, in C++, you tell the compiler the data type of the number before the letter. There are also other valid suffixes, such as "L" for long.
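A small compile-time check illustrating the point: the suffix only changes the literal's type, which the static_asserts verify. as_timeout is a hypothetical function with an unsigned int millisecond parameter, which is (to my knowledge) how the C++ wrapper declares the wait_for_frames timeout, so 5000 and 5000U reach it as the same value.

```cpp
#include <cassert>
#include <type_traits>

// An integer-literal suffix only selects the literal's type at compile time.
static_assert(std::is_same<decltype(5000), int>::value, "no suffix: int");
static_assert(std::is_same<decltype(5000U), unsigned int>::value, "U: unsigned int");
static_assert(std::is_same<decltype(5000L), long>::value, "L: long");
static_assert(std::is_same<decltype(5000UL), unsigned long>::value, "UL: unsigned long");

// Hypothetical function with an unsigned int timeout parameter: it receives
// the same value whether the caller writes 5000 or 5000U.
inline unsigned int as_timeout(unsigned int timeout_ms) { return timeout_ms; }
```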
Are you using the official 1 meter USB cable supplied with the camera or a longer cable of your own choice? I noted the mention of 'a fully USB3 compliant cable', which made me wonder about this.
It is a 1 m USB3 cable (not the supplied one). But we tested a lot of cables, and the cable is not the problem here.
I located a past C++ case on Windows where a camera would become unresponsive and the only way to correct it was to unplug and replug the camera or to completely power off and reboot the PC with the power button (restarts did not work). In that case, I provided a link to information on using a Microsoft tool called DevCon to reset the entire USB port on a Windows computer instead of just resetting the camera. Because the whole port is reset, the camera does not need to be detected in order to do so.
Hi @svebert, do you require further assistance with this case, please? Thanks!
The problem is not resolved yet. I still want to test a variation of sleep + hardware reset before and after the stop() call.
Okay, thanks for the update. Please do provide your test results once you have them. Good luck!
@svebert I had a similar problem with pipeline.stop() hanging, but in the case of corrupted bag files. I found that doing a hardware reset, followed by pipe.start(cfg) and then immediately pipe.stop(), managed to get out of the deadlock (it takes 2-3 seconds to exit, probably because of the reset, but it seems to work). I'm using this as a temporary workaround. I'm not sure whether it will work in your case with actual USB hardware, but I thought it would be worth a shot.
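A sketch of that sequence, assuming the device and pipeline types behave like rs2::device and rs2::pipeline. How the device handle is obtained (for example from rs2::context::query_devices()) is left to the caller and is an assumption; the Logging* types and CallLog are hypothetical stand-ins that record the call order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Workaround described above: hardware reset, then start the pipeline and
// immediately stop it. Device/Pipeline/Config are expected to behave like
// rs2::device, rs2::pipeline and rs2::config (assumption).
template <typename Device, typename Pipeline, typename Config>
void reset_start_stop(Device& dev, Pipeline& pipe, const Config& cfg)
{
    dev.hardware_reset();  // the commenter suspects this causes the 2-3 s delay
    pipe.start(cfg);       // start again with the same configuration
    pipe.stop();           // then stop straight away
}

// Hypothetical stand-ins that log call order instead of touching hardware.
struct CallLog { std::vector<std::string> calls; };
struct LoggingDevice
{
    CallLog* log;
    void hardware_reset() { log->calls.push_back("hardware_reset"); }
};
struct LoggingConfig {};
struct LoggingPipeline
{
    CallLog* log;
    void start(const LoggingConfig&) { log->calls.push_back("start"); }
    void stop() { log->calls.push_back("stop"); }
};
```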
@shivak7: Thank you. I will give it a shot.
Thanks very much for the update @svebert - it's good to hear that your situation has improved!
Hi @svebert, do you require further assistance with this case, please? Thanks!
My problem is solved, for now.
Okay, thanks very much @svebert for the update! As your problem is solved for now, I will close the case. Feel free to open a new case if problems recur in the future.
Issue Description
I am debugging a very nasty issue in which a RealSense D415 randomly disconnects from the USB controller. On the test PC, the RealSense disconnects randomly and temporarily for an unknown reason: you hear the typical Windows "unplugging" sound, the camera disappears from the Device Manager, and shortly afterwards it reappears.
When the camera disappears from the Device Manager, oPipeline.wait_for_frames(5000U) throws the error "Frame didn't arrive within 5000". In case of this error, I stop the pipeline and restart it. This works three or four times in a row and would solve my problem. But after some successful restarts, oPipeline.stop() simply blocks and deadlocks on the next restart. I tried some workarounds:
* Instead of stopping the pipeline, I stopped and closed the underlying rs2::sensor objects. The same issue occurs here: the call to oSensor.close() blocks indefinitely.
* I called oPipeline.stop() within a deferred thread. But this crashes with unpredictable memory access violations later on, when trying to call oPipeline.start(), or sometimes just randomly.

My issue is not the random disconnect on my test PC. My problem is that I can't recover from it in some cases. How do I stop and dispose of all rs2 resources properly? How do I avoid this nasty infinite blocking?
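For reference, the stop-and-restart loop described above might look roughly like this. It is a reconstruction, not the author's minimal example: Pipeline and Config stand in for rs2::pipeline and rs2::config (assumption), and FlakyPipeline is a hypothetical stand-in whose every third wait times out. The pipe.stop() in the catch block is the call reported to hang, so this shows the pattern, not a fix.

```cpp
#include <cassert>
#include <stdexcept>

// Wait with a timeout and, on a timeout error, stop and restart the
// pipeline instead of breaking out of the loop. Returns the restart count.
template <typename Pipeline, typename Config>
int stream_with_restarts(Pipeline& pipe, const Config& cfg,
                         int iterations, unsigned int timeout_ms)
{
    int restarts = 0;
    pipe.start(cfg);
    for (int i = 0; i < iterations; ++i)
    {
        try
        {
            pipe.wait_for_frames(timeout_ms);  // frame processing would go here
        }
        catch (const std::runtime_error&)  // rs2::error derives from this
        {
            pipe.stop();       // the call that sometimes never returns
            pipe.start(cfg);
            ++restarts;
        }
    }
    pipe.stop();
    return restarts;
}

// Hypothetical stand-in: every third wait throws a timeout error.
struct FlakyConfig {};
struct FlakyPipeline
{
    int waits = 0, starts = 0, stops = 0;
    void start(const FlakyConfig&) { ++starts; }
    void stop() { ++stops; }
    void wait_for_frames(unsigned int)
    {
        if (++waits % 3 == 0)
            throw std::runtime_error("Frame didn't arrive within 5000");
    }
};
```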
Here is my minimal example:
The RealSense Viewer itself also shows major reconnection issues on this test PC. I have a USB3 connection and a fully USB3-compliant cable.
(And yes, maybe the firmware is too old or there is a problem with the camera and/or the PC. But our company uses hundreds or even thousands of these cameras at various customer sites. I need to understand how to avoid the deadlock in the software. I don't care about broken cameras, but I don't want deadlocks when a broken camera is connected. It should simply fail, not block forever.)
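One way to ensure the application itself never waits forever is to run the potentially hanging call on a worker thread and stop waiting after a timeout. The helper below (run_with_timeout is a hypothetical name) is a generic standard-library sketch, not a librealsense facility. It does not cancel the blocked call; it only lets the caller notice the hang and escalate.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <stdexcept>
#include <thread>

// Run fn on a detached worker thread and wait at most `timeout` for it.
// Returns true if fn finished in time (rethrowing any exception it threw),
// false if it is still blocked. A blocked worker is NOT cancelled: it keeps
// running, so everything fn touches must outlive it (capture shared_ptrs).
inline bool run_with_timeout(std::function<void()> fn,
                             std::chrono::milliseconds timeout)
{
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();
    std::thread([fn = std::move(fn), done]() {
        try
        {
            fn();
            done->set_value();
        }
        catch (...)
        {
            done->set_exception(std::current_exception());
        }
    }).detach();

    if (finished.wait_for(timeout) != std::future_status::ready)
        return false;  // still blocked; the caller must escalate
    finished.get();    // propagate an exception from fn, if any
    return true;
}
```

As a hedged usage example: with the pipeline held as auto pipe = std::make_shared<rs2::pipeline>(), a bounded stop could be run_with_timeout([pipe] { pipe->stop(); }, std::chrono::seconds(10)), where 10 s is arbitrary. On false, that pipeline should not be started again; given the access violations reported above after calling stop() on another thread, escalating to a USB port reset (for example with DevCon, mentioned in the thread) or restarting the process seems safer.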
Any suggestions? Any fixes?