Capture PIDs die silently (possibly due to out of disk space condition). #1232

Closed
transplier opened this issue Jun 16, 2021 · 7 comments

@transplier

Happened at the same time as #1231.

I ran out of disk space, and all of my capture processes seem to have silently died. The UI is still running, no errors are reported, but the video shows up as pure green.

If I look at the debug panel, capture PIDs are still reported but they don't correspond to actual processes on the system.

 * Starting nginx nginx
   ...done.
Starting migrations
peewee_migrate                 INFO    : Starting migrations
There is nothing to migrate
peewee_migrate                 INFO    : There is nothing to migrate
frigate.mqtt                   INFO    : MQTT connected
detector.cpu1                  INFO    : Starting detection process: 34
frigate.mqtt                   INFO    : Turning off detection for cave via mqtt
detector.cpu2                  INFO    : Starting detection process: 35
detector.cpu3                  INFO    : Starting detection process: 37
detector.cpu6                  INFO    : Starting detection process: 42
frigate.app                    INFO    : Camera processor started for driveway: 47
frigate.app                    INFO    : Camera processor started for cave: 48
detector.cpu5                  INFO    : Starting detection process: 41
detector.cpu4                  INFO    : Starting detection process: 40
frigate.app                    INFO    : Camera processor started for backyard: 49
frigate.app                    INFO    : Camera processor started for garage: 50
frigate.app                    INFO    : Capture process started for driveway: 51
frigate.app                    INFO    : Capture process started for cave: 52
frigate.app                    INFO    : Capture process started for backyard: 53
frigate.app                    INFO    : Capture process started for garage: 58
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55a7c54f9140] moov atom not found
/tmp/cache/driveway-20210614112325.mp4: Invalid data found when processing input
frigate.events                 INFO    : bad file: driveway-20210614112325.mp4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x558bb3719140] moov atom not found
/tmp/cache/cave-20210614112325.mp4: Invalid data found when processing input
frigate.events                 INFO    : bad file: cave-20210614112325.mp4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x5631c935f140] moov atom not found
/media/frigate/recordings/cave-20210614112315.mp4: Invalid data found when processing input
frigate.record                 INFO    : bad file: cave-20210614112315.mp4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x56342460b140] moov atom not found
/media/frigate/recordings/driveway-20210614112315.mp4: Invalid data found when processing input
frigate.record                 INFO    : bad file: driveway-20210614112315.mp4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x5583b7893140] moov atom not found
/media/frigate/recordings/garage-20210614112315.mp4: Invalid data found when processing input
frigate.record                 INFO    : bad file: garage-20210614112315.mp4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x556c390d8140] moov atom not found
/media/frigate/recordings/backyard-20210614112315.mp4: Invalid data found when processing input
frigate.record                 INFO    : bad file: backyard-20210614112315.mp4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x555613f44140] moov atom not found
/tmp/cache/garage-20210614112325.mp4: Invalid data found when processing input
frigate.events                 INFO    : bad file: garage-20210614112325.mp4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55d2eb3d3140] moov atom not found
/tmp/cache/backyard-20210614112326.mp4: Invalid data found when processing input
frigate.events                 INFO    : bad file: backyard-20210614112326.mp4
frigate.mqtt                   INFO    : Turning on detection for cave via mqtt
frigate.mqtt                   INFO    : Turning off detection for cave via mqtt
frigate.mqtt                   INFO    : Turning on detection for cave via mqtt
frigate.mqtt                   INFO    : Turning off detection for cave via mqtt
frigate.events                 WARNING : More than 90% of the cache is used.
frigate.events                 WARNING : Consider increasing space available at /tmp/cache or reducing max_seconds in your clips config.
frigate.events                 WARNING : Proactively cleaning up the cache...
Exception in thread event_processor:
Traceback (most recent call last):
  File "/opt/frigate/frigate/events.py", line 193, in run
    event_type, camera, event_data = self.event_queue.get(timeout=10)
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 108, in get
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/frigate/frigate/events.py", line 196, in run
    self.refresh_cache()
  File "/opt/frigate/frigate/events.py", line 121, in refresh_cache
    oldest_clip = min(self.cached_clips.values(), key=lambda x:x['start_time'])
ValueError: min() arg is an empty sequence
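
The final traceback in the log ends with `min()` being called on an empty `cached_clips`. A minimal sketch of one way to guard that lookup, assuming `cached_clips` is a dict of clip dicts with a `start_time` key as the traceback suggests. The function name `oldest_clip` is hypothetical, and this is not necessarily how Frigate handles it:

```python
def oldest_clip(cached_clips):
    """Return the cached clip with the smallest start_time, or None if empty.

    `cached_clips` is assumed to map a clip name to a dict that has a
    'start_time' key, mirroring the line shown in the traceback.
    """
    # The `default` argument stops min() from raising ValueError on an
    # empty sequence and returns None instead.
    return min(
        cached_clips.values(),
        key=lambda clip: clip["start_time"],
        default=None,
    )
```

A caller would then skip the cleanup step when the result is `None` instead of letting the exception kill the `event_processor` thread.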

For example, the log says "Capture process started for driveway: 51", but PID 51 no longer exists on my system.

I'm not sure what the best course of action is here, but logging when capture processes die would probably be a good start, as would updating the debug screen so it no longer shows dead PIDs.

I'm not sure if it makes sense to automatically restart dead capture processes. It would be great to expose the service health via the API so for instance users can add monitoring to be notified when their cameras go dead.
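
A rough sketch of the two suggestions above (logging dead capture processes and exposing their health), assuming each capture process is tracked as an object with `pid`, `exitcode`, and `is_alive()`, as `multiprocessing.Process` provides. The names `find_dead`, `report_dead`, `health_summary`, and the logger name are hypothetical and not part of Frigate:

```python
import logging

logger = logging.getLogger("capture_watchdog")  # hypothetical logger name


def find_dead(processes):
    """Return {camera: exitcode} for every process that is no longer alive."""
    return {
        camera: proc.exitcode
        for camera, proc in processes.items()
        if not proc.is_alive()
    }


def report_dead(processes):
    """Log one error per dead capture process and return what was found."""
    dead = find_dead(processes)
    for camera, exitcode in dead.items():
        logger.error(
            "Capture process for %s (pid %s) exited with code %s",
            camera, processes[camera].pid, exitcode,
        )
    return dead


def health_summary(processes):
    """Build a JSON-friendly dict that an API endpoint could return."""
    return {
        camera: {"pid": proc.pid, "alive": proc.is_alive()}
        for camera, proc in processes.items()
    }
```

Calling `report_dead` periodically from the main process would at least surface the failure in the logs, and `health_summary` gives a monitoring system something to poll. Whether to restart a dead process is left open here.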

@transplier
Author

I ended up building a health check script for my instance that just loops through each camera and computes an average pixel color. If any are showing a green screen, I run some notification scripts.

curl -s "http://$FRIGATE_HOST_PORT/api/$CAMERA_NAME/latest.jpg" | convert - -resize 1x1 txt:- | grep '#009A00'

#009A00 is the hex color of the green screen.
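
The same check can be sketched in Python by parsing the text that `convert ... txt:-` prints. This assumes 8-bit output containing a six-digit hex color such as `#009A00`. The function names and the small tolerance are my own additions, the latter in case the averaged color is not always exactly `#009A00`:

```python
import re

GREEN_SCREEN_RGB = (0, 154, 0)  # #009A00, the green-screen color noted above

# Matches a six-digit hex color; deeper color depths print longer values
# and are deliberately not matched by this sketch.
_HEX_RE = re.compile(r"#([0-9A-Fa-f]{6})\b")


def parse_pixel(txt_output):
    """Return (r, g, b) for the first hex color in the output, or None."""
    match = _HEX_RE.search(txt_output)
    if match is None:
        return None
    value = match.group(1)
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def is_green_screen(rgb, tolerance=4):
    """True if every channel is within `tolerance` of the green-screen color."""
    if rgb is None:
        return False
    return all(abs(c - ref) <= tolerance for c, ref in zip(rgb, GREEN_SCREEN_RGB))
```

Feeding the captured stdout of the pipeline above into `parse_pixel` and then `is_green_screen` gives a boolean per camera that a notification script can act on.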

@randyg503

Without knowing what exception handling might already be in place, maybe something like Tenacity (https://tenacity.readthedocs.io/en/latest/) would be worth considering?
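
To illustrate the retry idea without depending on Tenacity itself, here is a standard-library stand-in: a small decorator that retries with exponential backoff. It is not Tenacity's API, and the names `retry`, `attempts`, `base_delay`, and `sleep` are made up for this sketch:

```python
import functools
import time


def retry(attempts=3, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function, doubling the delay after each failure.

    `sleep` is injectable so the backoff can be observed without waiting.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

Retrying only helps if the underlying failure is transient. A capture process that dies because of a driver problem would likely keep failing on every attempt.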

@transplier
Author

Edit: I cleaned up my disk (hundreds of GB free now) and restarted Frigate. Unfortunately my cameras did not come back up; I still got a green screen. I restarted Docker and had the same issue. I ended up having to restart the entire host to get my feeds back. I'm a little at a loss.
I also still don't see the capture PIDs in the process list. Perhaps there's something I'm misunderstanding about how PIDs are reported within Docker. I do see the processing PIDs, though.
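
One thing that may explain part of the PID confusion (an assumption, not confirmed in this thread): containers normally run in their own PID namespace, so a PID logged inside the container is generally a different number on the host. A minimal sketch for checking whether a PID exists, meant to be run in the same namespace that logged it, on Linux or another POSIX system. The name `pid_exists` is hypothetical:

```python
import os


def pid_exists(pid):
    """Return True if a process with this PID exists in the current namespace."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    try:
        # Signal 0 performs the existence and permission checks only;
        # no signal is actually delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

Running this inside the Frigate container against the PIDs from the log would show whether the capture processes are really gone or only look missing from the host's point of view.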

@transplier
Author

My camera feeds died overnight, and I'm nowhere close to a full disk. Restarting Frigate did not bring the feeds back; I had to reboot the machine again. I'm beginning to wonder if this is somehow a GPU hwaccel driver issue, since the problem even survived restarting the Docker daemon. Next time this happens I'll investigate along those lines. The theory is further supported by the fact that my recordings never went down: they aren't decoded before being written to disk, so they never go through ffmpeg's GPU-accelerated decoding. I think the RTMP streams served by Frigate also kept working fine.

The good news is that my hacky health check above worked and notified me that the cameras were down.

@randyg503

randyg503 commented Jun 18, 2021

I'm observing the same behavior: stale/defunct capture PIDs are reported in the debug panel.

I'm curious whether others can reproduce this by opening a shell in a running Frigate container, issuing a 'kill {active capture PID}', and then observing the resulting UI behavior.

@transplier
Author

> I'm beginning to wonder if this is somehow a GPU hwaccel driver issue

It happened again: all cams went green, and restarting Frigate didn't help. I disabled hwaccel and everything came back. So I think at least part of this is caused by something in the Intel hwaccel stack.

$ lspci|grep VGA
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

@stale

stale bot commented Jul 24, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 24, 2021
@stale stale bot closed this as completed Jul 27, 2021