Nix-daemon process leaking on nix-darwin Mac build machine #3294

Closed
johnalotoski opened this issue Dec 30, 2019 · 4 comments · Fixed by #6052
@johnalotoski

With nix-darwin on a Mac build machine, nix-daemon processes appear to be spawned repeatedly during builds and never exit. After some time, once the ulimit process limit has been hit, the build machine yields resource errors such as:

Could not spawn trampoline /usr/libexec/xpcproxy: 35: Resource temporarily unavailable
Deferred spawn of service failed: 35: Resource temporarily unavailable

Some diagnostic commands below:

-sh-3.2# nix-channel --list
darwin https://github.com/input-output-hk/nix-darwin/archive/master.tar.gz
nixpkgs https://nixos.org/channels/nixos-19.03
# nix run nixpkgs.nix-info -c nix-info -m
 - system: `"x86_64-darwin"`
 - host os: `Darwin 18.7.0, macOS 10.14.6`
 - multi-user?: `no`
 - sandbox: `no`
 - version: `nix-env (Nix) 2.1.3`
 - channels(root): `"darwin, nixpkgs-19.03.173672.5f7eae4bbb1"`
 - channels(nixos): `"darwin, nixpkgs-19.03.173672.5f7eae4bbb1"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixpkgs`
-sh-3.2# ps aux | grep nix-daemon
root            86871   0.0  0.3  4327444  17056   ??  Ss    3:34PM   0:00.50 /nix/var/nix/profiles/default/bin/nix-daemon
root            84285   0.0  0.3  4335860  18944   ??  Ss    3:33PM   0:00.61 /nix/var/nix/profiles/default/bin/nix-daemon
root            71620   0.0  0.2  4328276  14804   ??  Ss    3:31PM   0:00.22 /nix/var/nix/profiles/default/bin/nix-daemon
root            56956   0.0  0.3  4327500  17756   ??  Ss    3:29PM   0:00.45 /nix/var/nix/profiles/default/bin/nix-daemon
root            49187   0.0  0.3  4335848  17736   ??  Ss    3:27PM   0:00.31 /nix/var/nix/profiles/default/bin/nix-daemon
...
(gdb) thread apply all bt

Thread 2 (Thread 0x1003 of process 77402):
#0  0x00007fff606f49de in __ulock_wait () from /usr/lib/system/libsystem_kernel.dylib
#1  0x00007fff607b56de in _pthread_join () from /usr/lib/system/libsystem_pthread.dylib
#2  0x000000010ec976e8 in std::__1::thread::join() () from /nix/store/g5avn9k0sdn51rdwji3maikkl7a4sqaf-libc++-5.0.2/lib/libc++.1.0.dylib
#3  0x000000010e9c49a0 in nix::CurlDownloader::~CurlDownloader() () from /nix/store/dkjlfkrknmxbjmpfk3dg4q3nmb7m3zvk-nix-2.1.3/lib/libnixstore.dylib
#4  0x000000010e97b111 in std::__1::shared_ptr<nix::Store>::~shared_ptr() () from /nix/store/dkjlfkrknmxbjmpfk3dg4q3nmb7m3zvk-nix-2.1.3/lib/libnixstore.dylib
#5  0x00007fff606633cf in ?? ()
#6  0x00007fb009400000 in ?? ()
#7  0x0000000000000008 in ?? ()
#8  0x000000000940bb10 in ?? ()
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 0xd03 of process 77402):
#0  0x00007fff606f4f06 in __psynch_mutexwait () from /usr/lib/system/libsystem_kernel.dylib
#1  0x00007fff607b1d52 in _pthread_mutex_firstfit_lock_wait () from /usr/lib/system/libsystem_pthread.dylib
#2  0x00007fff607af4cd in _pthread_mutex_firstfit_lock_slow () from /usr/lib/system/libsystem_pthread.dylib
#3  0x000000010ec83749 in std::__1::mutex::lock() () from /nix/store/g5avn9k0sdn51rdwji3maikkl7a4sqaf-libc++-5.0.2/lib/libc++.1.0.dylib
#4  0x000000010ebce483 in nix::InterruptCallbackImpl::~InterruptCallbackImpl() () from /nix/store/dkjlfkrknmxbjmpfk3dg4q3nmb7m3zvk-nix-2.1.3/lib/libnixutil.dylib
#5  0x000000010ebce44e in nix::InterruptCallbackImpl::~InterruptCallbackImpl() () from /nix/store/dkjlfkrknmxbjmpfk3dg4q3nmb7m3zvk-nix-2.1.3/lib/libnixutil.dylib
#6  0x000000010e9ba733 in nix::CurlDownloader::workerThreadMain() () from /nix/store/dkjlfkrknmxbjmpfk3dg4q3nmb7m3zvk-nix-2.1.3/lib/libnixstore.dylib
#7  0x000000010e9b958a in nix::CurlDownloader::workerThreadEntry() () from /nix/store/dkjlfkrknmxbjmpfk3dg4q3nmb7m3zvk-nix-2.1.3/lib/libnixstore.dylib
#8  0x000000010e9b94ed in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, nix::CurlDownloader::CurlDownloader()::{lambda()#1}> >(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, nix::CurlDownloader::CurlDownloader()::{lambda()#1}>) ()
   from /nix/store/dkjlfkrknmxbjmpfk3dg4q3nmb7m3zvk-nix-2.1.3/lib/libnixstore.dylib
#9  0x00007fff607b12eb in _pthread_body () from /usr/lib/system/libsystem_pthread.dylib
#10 0x00007fff607b4249 in _pthread_start () from /usr/lib/system/libsystem_pthread.dylib
#11 0x00007fff607b040d in thread_start () from /usr/lib/system/libsystem_pthread.dylib
#12 0x0000000000000000 in ?? ()
@roberth
Member

roberth commented Jan 20, 2020

This happened on a Hercules CI agent machine too.

ps aux | grep nix-daemon | wc -l
792

@RocketPuppy

I'll try to find more diagnostic info if it happens again, but I'm seeing this behavior (resource limit reached) on two 20.03 NixOS machines. One is a laptop that gets tweaked and rebuilt frequently. The other is a computer managed by NixOps.

@stale

stale bot commented Jun 3, 2021

I marked this as stale due to inactivity.

@stale stale bot added the stale label Jun 3, 2021
roberth added a commit to hercules-ci/nix that referenced this issue Feb 6, 2022
This changes the representation of the interrupt callback list to
be safe to use during interrupt handling.

Holding a lock while executing arbitrary functions is something to
avoid in general, because of the risk of deadlock.

Such a deadlock occurs in NixOS#3294
where ~CurlDownloader tries to deregister its interrupt callback.

This happens during what seems to be a triggerInterrupt() by the
daemon connection's MonitorFdHup thread. This bit I can not confirm
based on the stack trace though; it's based on reading the code,
so no absolute certainty.
roberth added a commit to hercules-ci/nix that referenced this issue Feb 6, 2022
This changes the representation of the interrupt callback list to
be safe to use during interrupt handling.

Holding a lock while executing arbitrary functions is something to
avoid in general, because of the risk of deadlock.

Such a deadlock occurs in NixOS#3294
where ~CurlDownloader tries to deregister its interrupt callback.

This happens during what seems to be a triggerInterrupt() by the
daemon connection's MonitorFdHup thread. This bit I can not confirm
based on the stack trace though; it's based on reading the code,
so no absolute certainty, but a smoking gun nonetheless.
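
The general hazard described in the commit message can be illustrated with a minimal sketch. This is not Nix's actual code: CallbackRegistry, Handle, and demoSelfRemoval are hypothetical names chosen for the example. The registry takes a snapshot of the callback list while holding the mutex, then releases the mutex before invoking anything, so a callback (or a destructor it triggers) can deregister itself without blocking on a lock the caller still holds.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical registry for illustration only; not taken from the Nix sources.
class CallbackRegistry {
    std::mutex mutex_;
    std::list<std::function<void()>> callbacks_;

public:
    using Handle = std::list<std::function<void()>>::iterator;

    // Register a callback and return a handle that can later be passed to remove().
    Handle add(std::function<void()> cb) {
        std::lock_guard<std::mutex> lock(mutex_);
        return callbacks_.insert(callbacks_.end(), std::move(cb));
    }

    // Deregister a callback. Takes the mutex, so it must never be reached
    // from code that already holds it (std::mutex is not recursive).
    void remove(Handle h) {
        std::lock_guard<std::mutex> lock(mutex_);
        callbacks_.erase(h);
    }

    // Copy the callbacks under the lock, then call them with the lock released.
    // Calling them while still holding mutex_ is the deadlock-prone variant.
    void trigger() {
        std::vector<std::function<void()>> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot.assign(callbacks_.begin(), callbacks_.end());
        }
        for (auto & cb : snapshot) cb();
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return callbacks_.size();
    }
};

// A callback that deregisters itself while it is being triggered.
// Returns how many times the callback ran.
int demoSelfRemoval() {
    CallbackRegistry reg;
    int calls = 0;
    CallbackRegistry::Handle self;
    self = reg.add([&] {
        ++calls;
        reg.remove(self);  // safe: trigger() no longer holds the mutex here
    });
    reg.trigger();
    assert(reg.size() == 0);
    return calls;
}
```

One trade-off of the snapshot approach is that a callback removed by an earlier callback during the same trigger() may still run once from the snapshot, so callbacks need to tolerate being invoked shortly after deregistration.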
edolstra added a commit that referenced this issue Feb 21, 2022
…lback-deadlock

Fix deadlocked nix-daemon zombies on darwin #3294