Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move daemon process into sub-cgroup #11412

Merged
merged 4 commits into from
Sep 9, 2024
Merged

Conversation

parkerhoyes
Copy link
Contributor

Motivation

Move the daemon process into its own sub-cgroup when using --use-cgroups. This is necessary to abide by the no-processes-in-inner-nodes rule necessary for delegation of resource control (https://github.com/systemd/systemd/blob/main/docs/CGROUP_DELEGATION.md).

Context

Closes: #9675

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

The daemon process is now moved into a new sub-cgroup called nix-daemon when the
daemon starts. This is necessary to abide by the no-processes-in-inner-nodes
rule, because the service cgroup becomes an inner node when the child cgroups
for the build are created (see LocalDerivationGoal::startBuilder()).

See NixOS#9675
@edolstra
Copy link
Member

edolstra commented Sep 4, 2024

This raises the question what happens when we're using cgroup-based resource control and we're not using the daemon. E.g. when running nix in an interactive root shell, the cgroup might be something like /user.slice/user-1000.slice/[email protected]/app.slice/app-org.kde.konsole-ee86645353a54fe8b466994824fedfa8.scope, where presumably it's not possible to obey the no-processes-in-inner-nodes rule.

@parkerhoyes
Copy link
Contributor Author

@edolstra If I'm understanding things correctly, it's not really good practice to create cgroups except as a child of a cgroup that you yourself manage. We probably shouldn't be creating cgroups in the case where builds are run from a user shell. With systemd, typically only services would create cgroups, and you'd use systemd-run --user --scope if you needed a transient service to create cgroups in.

One solution would be to just leave it up to the user to decide when to pass --use-cgroups (which is how things work now). Typically you'd want to enable it for the daemon but leave it disabled otherwise. In this case, you'd have to add --use-cgroups to your service configuration and not enable it for nix system-wide (eg. in nix.conf).

We could add a check in LocalDerivationGoal::startBuilder() that only uses cgroups if its invoked from the daemon. However, there may be cases where someone might actually want to run the builder directly and still have cgroups.

Perhaps the most user-friendly solution would be to add an option --use-cgroups-in-daemon which enables cgroups for builds run by the daemon, and then you could pass --use-cgroups when you want cgroups in other cases. That way you could configure the use of cgroups by the daemon in your nix.conf rather than having to modify the service configuration.

@edolstra edolstra merged commit 5e337ee into NixOS:master Sep 9, 2024
11 checks passed
Artturin pushed a commit to NixOS/nixpkgs that referenced this pull request Sep 13, 2024
When `use-cgroups` is enabled, the nix daemon creates sub-cgroups for the build processes (and itself if NixOS/nix#11412 is merged, see NixOS/nix#9675). `Delegate` should be set to prevent systemd from messing with the nix service's cgroups (https://github.com/systemd/systemd/blob/main/docs/CGROUP_DELEGATION.md) and ensure the OOM killer only targets the offending derivation and not the entire service (NixOS/nix#10374).
@arianvp
Copy link
Member

arianvp commented Sep 15, 2024

We should ideally not be doing this and just allocate cgroups using the Systemd API. Then we don't need all these delegation shenanigans

@zimbatm
Copy link
Member

zimbatm commented Sep 18, 2024

Do we want to create a hard dependency between nix-daemon and systemd? It might be worth looking at what docker and runc are doing since they face similar problems.

@arianvp
Copy link
Member

arianvp commented Sep 18, 2024

I think we do. In cgroups V2 there is a single writer rule. That writer is systemd. Containerd and friends also depend on the systemd dbus API for managing cgroups. I think they support running in two modes. Either delegation where containerd creates the cgroups themselves and sets properties like memory and CPU, or containerd will create scope units through the StartTransientUnit() dbus call and use SetUnitProperty for setting resource options. This is abstracted in the containerd/cgroups golang package. They also abstract away cgroups V1 and V2. At cost of great complexity.

We could do something similar where we support both. But it sounds complex. I don't know of any other cgroup management agents than Systemd and I haven't seen adoption of groupsv2 on non-systemd linuxes at all. So to be honest i think it makes sense to use dbus to create and edit cgroups. (Or varlink if they ever release that. Varlink we can implement without adding any new code dependencies to nix. As its just json over Unix domain sockets)

Please see https://systemd.io/CONTROL_GROUP_INTERFACE/ for the rationale of why systemd needs to manage writes to cgroups centrally.

Linuxes that are not systemd based can then disable the dbus dependency / cgroups feature with a meson flag and the groups feature will not be supported on those platforms.

Or we can keep the mix of delegation and direct integration. Or we can only do delegation and keep managing cgroups ourselves

@arianvp
Copy link
Member

arianvp commented Sep 18, 2024

Maybe we can keep delegation for now until clear usecase shows up on why the leaf cgroups should be Systemd managed.

One usecase I can think of that is quite interesting:

If a nix build is running a full OS, e.g. If we want to run lightweight nixos tests without kvm, we need to manually add code to delegate that cgroup down so the nested systemd running in the build doesn't run into issues. That might be complex? Whilst with systemd-managed cgroups we get it out of the box . (Just set Delegate=yes on the scope unit of the build)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new-cli Relating to the "nix" command
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nix creates processes in cgroup inner nodes
4 participants