Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/nix-daemon: Enable cgroups delegation #339310

Merged
merged 1 commit into from
Sep 13, 2024

Conversation

parkerhoyes
Copy link
Contributor

Motivation for change

When use-cgroups is enabled, the nix daemon creates sub-cgroups for the build processes (and itself if NixOS/nix#11412 is merged, see NixOS/nix#9675). Delegate should be set to prevent systemd from messing with the nix service's cgroups (https://github.com/systemd/systemd/blob/main/docs/CGROUP_DELEGATION.md) and ensure the OOM killer only targets the offending derivation and not the entire service (NixOS/nix#10374).

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Sep 3, 2024
@NixOSInfra NixOSInfra added the 12. first-time contribution This PR is the author's first one; please be gentle! label Sep 3, 2024
@yshui
Copy link
Contributor

yshui commented Sep 10, 2024

Hi, thanks for working on this!

I checked systemd documentation but can't really find much information on how Delegate= interact with systemd-oomd, do you have more information you can share? thanks.

@parkerhoyes
Copy link
Contributor Author

@yshui I'm afraid not - I only found out about that behavior through trial and error. I don't know if there's any documentation on it.

@zimbatm
Copy link
Member

zimbatm commented Sep 12, 2024

From my understanding, this change shouldn't affect back-compat and could be merged already. Any objections?

@@ -198,6 +198,7 @@ in
IOSchedulingClass = cfg.daemonIOSchedClass;
IOSchedulingPriority = cfg.daemonIOSchedPriority;
LimitNOFILE = 1048576;
Delegate = "yes";
Copy link
Member

@Artturin Artturin Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be conditional on cfg.settings.use-cgroups? or does this just not afect anything if use-cgroups = false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making it conditional on use-cgroups would be acceptable, although as far as I can tell from the documentation, Delegate is specific to cgroups and shouldn't affect anything else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That feature can also be automatically enabled, making the detection logic a bit tricky at Nix eval time:
https://github.com/NixOS/nix/blob/46339db18db562607977523aa8cb552a1e9b08d3/src/libstore/globals.hh#L497-L498

@Artturin Artturin merged commit c960ba4 into NixOS:master Sep 13, 2024
26 checks passed
@Artturin
Copy link
Member

Squash merged to add the message in the description to the commit

@arianvp
Copy link
Member

arianvp commented Sep 16, 2024

Though better than the status quo (previously cgroups feature didn’t really work at all) this is not great.

it would be better if nix-daemon would create cgroups (scopes) using the systemd dbus api in a dedicated slice. Instead of managing cgroups itself. (Easiest way would be if we wrap each build in a systemd-run command)

problem now is that we can only configure cgroups settings like limits and systemd-oomd on cgroups managed by systemd. In practise this means we can only configure settings on nix-daemon.service

this means that it’s possible that nix-daemon itself gets OOM killed if one of it builds are the one causing memory pressure. Which is obviously not desired.

@arianvp
Copy link
Member

arianvp commented Sep 16, 2024

Reading the systemd source code and oomd man page I need to slightly correct myself:

Note that only descendant cgroups are eligible candidates for killing; the unit with its property set to kill is not a candidate (unless one of its ancestors set their property to kill). Also only leaf cgroups and cgroups with memory.oom.group set to 1 are eligible candidates; see OOMPolicy= in systemd.service(5).

For Delegate=yes units OOMPolicy defaults to continue instead of kill.

So the end result is oomd and the kernel OOM killer will work as follows:

  1. It is always a leaf cgroup that is killed
  2. The unit itself will never get killed (nix-daemon)
  3. Only the leaf cgroup is killed ant not the whole cgroup tree (due to OOMPolicy=continue being default when Delegate=yes).

So this configuration is valid and works as expected.

@zimbatm
Copy link
Member

zimbatm commented Sep 17, 2024

thanks for checking @arianvp. For reference, since there are a lot of similarities, here is what the Docker service looks like: https://gist.github.com/zimbatm/acec1b6f4e970d52b42dc0b761f9d114 . More specifically, it sets KillMode=process and OOMScoreAdjust=-500.

@arianvp
Copy link
Member

arianvp commented Sep 17, 2024

I think we don't want KillMode=process unless nix-daemon stores bookkeeping of what nix-build processes it spawned as on disk state.

If nix-daemon dies we probably want the builds to die too? Unless it's smart enough to carry on after a restart like docker is .

The OOM adjust makes sense. We might also want to set MemoryMin= on nix-daemon to give it some dedicated memory to work with under memory pressure.

That makes me think though. Does eval run in nix-daemon or does it run in a separate process that the nix daemon spawns under a new cgroup? As eval is quite extremely heavy on memory it might be interesting to have it in a cgroup so we can make sure an eval eating all memory gets killed promptly without killing the entire daemon and any other builds in progress.

Or does nix-daemon not do eval at all? It happens in the nix-build command itself right? In that case we're all good.

@zimbatm
Copy link
Member

zimbatm commented Sep 18, 2024

Sounds good. I'll send a follow-up OOMAjust score. For MemoryMin, I need to do more instrumentation to find the correct value.
Luckily, the evaluation happens on the client, but I need to check if, for example, it needs an extra buffer when large files are sent over to be added to the store.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 1-10 10.rebuild-linux: 1-10 12. first-time contribution This PR is the author's first one; please be gentle!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants