migrate cpuset reserved partition when upgrading to 1.7+ #19847
Comments
If I create the directory on the host myself and then restart the client, all is fine and I see files appear in the "reserve" directory.
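For reference, the manual workaround described above might look like this on a cgroup v1 host. This is a hedged sketch, not part of Nomad: the helper name is illustrative, and the real target path would be /sys/fs/cgroup/cpuset (requiring root), so the demo uses a scratch directory instead.

```shell
#!/bin/sh
# Sketch of the manual workaround: pre-create the partition directory that
# Nomad 1.7+ expects, then restart the client. The cgroup root is taken as
# a parameter so the demo below can use a scratch directory; on a real
# host it would be /sys/fs/cgroup/cpuset.
precreate_reserve() {
  cgroup_root="$1"
  mkdir -p "${cgroup_root}/nomad/reserve"
}

# Demo against a temporary directory standing in for the real cgroup mount.
demo_root="$(mktemp -d)"
precreate_reserve "${demo_root}"
ls "${demo_root}/nomad"
# On a real host, the client restart would follow, e.g.: systemctl restart nomad
```

On a real node the mkdir and the client restart both need root, and the directory must exist before the client starts so the agent finds the expected partition.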
|
It might be useful, but I'm facing the same issue on |
It happens to us as well, migrating from 1.6.1 -> 1.7.3. On our current 1.6.1 servers, the nomad cpuset subsystem cgroup reservation is created in:
When upgrading to 1.7.3, we get the same reported error for the allocations:
If we restart the nodes, the cpuset subsystem reservation directory is created in the expected |
Hi everyone 👋 I'm still trying to reproduce this issue, but in the meantime, would you be able to check your Nomad client logs for a message such as Thanks! |
Hi @lgfa29
I checked the logs after the upgrade and I couldn't find
|
Thanks for the extra info @cesan3! Yeah, I just noticed that there are several paths where an error can happen, each with a different error message. Unfortunately there's not much that we can do in this case, as there are multiple reasons why those path creations may fail. But the agent shouldn't start in a state where it can't run tasks, so I opened #19915 to handle this. |
So, quick question @lgfa29: do we have another ticket to fix the original problem regarding the migration path from Nomad 1.6.1 -> 1.7.x? Now, with the fix, my migration stops earlier, when the Nomad agent starts:
Are there any plans to fix the migration? |
Could you check which process is keeping that path busy using something like I will reopen this issue until we better understand the problem. |
Hey @lgfa29, I presume that the 2 running allocations are keeping it busy?
Maybe some cgroup active children? I checked
and
But the only way of fixing it this time was rebooting the server. |
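As a reference point for the "what is keeping it busy" checks discussed above: on cgroup v1, each cgroup directory lists its member PIDs in cgroup.procs, and a cgroup cannot be removed while any of those lists is non-empty. The helper below is a hypothetical sketch; on a real host the root would be /sys/fs/cgroup/cpuset/nomad, so the demo uses a scratch tree instead.

```shell
#!/bin/sh
# List every PID still attached anywhere under a cpuset hierarchy.
# On cgroup v1, member PIDs live in each directory's cgroup.procs file;
# a cgroup directory cannot be rmdir'd while these lists are non-empty.
list_cgroup_pids() {
  find "$1" -name cgroup.procs -exec cat {} + 2>/dev/null | sort -un
}

# Demo against a scratch tree standing in for /sys/fs/cgroup/cpuset/nomad,
# with fake cgroup.procs files in place of the kernel-managed ones.
root="$(mktemp -d)"
mkdir -p "${root}/share/abc123"
printf '4321\n' > "${root}/share/abc123/cgroup.procs"
printf '1234\n' > "${root}/cgroup.procs"
list_cgroup_pids "${root}"
```

If this prints any PIDs on a real host, those processes (e.g. still-running allocation tasks) are what keeps the hierarchy busy until they exit or are moved out.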
As part of the upgrade from 1.6.1 -> 1.7.3, we now create the /sys/fs/cgroup/cpuset/nomad/reserve directory ahead of the client restart, which resolved the issue on the majority of nodes. However, some nodes then exhibited a similar issue related to the /sys/fs/cgroup/cpuset/nomad/share directory not being present; it seems this was /sys/fs/cgroup/cpuset/nomad/shared in 1.6.1. Creating this directory ahead of the client restart also helps, as does a full system reboot.
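The pre-create step described above can be sketched as follows, covering both renamed partitions (reserved -> reserve, shared -> share). The helper name is illustrative; on a real host you would point it at /sys/fs/cgroup/cpuset and run it as root before restarting the client, so the demo below uses a scratch directory laid out like a 1.6.x hierarchy.

```shell
#!/bin/sh
# Ensure the 1.7.x partition directories (reserve, share) exist alongside
# the 1.6.x ones (reserved, shared) before the client restart.
ensure_17_partitions() {
  root="$1"   # cpuset cgroup root; /sys/fs/cgroup/cpuset on a real host
  for d in reserve share; do
    mkdir -p "${root}/nomad/${d}"
  done
}

# Demo against a scratch directory mimicking the 1.6.x layout.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/nomad/reserved" "${demo_root}/nomad/shared"
ensure_17_partitions "${demo_root}"
ls "${demo_root}/nomad"
```

Running this ahead of the restart on each node matches the workaround above: the old 1.6.x directories are left in place and only the new names are added.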
Is there a reason these directories seem to have changed from reserved and shared in 1.6.1 to reserve and share in 1.7.3? |
Same issue while upgrading from 1.6.6 -> 1.7.7. The issue does not always reproduce. |
Unfortunately, in our case, to migrate from 1.6.x to 1.7.x, we had to automate the creation of the expected directories and the nomad cgroup controller removal using But once you're on 1.7.x, you can upgrade normally. |
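A sketch of that automated cleanup, under the same assumptions as above (helper names are illustrative, not from Nomad): cgroup directories are removed with rmdir, deepest first, and removal fails while any child cgroup still has attached processes, which is why draining allocations or rebooting first matters. The demo targets a scratch tree; on a real host the root would be /sys/fs/cgroup/cpuset/nomad and would need root privileges.

```shell
#!/bin/sh
# Remove an old nomad cpuset cgroup tree, deepest directories first.
# On a real cgroup v1 host, rmdir on a cgroup directory fails while any
# cgroup in the tree still has attached PIDs.
remove_cgroup_tree() {
  find "$1" -depth -type d -exec rmdir {} \;
}

# Demo against a scratch tree standing in for the real hierarchy.
demo_root="$(mktemp -d)"
tree="${demo_root}/nomad"
mkdir -p "${tree}/reserved" "${tree}/shared"
remove_cgroup_tree "${tree}"
[ -d "${tree}" ] && echo "still present" || echo "removed"
```

Combined with pre-creating the 1.7.x directory names, this mirrors the migration automation described above: tear down the stale 1.6.x tree, then let the 1.7.x client recreate what it needs.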
Doing a little bit of issue cleanup. There's a workaround for the original issue here, but the upgrade path is still not very nice. I'm going to re-title this and mark it for roadmapping. The underlying issue is that in 1.7.x and beyond the name of the partition is |
Nomad version
$ nomad version
Nomad v1.7.3
BuildDate 2024-01-15T16:55:40Z
Revision 60ee328
Operating system and Environment details
CentOS 7
Issue
When upgrading clients from 1.6.1 -> 1.7.3, we are getting the below error:
The file, as per the error, doesn't exist, but it does exist at:
/sys/fs/cgroup/cpuset/nomad/reserved/cpuset.cpus
Nomad Client logs (if appropriate)
Reproduction steps
This happened on a client which we updated from 1.6.1 -> 1.7.3; the servers had previously been updated to 1.7.3 with no issues.
Expected Result
Job runs as expected
Actual Result
Job fails to run on clients updated to 1.7.3.
Job file (if appropriate)