
chrore: parse cgroup v2 #3857

Merged
merged 2 commits into from
Oct 10, 2024

Conversation

kostasrim
Contributor

@kostasrim kostasrim commented Oct 3, 2024

Currently, parsing cgroups only works for version 1. In that version, each controller has its own subfolder; for example, to fetch memory-controller info we search for files in /sys/fs/cgroup/memory. In cgroup version 2, however, all of the controllers reside in the same folder, /sys/fs/cgroup/. This PR adds support for parsing version 2 cgroups under the unified directory model. Furthermore, we now also parse /sys/fs/cgroup/user.slice/.

  • add support for parsing cgroup version 2
  • add parsing of /sys/fs/cgroup/user.slice/

Probably resolves #3812
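For context, the two layouts can be told apart from /proc/self/cgroup: on cgroup v2 the process has a single unified entry of the form `0::<path>`, while v1 lists one entry per controller hierarchy. A minimal sketch of such a check (a hypothetical helper, not this PR's actual code):

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical sketch: on cgroup v2, /proc/self/cgroup contains a unified
// entry "0::<path>" (hierarchy id 0, empty controller list). Returns the
// process's controller directory under /sys/fs/cgroup, or nullopt if no
// v2 entry is present (i.e. a pure v1 setup).
std::optional<std::string> CgroupV2Dir(const std::string& proc_self_cgroup) {
  std::istringstream in(proc_self_cgroup);
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("0::", 0) == 0)  // v2 entries always use hierarchy id 0
      return "/sys/fs/cgroup" + line.substr(3);
  }
  return std::nullopt;
}
```

For the session scope shown in the logs below, this would yield /sys/fs/cgroup/user.slice/user-1000.slice/session-c1.scope.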

@kostasrim kostasrim self-assigned this Oct 3, 2024
};

// For v1
constexpr auto base_mem_v1 = "/sys/fs/cgroup/memory"sv;
Contributor Author

@kostasrim kostasrim Oct 3, 2024


This is wrong. There is no memory.max in the root cgroup. That means that, even before this PR, read_mem(StrCat(base_mem, "/memory.limit_in_bytes"), &mdata->mem_total); would never read anything.

See my comments in the GH issue.
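Note that even when a v2 limit file does exist, its content is either a byte count or the literal string "max" (no limit), as the logs below show. A hypothetical parsing sketch (assuming well-formed file content; not the PR's actual code):

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical sketch: cgroup v2 limit files (memory.max, memory.high)
// contain either a decimal byte count or the literal "max", meaning the
// cgroup imposes no limit. Returns nullopt for the unlimited case.
std::optional<uint64_t> ParseMemLimit(std::string s) {
  while (!s.empty() && (s.back() == '\n' || s.back() == ' '))
    s.pop_back();  // strip trailing whitespace from the file read
  if (s == "max")
    return std::nullopt;
  return std::stoull(s);
}
```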

@kostasrim kostasrim changed the title from "[do not review] chrore: parse cgroup v2" to "chrore: parse cgroup v2" Oct 7, 2024
@kostasrim kostasrim marked this pull request as ready for review October 7, 2024 08:37
@kostasrim kostasrim requested a review from adiholden October 7, 2024 08:37
@romange
Collaborator

romange commented Oct 7, 2024

How can I test the changes? How to reproduce running under cgroups v2 on my dev machine?

@kostasrim
Contributor Author

How can I test the changes? How to reproduce running under cgroups v2 on my dev machine?

Does your machine use v1 or v2? I stepped through it in my gdb to verify the paths 🤷‍♂️

Also, note that I suspect that constexpr auto base_mem_v1 = "/sys/fs/cgroup/memory"sv; was a mistake. Even when the controller is in a subfolder (for v1 cgroups), we should really poll /sys/fs/cgroup/user.slice/memory instead -- and more specifically it should also include user.slice, which seems to be the parent of all user cgroups. As I do not know the rationale behind that path, I left it as is and added the paths for v2.

To fully answer: step through it in GDB if you are using cgroup v2, and it should properly resolve the right limits.

@romange
Collaborator

romange commented Oct 7, 2024

That's what I get when running from the master branch with vlog=1:

I20241007 14:32:51.190955 825726 dfly_main.cc:459] mem_path = /sys/fs/cgroup//user.slice/user-1000.slice/session-c1.scope
I20241007 14:32:51.190960 825726 dfly_main.cc:460] cpu_path = /sys/fs/cgroup//user.slice/user-1000.slice/session-c1.scope
I20241007 14:32:51.190971 825726 dfly_main.cc:438] container limits: read /sys/fs/cgroup/memory/memory.limit_in_bytes: N/A
I20241007 14:32:51.190976 825726 dfly_main.cc:438] container limits: read /sys/fs/cgroup/memory/memory.max: N/A
I20241007 14:32:51.190984 825726 dfly_main.cc:438] container limits: read /sys/fs/cgroup//user.slice/user-1000.slice/session-c1.scope/memory.limit_in_bytes: N/A
I20241007 14:32:51.190994 825726 dfly_main.cc:438] container limits: read /sys/fs/cgroup//user.slice/user-1000.slice/session-c1.scope/memory.max: max
I20241007 14:32:51.191002 825726 dfly_main.cc:438] container limits: read /sys/fs/cgroup//user.slice/user-1000.slice/session-c1.scope/memory.high: max

@kostasrim
Contributor Author

kostasrim commented Oct 7, 2024

@romange

that's what I get when running from master branch with vlog=1:

I20241007 14:32:51.190955 825726 dfly_main.cc:459] mem_path = /sys/fs/cgroup//user.slice/user-1000.slice/session-c1.scope
I20241007 14:32:51.190960 825726 dfly_main.cc:460] cpu_path = /sys/fs/cgroup//user.slice/user-1000.slice/session-c1.scope
I20241007 14:32:51.190971 825726 dfly_main.cc:438] container limits: read /sys/fs/cgroup/memory/memory.limit_in_bytes: N/A
I20241007 14:32:51.190976 825726 dfly_main.cc:438] container limits: read /sys/fs/cgroup/memory/memory.max: N/A
I20241007 14:32:51.190984 825726 dfly_main.cc:438] container limits: read /sys/fs/cgroup//user.slice/user-1000.slice/session-c1.scope/memory.limit_in_bytes: N/A
I20241007 14:32:51.190994 825726 dfly_main.cc:438] container limits: read /sys/fs/cgroup//user.slice/user-1000.slice/session-c1.scope/memory.max: max
I20241007 14:32:51.191002 825726 dfly_main.cc:438] container limits: read /sys/fs/cgroup//user.slice/user-1000.slice/session-c1.scope/memory.high: max

Yep, this makes sense; as I said, paths like /sys/fs/cgroup/memory/memory.max are probably invalid. With the changes in this PR the output will be:

I20241007 14:45:48.500520 133436 dfly_main.cc:459] mem_path = /sys/fs/cgroup//user.slice/user-1000.slice/[email protected]/app.slice/app-org.gnome.Terminal.slice/vte-spawn-423f3ba1-6da8-42ce-b9bd-77849ef0a079.scope
I20241007 14:45:48.500545 133436 dfly_main.cc:460] cpu_path = /sys/fs/cgroup//user.slice/user-1000.slice/[email protected]/app.slice/app-org.gnome.Terminal.slice/vte-spawn-423f3ba1-6da8-42ce-b9bd-77849ef0a079.scope
I20241007 14:45:48.500578 133436 dfly_main.cc:438] container limits: read /sys/fs/cgroup/memory/memory.limit_in_bytes: N/A
I20241007 14:45:48.500612 133436 dfly_main.cc:438] container limits: read /sys/fs/cgroup/memory/memory.max: N/A
I20241007 14:45:48.500638 133436 dfly_main.cc:438] container limits: read /sys/fs/cgroup/memory.limit_in_bytes: N/A
I20241007 14:45:48.500659 133436 dfly_main.cc:438] container limits: read /sys/fs/cgroup/memory.max: N/A
I20241007 14:45:48.500681 133436 dfly_main.cc:438] container limits: read /sys/fs/cgroup/user.slice/memory.limit_in_bytes: N/A
I20241007 14:45:48.500723 133436 dfly_main.cc:438] container limits: read /sys/fs/cgroup/user.slice/memory.max: max
I20241007 14:45:48.500752 133436 dfly_main.cc:438] container limits: read /sys/fs/cgroup//user.slice/user-1000.slice/[email protected]/app.slice/app-org.gnome.Terminal.slice/vte-spawn-423f3ba1-6da8-42ce-b9bd-77849ef0a079.scope/memory.limit_in_bytes: N/A
I20241007 14:45:48.500784 133436 dfly_main.cc:438] container limits: read /sys/fs/cgroup//user.slice/user-1000.slice/[email protected]/app.slice/app-org.gnome.Terminal.slice/vte-spawn-423f3ba1-6da8-42ce-b9bd-77849ef0a079.scope/memory.max: max
I20241007 14:45:48.500828 133436 dfly_main.cc:438] container limits: read /sys/fs/cgroup//user.slice/user-1000.slice/[email protected]/app.slice/app-org.gnome.Terminal.slice/vte-spawn-423f3ba1-6da8-42ce-b9bd-77849ef0a079.scope/memory.high: max

Also keep in mind that the problem described in the issue was not the debug logs you see here but rather triggering:

  555     LOG(ERROR) << "Failed in deducing any cgroup limits with paths " << mem_path << " and "
  556                << cpu_path;

Which should now be fixed, since /sys/fs/cgroup/user.slice should contain the required controllers even if mem_path and cpu_path do not contain any controllers...

@romange romange merged commit 0cea5fe into main Oct 10, 2024
12 checks passed
@romange romange deleted the kpr1 branch October 10, 2024 16:15

Successfully merging this pull request may close these issues.

Failed in deducing any cgroup limits when the instance starts