-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop relying on number of systems for cgroups #1817
Conversation
libcontainer/cgroups/utils_test.go
Outdated
@@ -93,6 +93,211 @@ const systemdMountinfo = `115 83 0:32 / / rw,relatime - aufs none rw,si=c0bd3d3, | |||
136 117 0:12 /1 /dev/console rw,nosuid,noexec,relatime - devpts none rw,gid=5,mode=620,ptmxmode=000 | |||
84 115 0:40 / /tmp rw,relatime - tmpfs none rw` | |||
|
|||
const bedrockMountinfo = `17 22 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost my entire mountinfo currently, I can trim it down if it's too large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to trim out all the uninteresting entries as this was realy big., let me know if I should leave any more in
12acb08
to
04b1fdf
Compare
When there are complicated mount setups, there can be multiple mount points which have the subsystem we are looking for. Instead of counting the mountpoints, tick off subsystems until we have found them all. Without the 'all' flag, ignore duplicate subsystems after the first. Signed-off-by: Daniel Dao <[email protected]>
Add a mountinfo from a bedrock linux system with 4 strata, and include it for tests Signed-off-by: Jay Kamat <[email protected]> Signed-off-by: Daniel Dao <[email protected]>
Hi @jgkamat, thanks for the patch ! I think we can simplify this a bit more, not add an additional map to track seen subsystems and just use the original subsystem map we pass as argument instead. Something like dqminh@5ee0648 ? |
I think we can simplify this a bit more, not add an additional map to track
seen subsystems and just use the original subsystem map we pass as argument
instead. Something like ***@***.*** ?
Thanks for the supplementary patch, here's my feedback on it:
@@ -151,17 +151,16 @@ func getCgroupMountsHelper(ss map[string]bool, mi io.Reader, all bool) ([]Mount,
Root: fields[3],
}
for _, opt := range strings.Split(fields[len(fields)-1], ",") {
- if !ss[opt] {
+ seen, known := ss[opt]
+ if !known || (!all && seen) {
continue
}
if strings.HasPrefix(opt, cgroupNamePrefix) {
- m.Subsystems = append(m.Subsystems, opt[len(cgroupNamePrefix):])
- } else {
- m.Subsystems = append(m.Subsystems, opt)
- }
- if !all {
- numFound++
+ opt = opt[len(cgroupNamePrefix):]
}
+ m.Subsystems = append(m.Subsystems, opt)
+ ss[opt] = true
Could setting this map be moved before the 'opt = opt[len(cgroupNamePrefix):]'
line? Without that, this dosen't work for me, as sometimes numFound is
incremented when the opt set dosen't match the opt passed in (as it is
modified after lookup).
+ numFound++
}
res = append(res, m)
Even when there's no mountpoints found (for example, due to not finding them
already), they are added to the map. I'm not sure if this is desired behavior
or not. For example, with this patch, getCgroupMounts without all returns:
[{/sys/fs/cgroup/systemd / [systemd]} {/bedrock/strata/arch/sys/fs/cgroup/systemd / []} {/bedrock/strata/bionic/sys/fs/cgroup/systemd / []} {/bedrock/strata/fallback/sys/fs/cgroup/systemd / []} {/bedrock/strata/gentoo/sys/fs/cgroup/systemd / []} {/sys/fs/cgroup/memory / [memory]} {/bedrock/strata/arch/sys/fs/cgroup/memory / []} {/bedrock/strata/bionic/sys/fs/cgroup/memory / []} {/bedrock/strata/fallback/sys/fs/cgroup/memory / []} {/bedrock/strata/gentoo/sys/fs/cgroup/memory / []} {/sys/fs/cgroup/net_cls,net_prio / [net_cls net_prio]} {/bedrock/strata/arch/sys/fs/cgroup/net_cls,net_prio / []} {/bedrock/strata/bionic/sys/fs/cgroup/net_cls,net_prio / []} {/bedrock/strata/fallback/sys/fs/cgroup/net_cls,net_prio / []} {/bedrock/strata/gentoo/sys/fs/cgroup/net_cls,net_prio / []} {/sys/fs/cgroup/freezer / [freezer]} {/bedrock/strata/arch/sys/fs/cgroup/freezer / []} {/bedrock/strata/bionic/sys/fs/cgroup/freezer / []} {/bedrock/strata/fallback/sys/fs/cgroup/freezer / []} {/bedrock/strata/gentoo/sys/fs/cgroup/freezer / []} {/sys/fs/cgroup/cpuset / [cpuset]} {/bedrock/strata/arch/sys/fs/cgroup/cpuset / []} {/bedrock/strata/bionic/sys/fs/cgroup/cpuset / []} {/bedrock/strata/fallback/sys/fs/cgroup/cpuset / []} {/bedrock/strata/gentoo/sys/fs/cgroup/cpuset / []} {/sys/fs/cgroup/cpu,cpuacct / [cpu cpuacct]} {/bedrock/strata/arch/sys/fs/cgroup/cpu,cpuacct / []} {/bedrock/strata/bionic/sys/fs/cgroup/cpu,cpuacct / []} {/bedrock/strata/fallback/sys/fs/cgroup/cpu,cpuacct / []} {/bedrock/strata/gentoo/sys/fs/cgroup/cpu,cpuacct / []} {/sys/fs/cgroup/perf_event / [perf_event]} {/bedrock/strata/arch/sys/fs/cgroup/perf_event / []} {/bedrock/strata/bionic/sys/fs/cgroup/perf_event / []} {/bedrock/strata/fallback/sys/fs/cgroup/perf_event / []} {/bedrock/strata/gentoo/sys/fs/cgroup/perf_event / []} {/sys/fs/cgroup/pids / [pids]} {/bedrock/strata/arch/sys/fs/cgroup/pids / []} {/bedrock/strata/bionic/sys/fs/cgroup/pids / []} {/bedrock/strata/fallback/sys/fs/cgroup/pids / []} {/bedrock/strata/gentoo/sys/fs/cgroup/pids / []} {/sys/fs/cgroup/devices / [devices]} {/bedrock/strata/arch/sys/fs/cgroup/devices / []} {/bedrock/strata/bionic/sys/fs/cgroup/devices / []} {/bedrock/strata/fallback/sys/fs/cgroup/devices / []} {/bedrock/strata/gentoo/sys/fs/cgroup/devices / []} {/sys/fs/cgroup/blkio / [blkio]}]
instead of:
[{/sys/fs/cgroup/systemd / [systemd]} {/sys/fs/cgroup/net_cls,net_prio / [net_cls net_prio]} {/sys/fs/cgroup/blkio / [blkio]} {/sys/fs/cgroup/cpu,cpuacct / [cpu cpuacct]} {/sys/fs/cgroup/cpuset / [cpuset]} {/sys/fs/cgroup/devices / [devices]} {/sys/fs/cgroup/memory / [memory]} {/sys/fs/cgroup/freezer / [freezer]} {/sys/fs/cgroup/pids / [pids]} {/sys/fs/cgroup/perf_event / [perf_event]}]
Both behaviors work fine for me (since the needed mountpoints are available
now), but I'm not sure if listing empty mountpoints is "ok" according to the
standard (I'm not familiar with it).
}
Other than those two things, I think that your patch is a lot better than
mine, thanks for creating it! :)
|
Signed-off-by: Jay Kamat <[email protected]>
Hi @dqminh, I've just committed the changes I described in my commit above (based on the changed you linked). Please let me know if you think anything needs to be improved! |
@crosbymichael I noticed you modified chunks of this file, although it was several years ago. If you have some free time, could you give this a review? I'd like to get this fixed eventually as it causes docker to crash on my system. (ping @dqminh in case he's free for another round of review). |
Thanks, looking |
Fix duplicate entries and missing entries in getCgroupMountsHelper Add test for testing cgroup mounts on bedrock linux Stop relying on number of subsystems for cgroups LGTMs: @crosbymichael @cyphar Closes #1817
Bumping containerd requires bumping runc as well. When doing so we adopt PR opencontainers/runc#1817 which would change the way cgroup mounts table looks like from containers' point of view. Additionally we introduce checks for `hugetbl` and `pids` cgroups This change also makes tests use /proc/self/mountinfo instead of /proc/mounts [#161840949] Co-authored-by: Danail Branekov <[email protected]>
Bumping containerd requires bumping runc as well. When doing so we adopt PR opencontainers/runc#1817 which would change the way cgroup mounts table looks like from containers' point of view. This change also makes tests use /proc/self/mountinfo instead of /proc/mounts [#161840949] Co-authored-by: Danail Branekov <[email protected]>
this for loop determines where to stop by the number of found entries in the mountinfo.
However, there is a kernel feature called shared subtrees, which can cause these mount points to get duplicated in mountinfo. Bedrock Linux uses this extensively to function. Here is a section of my mountinfo showing mounts that correspond to
cpu
:However, my
/proc/self/cgroup
does not contain duplicated entries:Because of this, when docker started, the
getCgroupMounts
function would return:instead of
which lead to docker failing to start.
This PR tries to adjust this behavior so that when "all" is false, it will skip any duplicate subsystems, and only return the first one found.
Please let me know if anything can be improved, or if anyone thinks of a better way this can be implemented.