Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop relying on number of systems for cgroups #1817

Merged
merged 3 commits into from
Sep 19, 2018

Conversation

jgkamat
Copy link
Contributor

@jgkamat jgkamat commented Jun 13, 2018

this for loop determines where to stop by the number of found entries in the mountinfo.

However, there is a kernel feature called shared subtrees, which can cause these mount points to get duplicated in mountinfo. Bedrock Linux uses this extensively to function. Here is a section of my mountinfo showing mounts that correspond to cpu:

150 120 0:34 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:50 - cgroup cgroup rw,cpu,cpuacct
154 124 0:34 / /bedrock/strata/arch/sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:50 - cgroup cgroup rw,cpu,cpuacct
153 123 0:34 / /bedrock/strata/fallback/sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:50 - cgroup cgroup rw,cpu,cpuacct
152 122 0:34 / /bedrock/strata/gentoo/sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:50 - cgroup cgroup rw,cpu,cpuacct
151 121 0:34 / /bedrock/strata/kde/sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:50 - cgroup cgroup rw,cpu,cpuacct

However, my /proc/self/cgroup does not contain duplicated entries:

10:perf_event:/
9:pids:/user.slice/user-1000.slice/session-3.scope
8:freezer:/
7:memory:/user.slice
6:devices:/user.slice
5:cpuset:/
4:cpu,cpuacct:/user.slice
3:blkio:/user.slice
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/session-3.scope

Because of this, when docker started, the getCgroupMounts function would return:

[{/sys/fs/cgroup/systemd / [systemd]} {/bedrock/strata/fallback/sys/fs/cgroup/systemd / [systemd]} {/bedrock/strata/gentoo/sys/fs/cgroup/systemd / [systemd]} {/bedrock/strata/kde/sys/fs/cgroup/systemd / [systemd]} {/sys/fs/cgroup/memory / [memory]} {/bedrock/strata/fallback/sys/fs/cgroup/memory / [memory]} {/bedrock/strata/gentoo/sys/fs/cgroup/memory / [memory]} {/bedrock/strata/kde/sys/fs/cgroup/memory / [memory]} {/sys/fs/cgroup/blkio / [blkio]} {/bedrock/strata/fallback/sys/fs/cgroup/blkio / [blkio]} {/bedrock/strata/gentoo/sys/fs/cgroup/blkio / [blkio]} {/bedrock/strata/kde/sys/fs/cgroup/blkio / [blkio]}]

instead of

[{/sys/fs/cgroup/systemd / [systemd]} {/sys/fs/cgroup/net_cls,net_prio / [net_cls net_prio]} {/sys/fs/cgroup/blkio / [blkio]} {/sys/fs/cgroup/cpu,cpuacct / [cpu cpuacct]} {/sys/fs/cgroup/cpuset / [cpuset]} {/sys/fs/cgroup/devices / [devices]} {/sys/fs/cgroup/memory / [memory]} {/sys/fs/cgroup/freezer / [freezer]} {/sys/fs/cgroup/pids / [pids]} {/sys/fs/cgroup/perf_event / [perf_event]}]

which lead to docker failing to start.

This PR tries to adjust this behavior so that when "all" is false, it will skip any duplicate subsystems, and only return the first one found.

Please let me know if anything can be improved, or if anyone thinks of a better way this can be implemented.

@@ -93,6 +93,211 @@ const systemdMountinfo = `115 83 0:32 / / rw,relatime - aufs none rw,si=c0bd3d3,
136 117 0:12 /1 /dev/console rw,nosuid,noexec,relatime - devpts none rw,gid=5,mode=620,ptmxmode=000
84 115 0:40 / /tmp rw,relatime - tmpfs none rw`

const bedrockMountinfo = `17 22 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost my entire mountinfo currently, I can trim it down if it's too large.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to trim out all the uninteresting entries as this was realy big., let me know if I should leave any more in

@jgkamat jgkamat force-pushed the bedrock branch 2 times, most recently from 12acb08 to 04b1fdf Compare June 13, 2018 22:35
dqminh and others added 2 commits June 24, 2018 00:00
When there are complicated mount setups, there can be multiple mount
points which have the subsystem we are looking for. Instead of
counting the mountpoints, tick off subsystems until we have found them
all.

Without the 'all' flag, ignore duplicate subsystems after the first.

Signed-off-by: Daniel Dao <[email protected]>
Add a mountinfo from a bedrock linux system with 4 strata, and include
it for tests

Signed-off-by: Jay Kamat <[email protected]>
Signed-off-by: Daniel Dao <[email protected]>
@dqminh
Copy link
Contributor

dqminh commented Jun 23, 2018

Hi @jgkamat, thanks for the patch !

I think we can simplify this a bit more, not add an additional map to track seen subsystems and just use the original subsystem map we pass as argument instead. Something like dqminh@5ee0648 ?

@jgkamat
Copy link
Contributor Author

jgkamat commented Jun 24, 2018 via email

@jgkamat
Copy link
Contributor Author

jgkamat commented Aug 1, 2018

Hi @dqminh, I've just committed the changes I described in my commit above (based on the changed you linked). Please let me know if you think anything needs to be improved!

@jgkamat
Copy link
Contributor Author

jgkamat commented Sep 10, 2018

@crosbymichael I noticed you modified chunks of this file, although it was several years ago. If you have some free time, could you give this a review? I'd like to get this fixed eventually as it causes docker to crash on my system.

(ping @dqminh in case he's free for another round of review).

@crosbymichael
Copy link
Member

Thanks, looking

@crosbymichael
Copy link
Member

crosbymichael commented Sep 17, 2018

LGTM

Approved with PullApprove

@cyphar
Copy link
Member

cyphar commented Sep 19, 2018

LGTM. Hopefully nothing else breaks. 😉

Approved with PullApprove

@cyphar cyphar merged commit a2faaa1 into opencontainers:master Sep 19, 2018
cyphar added a commit that referenced this pull request Sep 19, 2018
  Fix duplicate entries and missing entries in getCgroupMountsHelper
  Add test for testing cgroup mounts on bedrock linux
  Stop relying on number of subsystems for cgroups

LGTMs: @crosbymichael @cyphar
Closes #1817
@jgkamat jgkamat deleted the bedrock branch September 20, 2018 16:08
mrosecrance added a commit to cloudfoundry/garden-integration-tests that referenced this pull request Nov 14, 2018
Bumping containerd requires bumping runc as well. When doing so we adopt
PR opencontainers/runc#1817 which would change
the way cgroup mounts table looks like from containers' point of view.
Additionally we introduce checks for `hugetbl` and `pids` cgroups
This change also makes tests use /proc/self/mountinfo instead of
/proc/mounts

[#161840949]

Co-authored-by: Danail Branekov <[email protected]>
mrosecrance added a commit to cloudfoundry/garden-integration-tests that referenced this pull request Nov 14, 2018
Bumping containerd requires bumping runc as well. When doing so we adopt
PR opencontainers/runc#1817 which would change
the way cgroup mounts table looks like from containers' point of view.
This change also makes tests use /proc/self/mountinfo instead of
/proc/mounts

[#161840949]

Co-authored-by: Danail Branekov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants