-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libct/cgroups: add SkipDevices to Resources #2490
Conversation
@cyphar PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor things, but looks good otherwise.
The kubelet uses libct/cgroups code to set up cgroups. It creates a parent cgroup (kubepods) to put the containers into. The problem (for cgroupv2 that uses eBPF for device configuration) is the hard requirement to have devices cgroup configured results in leaking an eBPF program upon every kubelet restart. program. If kubelet is restarted 64+ times, the cgroup can't be configured anymore. Work around this by adding a SkipDevices flag to Resources. A check was added so that if SkipDevices is set, such a "container" can't be started (to make sure it is only used for non-containers). Signed-off-by: Kir Kolyshkin <[email protected]>
Updated to address @cyphar review comments. |
One question about the justification for this change:
That's a bug in runc (#2366 is tracking that), and is something we need to fix anyway (we should be either doing Is that the only reason for this change (looking back at #2474, it looks like I didn't catch this when discussing the need for this)? I'd be happy to have it anyway (since the requirement for the devices cgroup doesn't make sense in certain circumstances), but I just want to make sure that we're aware that this doesn't fix the underlying issue (which will be at least somewhat more involved). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from my point about this patch not fixing the motivating cgroupv2 bug, LGTM.
I do agree this looks like a workaround for #2366. From the other side, it does not make sense to require devices to be configured in cgroups that are not used to run containers, but rather to hold other cgroups (not sure what's the term here, parent cgroups? meta cgroups?). That is also the reason why I am thinking about renaming |
The kubelet uses libct/cgroups code to set up cgroups. It creates a
parent cgroup (kubepods) to put the containers into.
The problem (for cgroupv2 that uses eBPF for device configuration) is
the hard requirement to have devices cgroup configured results in
leaking an eBPF program upon every kubelet restart. program. If kubelet
is restarted 64+ times, the cgroup can't be configured anymore.
Work around this by adding a SkipDevices flag to Resources.
A check was added so that if SkipDevices is set, such a "container"
can't be started (to make sure it is only used for non-containers).
Fixes #2474