Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4.8] fix freeze, add SkipFreezeOnSet #10

Merged
merged 15 commits into from
Dec 21, 2021

Conversation

kolyshkin and others added 15 commits December 16, 2021 09:15
Historically, we never returned an error from failed startUnit
or stopUnit. The startUnit case was fixed by commit 3844789.

It is time to fix stopUnit, too. The reasons are:

1. Ignoring an error from stopUnit means an unexpected trouble down the
   road, for example a failure to create a container with the same name:

   > time="2021-05-07T19:51:27Z" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: Unit runc-test_busybox.scope already exists."

2. A somewhat short timeout of 1 second means the cgroup might
   actually be removed a few seconds later but we might have a
   race between removing the cgroup and creating another one
   with the same name, resulting in the same error as amove.

So, return an error if removal failed, and increase the timeout.

Now, modify the systemd cgroup v1 manager to not mask the error from
stopUnit (stopErr) with the subsequent one from cgroups.RemovePath,
as stopErr is most probably the reason why RemovePath failed.

Note that for v1 we do want to remove the paths even in case of a
failure from stopUnit, as some were not created by systemd.
There's no need to do that for v2, thanks to unified hierarchy,
so no changes there.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 33c9f8b)
Signed-off-by: Kir Kolyshkin <[email protected]>
Commit 108ee85 adds SkipDevices flag, which is used by kubernetes
to create cgroups for pods.

Unfortunately the above commit falls short, and systemd DevicePolicy and
DeviceAllow properties are still set, which requires kubernetes to set
"allow everything" rule.

This commit fixes this: if SkipDevices flag is set, we return
Device* properties to allow all devices.

Fixes: 108ee85
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 752e7a8)
Signed-off-by: Kir Kolyshkin <[email protected]>
The idea is to mimic what kubelet is doing, with minimum amount of code.

First, create a slice with SkipDevices=true. It should have access to
all devices.

Next, create a scope within the above slice, allowing access to /dev/full
only.

Check that within that scope we can only access /dev/full and not other
devices (such as /dev/null).

Repeat the test with SkipDevices=false, make sure we can not access any
devices (as they are disallowed by a parent cgroup). This is done only
to assess the test correctness.

NOTE that cgroup v1 and v2 behave differently for SkipDevices=false
case, and thus the check is different. Cgroup v1 returns EPERM on
writing to devices.allow, so cgroup manager's Set() fails, and we check
for a particular error from m.Set(). Cgroup v2 allows to create a child
cgroup, but denies access to any device (despite access being enabled)
-- so we check the error from the shell script running in that cgroup.
Again, this is only about SkipDevices=false case.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 0e16e7c)
Signed-off-by: Kir Kolyshkin <[email protected]>
1. The meaning of SkipDevices is what it is -- do not set any
   device-related options.

2. Reverts the part of commit 108ee85 which skipped the freeze
   when the SkipDevices is set. Apparently, the freeze is needed on
   update even if no Device* properties are being set.

3. Add "runc update" to "runc run [device cgroup deny]" test.

Fixes: 752e7a8
Fixes: 108ee85
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit bd8e070)
Signed-off-by: Kir Kolyshkin <[email protected]>
If a control group is frozen, all its descendants will report FROZEN
in freezer.state cgroup file.

OTOH cgroup v2 cgroup.freeze is not reporting the cgroup as frozen
unless it is frozen directly (i.e. not via an ancestor).

Fix the discrepancy between v1 and v2 drivers behavior by
looking into freezer.self_freezing cgroup file, which, according
to kernel documentation, will show 1 iff the cgroup was frozen directly.

Co-authored-by: Kir Kolyshkin <[email protected]>
Signed-off-by: Odin Ugedal <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 294c486)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit c1a5b3e,
 modified to use fscommon.ReadFile due to missing commit 8f1b4d4)
Signed-off-by: Kir Kolyshkin <[email protected]>
m.Freeze method changes m.cgroups.Resources.Freezer field, which should
not be done while we're temporarily freezing the cgroup in Set. If this
field is changed, and r == m.cgroups.Resources (as it often happens),
this results in inability to freeze the container using Set().

To fix, add and use a method which does not change r.Freezer field.

A test case for the bug will be added separately.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 67cfd3d)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 22b2ff0)
Signed-off-by: Kir Kolyshkin <[email protected]>
The t.Name() usage in libcontainer/integration prevented subtests
to be used, since in such case it returns a string containing "/",
and thus it can't be used to name a container.

Fix this by replacing slashes with underscores where appropriate.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit af1688a)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 01cd4b5)
Signed-off-by: Kir Kolyshkin <[email protected]>
Introduce freezeBeforeSet, which contains the logic of figuring out
whether we need to freeze/thaw around setting systemd unit properties.

In particular, if SkipDevices is set, and the current unit properties
allow all devices, there is no need to freeze and thaw, as systemd
won't write any device rules in this case.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit f2db879,
 minor conflict in include() due to missing commit
 b60e2ed)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 2fc2e3d)
Signed-off-by: Kir Kolyshkin <[email protected]>
TestPodSkipDevicesUpdate checks that updating a pod having SkipDevices: true
does not result in spurious "permission denied" errors in a container
running under the pod. The test is somewhat similar in nature to the
@test "update devices [minimal transition rules]" in tests/integration,
but uses a pod.

This tests the validity of freezeBeforeSet in v1.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit a711026)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 4efb7a6)
Signed-off-by: Kir Kolyshkin <[email protected]>
Run device update tests on cgroup v2, and add a test verifying that we
don't allow access to devices when we don't intend to.

Signed-off-by: Odin Ugedal <[email protected]>
(cherry picked from commit d41a273)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 298a310)
Signed-off-by: Kir Kolyshkin <[email protected]>
Since device updates in cgroup v2 are atomic for systemd, there is no
need to freeze the processes before running the updates.

Signed-off-by: Odin Ugedal <[email protected]>
(cherry picked from commit f33be7c, trivial conflict
 due to missing commit b60e2ed)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 04edd79)
Signed-off-by: Kir Kolyshkin <[email protected]>
This fixes the behavior intended to avoid freezing containers/control
groups without it being necessary. This is important for end users of
libcontainer who rely on the behavior of no freeze.

The previous implementation would always get error trying to get
DevicePolicy from the Unit via dbus, since the Unit interface doesn't
contain DevicePolicy.

Signed-off-by: Odin Ugedal <[email protected]>
(cherry picked from commit 4104367)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 4ce440f)
Signed-off-by: Kir Kolyshkin <[email protected]>
Add a test for freezeBeforeSet, checking various scenarios including
those that were failing before the fix in the previous commit.

[v2: add more cases, add a check before creating a unit.]

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit fec49f2)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 1850dc1)
Signed-off-by: Kir Kolyshkin <[email protected]>
The runc update CLI is not able to modify devices, so let's set SkipDevices
(so that a cgroup controller won't try to update devices cgroup).

This helps use cases when some other device management (NVIDIA GPUs)
applies its configuration on top of what runc does.

Make sure we do not save SkipDevices into state.json.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit bf7492e)
Signed-off-by: Kir Kolyshkin <[email protected]>
This is helpful to kubernetes in cases it knows for sure that the freeze
is not required (since it created the systemd unit with no device
restrictions).

As the code is trivial, no tests are required.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 9a095e4)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 8ec5762)
Signed-off-by: Kir Kolyshkin <[email protected]>
@openshift-ci
Copy link

openshift-ci bot commented Dec 16, 2021

@kolyshkin: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

[4.8] fix freeze, add SkipFreezeOnSet

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kolyshkin
Copy link
Author

(as requested by @ingvagabund)

@ingvagabund
Copy link
Member

Looks good based on openshift/origin#26696. The success rate of failing sig-storage tests went down based on the latest CI results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants