-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[4.8] fix freeze, add SkipFreezeOnSet #10
Conversation
Historically, we never returned an error from failed startUnit or stopUnit. The startUnit case was fixed by commit 3844789. It is time to fix stopUnit, too. The reasons are: 1. Ignoring an error from stopUnit means an unexpected trouble down the road, for example a failure to create a container with the same name: > time="2021-05-07T19:51:27Z" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: Unit runc-test_busybox.scope already exists." 2. A somewhat short timeout of 1 second means the cgroup might actually be removed a few seconds later but we might have a race between removing the cgroup and creating another one with the same name, resulting in the same error as amove. So, return an error if removal failed, and increase the timeout. Now, modify the systemd cgroup v1 manager to not mask the error from stopUnit (stopErr) with the subsequent one from cgroups.RemovePath, as stopErr is most probably the reason why RemovePath failed. Note that for v1 we do want to remove the paths even in case of a failure from stopUnit, as some were not created by systemd. There's no need to do that for v2, thanks to unified hierarchy, so no changes there. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 33c9f8b) Signed-off-by: Kir Kolyshkin <[email protected]>
Commit 108ee85 adds SkipDevices flag, which is used by kubernetes to create cgroups for pods. Unfortunately the above commit falls short, and systemd DevicePolicy and DeviceAllow properties are still set, which requires kubernetes to set "allow everything" rule. This commit fixes this: if SkipDevices flag is set, we return Device* properties to allow all devices. Fixes: 108ee85 Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 752e7a8) Signed-off-by: Kir Kolyshkin <[email protected]>
The idea is to mimic what kubelet is doing, with minimum amount of code. First, create a slice with SkipDevices=true. It should have access to all devices. Next, create a scope within the above slice, allowing access to /dev/full only. Check that within that scope we can only access /dev/full and not other devices (such as /dev/null). Repeat the test with SkipDevices=false, make sure we can not access any devices (as they are disallowed by a parent cgroup). This is done only to assess the test correctness. NOTE that cgroup v1 and v2 behave differently for SkipDevices=false case, and thus the check is different. Cgroup v1 returns EPERM on writing to devices.allow, so cgroup manager's Set() fails, and we check for a particular error from m.Set(). Cgroup v2 allows to create a child cgroup, but denies access to any device (despite access being enabled) -- so we check the error from the shell script running in that cgroup. Again, this is only about SkipDevices=false case. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 0e16e7c) Signed-off-by: Kir Kolyshkin <[email protected]>
1. The meaning of SkipDevices is what it is -- do not set any device-related options. 2. Reverts the part of commit 108ee85 which skipped the freeze when the SkipDevices is set. Apparently, the freeze is needed on update even if no Device* properties are being set. 3. Add "runc update" to "runc run [device cgroup deny]" test. Fixes: 752e7a8 Fixes: 108ee85 Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit bd8e070) Signed-off-by: Kir Kolyshkin <[email protected]>
If a control group is frozen, all its descendants will report FROZEN in freezer.state cgroup file. OTOH cgroup v2 cgroup.freeze is not reporting the cgroup as frozen unless it is frozen directly (i.e. not via an ancestor). Fix the discrepancy between v1 and v2 drivers behavior by looking into freezer.self_freezing cgroup file, which, according to kernel documentation, will show 1 iff the cgroup was frozen directly. Co-authored-by: Kir Kolyshkin <[email protected]> Signed-off-by: Odin Ugedal <[email protected]> Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 294c486) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit c1a5b3e, modified to use fscommon.ReadFile due to missing commit 8f1b4d4) Signed-off-by: Kir Kolyshkin <[email protected]>
m.Freeze method changes m.cgroups.Resources.Freezer field, which should not be done while we're temporarily freezing the cgroup in Set. If this field is changed, and r == m.cgroups.Resources (as it often happens), this results in inability to freeze the container using Set(). To fix, add and use a method which does not change r.Freezer field. A test case for the bug will be added separately. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 67cfd3d) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 22b2ff0) Signed-off-by: Kir Kolyshkin <[email protected]>
The t.Name() usage in libcontainer/integration prevented subtests to be used, since in such case it returns a string containing "/", and thus it can't be used to name a container. Fix this by replacing slashes with underscores where appropriate. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit af1688a) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 01cd4b5) Signed-off-by: Kir Kolyshkin <[email protected]>
Introduce freezeBeforeSet, which contains the logic of figuring out whether we need to freeze/thaw around setting systemd unit properties. In particular, if SkipDevices is set, and the current unit properties allow all devices, there is no need to freeze and thaw, as systemd won't write any device rules in this case. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit f2db879, minor conflict in include() due to missing commit b60e2ed) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 2fc2e3d) Signed-off-by: Kir Kolyshkin <[email protected]>
TestPodSkipDevicesUpdate checks that updating a pod having SkipDevices: true does not result in spurious "permission denied" errors in a container running under the pod. The test is somewhat similar in nature to the @test "update devices [minimal transition rules]" in tests/integration, but uses a pod. This tests the validity of freezeBeforeSet in v1. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit a711026) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 4efb7a6) Signed-off-by: Kir Kolyshkin <[email protected]>
Run device update tests on cgroup v2, and add a test verifying that we don't allow access to devices when we don't intend to. Signed-off-by: Odin Ugedal <[email protected]> (cherry picked from commit d41a273) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 298a310) Signed-off-by: Kir Kolyshkin <[email protected]>
Since device updates in cgroup v2 are atomic for systemd, there is no need to freeze the processes before running the updates. Signed-off-by: Odin Ugedal <[email protected]> (cherry picked from commit f33be7c, trivial conflict due to missing commit b60e2ed) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 04edd79) Signed-off-by: Kir Kolyshkin <[email protected]>
This fixes the behavior intended to avoid freezing containers/control groups without it being necessary. This is important for end users of libcontainer who rely on the behavior of no freeze. The previous implementation would always get error trying to get DevicePolicy from the Unit via dbus, since the Unit interface doesn't contain DevicePolicy. Signed-off-by: Odin Ugedal <[email protected]> (cherry picked from commit 4104367) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 4ce440f) Signed-off-by: Kir Kolyshkin <[email protected]>
Add a test for freezeBeforeSet, checking various scenarios including those that were failing before the fix in the previous commit. [v2: add more cases, add a check before creating a unit.] Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit fec49f2) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 1850dc1) Signed-off-by: Kir Kolyshkin <[email protected]>
The runc update CLI is not able to modify devices, so let's set SkipDevices (so that a cgroup controller won't try to update devices cgroup). This helps use cases when some other device management (NVIDIA GPUs) applies its configuration on top of what runc does. Make sure we do not save SkipDevices into state.json. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit bf7492e) Signed-off-by: Kir Kolyshkin <[email protected]>
This is helpful to kubernetes in cases it knows for sure that the freeze is not required (since it created the systemd unit with no device restrictions). As the code is trivial, no tests are required. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 9a095e4) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 8ec5762) Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin: No Bugzilla bug is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
(as requested by @ingvagabund) |
Looks good based on openshift/origin#26696. The success rate of failing sig-storage tests went down based on the latest CI results. |
This is a cherry-pick of the following upstream PRs to openshift-4.8 branch: