-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Makefile: disable kernel memory accounting on RHEL 7 3.10 kernels #2594
Conversation
@kolyshkin ptal |
TIL that kmemcg is now deprecated, neither the kernel containers ML nor the cgroup ML was on the Cc list for that change. |
Thanks for Akihiro, who noticed that. I guess it has been somewhat problematic overall, but I was surprised as well. Given its deprecation, perhaps a decision should be made before runc 1.0.0 wether or not it should be enabled by default (or supported at all; it could be (silently) "ignored" to not break compatibility) |
This is I think we're good, meaning whoever packages runc for RHEL7 should set |
I learned it recently, too. As much as I like it to be a separate limit, I don't think the majority of users are able to figure out their kmem limits. Currently (at least in cgroupv2) kmem is still accounted and [indirectly] limited by cgroup.memory settings, it's just there are no more direct controls. |
Back in the day in OpenVZ we used to have 20+ limits (https://wiki.openvz.org/Proc/user_beancounters) and users were confused. Ended up having just two primary knobs: ram and swap (https://wiki.openvz.org/VSwap). |
We will need to remove it from the runtime-spec too... |
This CI failure (on CentOS 7) needs to be fixed if we want to proceed:
|
244e5da
to
5ec32f2
Compare
Added a |
I'm not sure what to do about this one. RHEL7 is old, most probably no one is using runc installed from source on it, and whoever is packaging runc for RHEL7 is already taking care of adding |
Yeah, it's a bit tricky. Agreed that packagers should take this into account, OTOH, it's nice to have Even more so if more recent kernels are used (and users may not be aware of it being deprecated);
|
The only problem I see it's now impossible to build runc on RHEL7 with kmem. But maybe it's for good, and I'm pretty sure kmem accounting won't be fixed there. @thaJeztah can you please re-base to bring in recent CI? |
I think it doesn't hurt to include this, especially since we have other (actual code) workarounds for RHEL7 (the first thing that comes to mind is the |
I took a liberty to
It was not working before: [kir@centos70-ne runc]$ make
go build "-mod=vendor" "-buildmode=pie" -tags "seccomp selinux apparmor" -ldflags "-X main.gitCommit="5ec32f266c90ecb8da36282a63a6340edf718a18" -X main.version=1.0.0-rc92+dev " -o runc .
[kir@centos70-ne runc]$ uname -a
Linux centos70-ne 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[kir@centos70-ne runc]$ if uname -r | grep -q '^3\.10\.0.*\.el7\.'; then echo "yes"; fi
yes It is working now: [kir@centos70-ne runc]$ make
go build -trimpath "-mod=vendor" "-buildmode=pie" -tags "seccomp nokmem" -ldflags "-X main.gitCommit="fd7f880166e3b57393e42a905d953ead07683866" -X main.version=1.0.0-rc93+dev " -o runc . The changes are diff --git a/Makefile b/Makefile
index c283b9a6..5ff44f35 100644
--- a/Makefile
+++ b/Makefile
@@ -25,16 +25,17 @@ ifeq ($(shell $(GO) env GOOS),linux)
endif
endif
endif
+
+# Disable kmem if building on RHEL7 (kernel 3.10.0 el7)
+ifneq ($(shell uname -r | grep '^3\.10\.0.*\.el7\.'),)
+ BUILDTAGS += nokmem
+endif
+
GO_BUILD := $(GO) build -trimpath $(MOD_VENDOR) $(GO_BUILDMODE) $(EXTRA_FLAGS) -tags "$(BUILDTAGS)" \
-ldflags "-X main.gitCommit=$(COMMIT) -X main.version=$(VERSION) $(EXTRA_LDFLAGS)"
GO_BUILD_STATIC := CGO_ENABLED=1 $(GO) build -trimpath $(MOD_VENDOR) $(EXTRA_FLAGS) -tags "$(BUILDTAGS) netgo osusergo" \
-ldflags "-w -extldflags -static -X main.gitCommit=$(COMMIT) -X main.version=$(VERSION) $(EXTRA_LDFLAGS)"
-ifeq ($(shell if uname -r | grep -q '^3\.10\.0.*\.el7\.'; then echo "yes"; fi),yes)
- # Disable kmem accounting/limiting on RHEL7 kernels (3.10.0 el7)
- BUILDTAGS += nokmem
-endif
-
.DEFAULT: runc
runc: |
Now once it's working...
..more tests need to be disabled |
Added a commit to disable libct/int kmem-related tests if nokmem build tag is set, to fix failures from the previous comment. I have also changed @thaJeztah PTAL |
This roots in a difference between make's |
Finally, I think we need to amend the README to say nokmem is auto-set when compiling on 3.0.0-el7 kernel. |
Thanks Kir, sorry, I forgot about this one 🤗 |
Make is fun -- depending on whether the variable is coming from the environment or CLI arguments, the effect is different: [kir@centos70-ne runc]$ make
go build -trimpath "-mod=vendor" "-buildmode=pie" -tags "seccomp nokmem" -ldflags "-X main.gitCommit="0b2de072c311e8095e3810a9efb69171d06974c2" -X main.version=1.0.0-rc93+dev " -o runc .
[kir@centos70-ne runc]$ BUILDTAGS="" make
go build -trimpath "-mod=vendor" "-buildmode=pie" -tags " nokmem" -ldflags "-X main.gitCommit="0b2de072c311e8095e3810a9efb69171d06974c2" -X main.version=1.0.0-rc93+dev " -o runc .
[kir@centos70-ne runc]$ make BUILDTAGS=""
go build -trimpath "-mod=vendor" "-buildmode=pie" -tags "" -ldflags "-X main.gitCommit="0b2de072c311e8095e3810a9efb69171d06974c2" -X main.version=1.0.0-rc93+dev " -o runc . |
Seeing this for the first time (on CentOS 7):
Update this is related to #2774 / #2791 and not related to this change. Yet it seems very strange. |
Added a commit with README.md changes. |
Yet another test to fix :)
|
Signed-off-by: Kir Kolyshkin <[email protected]>
This makes those tests that require kmem to skip on RHEL7 kernels. Signed-off-by: Kir Kolyshkin <[email protected]>
Kernel-memory accounting is known to be broken on RHEL 7 kernels. This patch automatically disables kernel-memory accounting when building on a 3.10 kernel. The original behavior is preserved if a custom BUILDTAGS is passed; make BUILDTAGS='seccomp selinux' ... Signed-off-by: Sebastiaan van Stijn <[email protected]> Signed-off-by: Kir Kolyshkin <[email protected]>
1. Provide a more realistic example of using make BUILDTAGS= 2. Fix nokmem dependencies cell (<none> was not rendered). 3. Add information about auto-setting nokmem tag for EL7 kernel. Signed-off-by: Kir Kolyshkin <[email protected]>
Rebased, added a skip to int tests. |
CI failure
is a well-known flake (see #2756). Restarted. |
e.g. to disable `seccomp` and enable `nokmem`, run: | ||
|
||
```bash | ||
make BUILDTAGS='seccomp' | ||
make BUILDTAGS="nokmem" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should encourage keeping seccomp enabled; should we change to (e.g.);
e.g. to disable `seccomp` and enable `nokmem`, run: | |
```bash | |
make BUILDTAGS='seccomp' | |
make BUILDTAGS="nokmem" | |
e.g. to disable kernel memory limiting, but keep `seccomp`, run: | |
```bash | |
make BUILDTAGS="nokmem seccomp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't matter much to me, it's just a usage example, not a recommendation about how to compile runc, so can be anything (and I sometimes find myself disabling seccomp so I can compile runc without having to install libseccomp-devel).
But yes, maybe yours is a better/fuller example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree it's just an example. People love to copy/paste though, and hate to read 😓
They were already OPTIONAL and are now also NOT RECOMMENDED (opencontainers/runtime-spec#1093) |
Kernel-memory accounting is known to be broken on RHEL 7 kernels. This patch automatically disables kernel-memory accounting when building on a 3.10 kernel.
The original behavior is preserved if a custom BUILDTAGS is passed;
I recalled we had a similar patch in Moby (moby/moby@8972aa9), and thought I'd might as well upstream this.
Also note that moby/moby#41254 (comment)
So perhaps either add a similar check for kernel 5.4+, or disabling kmem by default, and making it an opt-on could make sense (@AkihiroSuda wdyt?)