Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't panic when /sys/fs/cgroup is missing for rootless #2634

Merged
merged 1 commit into from
Oct 29, 2020

Conversation

coryb
Copy link
Contributor

@coryb coryb commented Oct 5, 2020

Fixes issue #2573

Downgrade the panic to a debug message when /sys/fs/cgroup is missing. This issue is preventing runc exec from working within buildkit when running rootless (where /sys is not mounted at all).

Steps to reproduce:

$ mkdir ~/mycontainer
$ cd ~/mycontainer
$ mkdir rootfs
$ docker export $(docker create busybox) | tar -C rootfs -xvf -
$ runc spec --rootless
<edit config.json to remove /sys mount>
$ runc --root /tmp/runc run mycontainerid
# 

now in separate shell:

cd ~/mycontainer
runc --root /tmp/runc exec --tty mycontainerid /bin/sh
# 

A potentially better solution is to change the signature from:

func IsCgroup2UnifiedMode() bool

to:

func IsCgroup2UnifiedMode() (bool, error)

so that we can manage the error more directly in rootless mode, but the diff for this change would be much larger, since the function is used quite a lot. I can update is this direction is preferred.

@@ -34,7 +34,9 @@ func IsCgroup2UnifiedMode() bool {
isUnifiedOnce.Do(func() {
var st unix.Statfs_t
if err := unix.Statfs(unifiedMountpoint, &st); err != nil {
panic("cannot statfs cgroup root")
logrus.WithError(err).Debug("cannot statfs cgroup root")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It should be Info not Debug I think as it's pretty major.
  2. Amend the message to say "assuming cgroup v1".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Maybe add a TODO item to change the default to cgroupv2 somewhere in 2024 :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is partially why I am leaning towards changing the signature to return an error, otherwise users will get this message 100% of the time exec'ing to a container via buildkit with rootless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does something like this seem more reasonable to you?

		err := unix.Statfs(unifiedMountpoint, &st)
		if err != nil && os.IsNotExist(err) {
			// /sys/fs/cgroup not found, likely rootless
			logrus.WithError(err).Debugf("%s missing, assuming cgroup v1", unifiedMountpoint)
			isUnified = false
			return
		} else if err != nil {
			panic(fmt.Sprintf("cannot statfs cgroup root: %s", err))
		}

This would preserve the panic behavior for all error scenarios except where the /sys/fs/cgroup path simply does not exist (which is a normal situation under rootless)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also better to panic when os.IsNotExist && !system.RunningInUserNS()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the check to only ignore (w/ debug msg) errors for NotExists while RunningInUserNS

@kolyshkin
Copy link
Contributor

@AkihiroSuda PTAL

AkihiroSuda
AkihiroSuda previously approved these changes Oct 6, 2020
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@coryb coryb force-pushed the issue-2573 branch 2 times, most recently from 92a85a0 to 4a75408 Compare October 6, 2020 06:48
AkihiroSuda
AkihiroSuda previously approved these changes Oct 6, 2020
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of nits

@coryb
Copy link
Contributor Author

coryb commented Oct 8, 2020

The test failure (timeout) seems unrelated to my change, and Travis is not allowing me to retry the build.

@coryb
Copy link
Contributor Author

coryb commented Oct 15, 2020

I squashed and forced push, tests passed after the push this time. I think I have addressed all the feedback, so hopefully we can get this merged in.

@AkihiroSuda
Copy link
Member

ping @kolyshkin @cyphar @mrunalp

@AkihiroSuda AkihiroSuda added this to the 1.0.0-rc93 milestone Oct 21, 2020
@AkihiroSuda AkihiroSuda requested a review from kolyshkin October 23, 2020 04:56
@AkihiroSuda
Copy link
Member

@kolyshkin @cyphar @mrunalp PTAL

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants