
run create/run/exec: refuse bad cgroup #3223

Merged: 3 commits, Nov 8, 2021

Conversation

@kolyshkin (Contributor) commented Sep 23, 2021

This is the final part of changes from #3131, depending on (already merged) #3214, #3215, #3216, #3217.

I. runc exec: refuse paused container

This bugged me a few times during runc development. A new container is
run, and suddenly runc init is stuck, 🎶 and nothing ever happens, and
I wonder... 🎶

Figuring out that the cause of it is a (pre-created) frozen cgroup is not
very obvious (to say the least), and neither Ctrl-C nor Ctrl-Z work.

The fix is to add a check that refuses to exec in a paused container.
A (very bad) alternative to that would be to thaw the cgroup.

Implement the fix, add a test case.

Update: it has been noted that exec in a paused container might be
a legitimate use case. Added --ignore-paused flag, documented, tested.
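
For illustration, the exec-side check boils down to something like the
sketch below (hedged: ignorePaused stands for the value of the new
--ignore-paused flag, and the exact code in exec.go may differ):

	status, err := container.Status()
	if err != nil {
		return -1, err
	}
	// Refuse to enter a frozen cgroup unless explicitly requested.
	if status == libcontainer.Paused && !ignorePaused {
		return -1, fmt.Errorf("cannot exec in a paused container (use --ignore-paused to override)")
	}

With the flag set (e.g. runc exec --ignore-paused <container-id> <cmd>),
the check is skipped and exec behaves as before.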

II. runc run/create: warn about non-empty cgroup

Currently runc allows multiple containers to share the same cgroup (for
example, by having the same cgroupPath in config.json). While such
shared configuration might be OK, there are some issues:

  • When each container has its own resource limits, the order in which
    the containers start determines whose limits are effectively applied.

  • When one of the containers is paused, all the others are paused, too.

  • When a container is paused, any attempt to do runc create/run/exec
    ends up with runc init stuck inside a frozen cgroup.

  • When the systemd cgroup manager is used, this becomes even worse: a
    stop (or even a failed start) of any container results in a
    "stopTransientUnit" command being sent to systemd, so (depending
    on unit properties) other containers can receive SIGTERM, be killed
    after a timeout, etc.

All this may lead to various hard-to-debug situations in production
(runc init stuck, cgroup removal errors, wrong resource limits, container
init not working as a zombie reaper, etc.).

One obvious solution is to refuse a non-empty cgroup when starting a new
container. This would be a breaking change though, so let's do it in
steps, with the first step being to issue a warning and a deprecation
notice about a non-empty cgroup.

Later (in runc 1.2) we will replace this warning with an error.
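
For reference, "non-empty" means the container's cgroup already has member
processes. A minimal sketch of such a check, assuming cgroup v2 and a
direct filesystem path (runc itself goes through its cgroup manager; the
function name and path below are made up for illustration):

	package main

	import (
		"bytes"
		"fmt"
		"os"
		"path/filepath"
	)

	// cgroupIsEmpty reports whether the cgroup at cgPath has no member
	// processes. A cgroup that does not exist yet counts as empty.
	func cgroupIsEmpty(cgPath string) (bool, error) {
		data, err := os.ReadFile(filepath.Join(cgPath, "cgroup.procs"))
		if err != nil {
			if os.IsNotExist(err) {
				return true, nil
			}
			return false, err
		}
		return len(bytes.TrimSpace(data)) == 0, nil
	}

	func main() {
		// Hypothetical path; the real one is derived from cgroupsPath
		// in config.json.
		empty, err := cgroupIsEmpty("/sys/fs/cgroup/mycontainer")
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		if !empty {
			fmt.Println("warning: container's cgroup is not empty")
		}
	}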

III. runc run: refuse cgroup if frozen

Sometimes a container cgroup already exists but is frozen.
When this happens, runc init hangs, and it's not clear what is going on.

Refuse to run in a frozen cgroup; add a test case.
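
Whether an existing cgroup is frozen can be read from the freezer
interface files: on cgroup v2, cgroup.freeze contains "1" when frozen; on
v1, freezer.state contains "FROZEN". A hedged sketch for the v2 case
(illustrative only, not runc's actual code):

	package main

	import (
		"bytes"
		"fmt"
		"os"
		"path/filepath"
	)

	// isFrozenV2 reports whether a cgroup v2 directory is frozen.
	func isFrozenV2(cgPath string) (bool, error) {
		data, err := os.ReadFile(filepath.Join(cgPath, "cgroup.freeze"))
		if err != nil {
			if os.IsNotExist(err) {
				return false, nil // no freezer file means not frozen
			}
			return false, err
		}
		return bytes.Equal(bytes.TrimSpace(data), []byte("1")), nil
	}

	func main() {
		// Hypothetical path, for illustration only.
		frozen, err := isFrozenV2("/sys/fs/cgroup/mycontainer")
		if err == nil && frozen {
			fmt.Println("refusing to run: container's cgroup is frozen")
		}
	}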

Proposed changelog entry

Bugfixes:
* runc run/start/exec now refuses a frozen cgroup (a paused container in the case of exec) (#3132, #3223)
* runc run/start now warns if a new container cgroup is non-empty or frozen;
  this warning will become an error in runc 1.2 (#3132, #3223)

Fixes: #3132

@kolyshkin kolyshkin added this to the 1.1.0 milestone Sep 23, 2021
@kolyshkin kolyshkin changed the title Refuse bad cgroup run create/run/exec: refuse bad cgroup Sep 23, 2021
@AkihiroSuda (Member)

runc run: refuse non-empty cgroup

  • This commit may potentially break some downstream projects?
  • Can we update the runtime spec too?

@kolyshkin (Contributor Author)

may potentially break some downstream projects?

That's right, although I'm not sure which ones. Sharing a cgroup between a few containers is almost always a bad idea, since the code universally assumes a 1:1 container-to-cgroup mapping.

Can we update the runtime spec too?

Good idea! PTAL opencontainers/runtime-spec#1125

@kolyshkin kolyshkin mentioned this pull request Sep 27, 2021
@cyphar (Member) commented Sep 28, 2021

This commit may potentially break some downstream projects?

This was my worry too. I know we've discussed this before; there was a time when there were Docker PRs to add support for sharing cgroups between containers (showing that some people apparently wanted this feature), but it seems those features were never merged. IMHO there are much better ways of solving the problems that sharing a cgroup would solve (for instance, placing the container in a nested cgroup and controlling the group of containers through the common ancestor cgroup), which I would've expected folks to be using instead. Also, this "feature" is so broken in various ways that I feel that if anyone was using it, we would've heard from them by now.

I guess we'll have to put it in -rc1 and see who screams. 😬

exec.go Outdated
Comment on lines 117 to 118
if status == libcontainer.Stopped || status == libcontainer.Paused {
	return -1, fmt.Errorf("cannot exec in a %s container", status)
}

Member

I understand the argument for blocking this, but I do wonder if it's really runc's job to say that trying to execute a program in a paused container won't work until you unpause it? I can imagine a few hypothetical scenarios where you might want to exec a program inside a container in a frozen state and then unfreeze it, though I don't know how widely used this feature is.

Contributor Author

This comes from my own experience using runc. Execute runc exec and nothing happens. Do Ctrl-C and nothing happens. Ctrl-Z does not help either. You have no idea what's going on, and that's super confusing even for me.

I do understand, though, that runc is not an end-user tool, and the scenario with "runc exec" followed by "runc resume" might be legit.

Not sure what to do here. Replace the rejection with a warning? This might mess with the output that a user expects. Add something like an --ignore-paused flag? That does not look good either.

Drop this check entirely?


A (separate) check for .Paused which tells the user to use the "--ignore-paused" flag to force running sounds fine to me. Frozen invocations which do not accept Ctrl-C/-Z are really annoying, even to experienced users.

Contributor Author

Not that bad a solution, I guess (except that an old runc will refuse this option, and a new one will require it for the frozen cgroup case).

Implemented, PTAL

Contributor Author

except that old runc will refuse this option, and a new one will require it for the frozen cgroup case

A solution to that would be to use an environment variable, rather than a CLI option.
Something like RUNC_EXEC_IGNORE_PAUSED, which, if set, makes runc work as before.

Not sure it's all worth it.


That would be a problem only in some kind of automated system which tries to run things in frozen containers with random runc versions. But I think that "--ignore-paused" is needed only(?) when debugging something manually, in which case a CLI option is fine.

@kolyshkin kolyshkin force-pushed the refuse-bad-cgroup branch 3 times, most recently from 2e67957 to d04ed72 Compare October 1, 2021 01:20
@kolyshkin kolyshkin requested a review from cyphar October 1, 2021 17:15
@kolyshkin (Contributor Author)

Rebased to resolve multiple conflicts due to merged #3059; @opencontainers/runc-maintainers PTAL

thaJeztah previously approved these changes Oct 8, 2021

@thaJeztah (Member) left a comment

SGTM

I was trying to come up with possible scenarios where the "non-empty cgroup" could be a (valid? existing?) case, thinking of the docker run --pid=container:<foo> etc. situations. But even in those cases, I don't think that's relevant. Just to be sure nothing horribly broke, I opened moby/moby#42914, which didn't explode, so I think we should be good to go.

If anyone can come up with scenarios, or something we overlooked, please comment though!

@@ -303,3 +303,47 @@ function setup() {
# check that the cgroups v2 path is the same for both processes
[[ "$run_cgroup" == "$exec_cgroup" ]]
}

@test "runc exec should refuse a paused container" {
if [[ "$ROOTLESS" -ne 0 ]]; then

Member

minor nit, but it looks like we're mixing bash and posix approaches here ([[ foo == bar ]] / [ foo = bar ]); at a glance, not all of those require the bash variant; perhaps we should do a round of cleaning up in a follow-up


Requiring an interactive shell (= Bash) in automation scripts does sound wrong...

Member

Well, it's BATS, so it assumes Bash; https://github.com/sstephenson/bats

Contributor Author

Alas, there are a lot of such things in the tests. Here I am using the style that's already used elsewhere in this file.

thaJeztah previously approved these changes Oct 11, 2021

@thaJeztah (Member) left a comment

Looks like the last push was just a rebase?

still SGTM

@kolyshkin (Contributor Author)

Looks like the last push was just a rebase?

Yes, just rebasing to make sure everything is still peachy.

@AkihiroSuda (Member)

I'm still not sure whether we can safely have the "runc run/create: refuse non-empty cgroup" commit in v1.1.

Can we exclude the commit from this PR and revisit it after releasing v1.1?

I'm fine with deprecating run/create with a non-empty cgroup in v1.1, though.

thaJeztah previously approved these changes Oct 22, 2021

@thaJeztah (Member) left a comment

LGTM

AkihiroSuda previously approved these changes Oct 25, 2021

@AkihiroSuda (Member)

@cyphar LGTY?

@AkihiroSuda (Member)

Still LGTM, but could you rebase to retrigger CI with the current master?

@kolyshkin kolyshkin dismissed stale reviews from AkihiroSuda and thaJeztah via e809955 October 29, 2021 21:52
@kolyshkin (Contributor Author)

Still LGTM, but could you rebase to retrigger CI with the current master?

Done. @AkihiroSuda @thaJeztah @cyphar @mrunalp PTAL

AkihiroSuda previously approved these changes Oct 30, 2021
thaJeztah previously approved these changes Nov 1, 2021

@thaJeztah (Member) left a comment

still LGTM. @cyphar ptal

@kolyshkin (Contributor Author)

#3261 is merged; closing/reopening to kick CI

@kolyshkin kolyshkin closed this Nov 4, 2021
@kolyshkin kolyshkin reopened this Nov 4, 2021
Currently, if a container is paused (i.e. its cgroup is frozen), runc exec
just hangs, and it is not obvious why.

Refuse to exec in a paused container. Add a test case.

In case runc exec in a paused container is a legit use case,
add an --ignore-paused option to override the check. Document it,
and add a test case.

Signed-off-by: Kir Kolyshkin <[email protected]>
Currently runc allows multiple containers to share the same cgroup (for
example, by having the same cgroupPath in config.json). While such
shared configuration might be OK, there are some issues:

 - When each container has its own resource limits, the order in which
   the containers start determines whose limits are effectively applied.

 - When one of the containers is paused, all the others are paused, too.

 - When a container is paused, any attempt to do runc create/run/exec
   ends up with runc init stuck inside a frozen cgroup.

 - When the systemd cgroup manager is used, this becomes even worse: a
   stop (or even a failed start) of any container results in a
   "stopTransientUnit" command being sent to systemd, so (depending on
   unit properties) other containers can receive SIGTERM, be killed after
   a timeout, etc.

Any of the above may lead to various hard-to-debug situations in production
(runc init stuck, cgroup removal errors, wrong resource limits, init not
reaping zombies, etc.).

One obvious solution is to refuse a non-empty cgroup when starting a new
container. This would be a breaking change though, so let's do it in
steps, with the first step being to issue a warning and a deprecation
notice about a non-empty cgroup.

Later (in runc 1.2) we will replace this warning with an error.

Signed-off-by: Kir Kolyshkin <[email protected]>
Sometimes a container cgroup already exists but is frozen.
When this happens, runc init hangs, and it's not clear what is going on.

Refuse to run in a frozen cgroup; add a test case.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin dismissed stale reviews from thaJeztah and AkihiroSuda via 68c2b6a November 4, 2021 03:36
@kolyshkin (Contributor Author)

OK looks like this had to be rebased on top of #3261. Done.

@AkihiroSuda @cyphar @thaJeztah @eero-t @mrunalp @opencontainers/runc-maintainers PTAL

@thaJeztah (Member) left a comment

LGTM

Successfully merging this pull request may close these issues:

Multiple containers in the same cgroup (#3132)