
run create/run/exec: refuse bad cgroup #3223

Merged: 3 commits, Nov 8, 2021

Conversation

@kolyshkin (Contributor) commented Sep 23, 2021

This is the final part of changes from #3131, depending on (already merged) #3214, #3215, #3216, #3217.

I. runc exec: refuse paused container

This bugged me a few times during runc development. A new container is
run, and suddenly runc init is stuck, 🎶 and nothing ever happens, and
I wonder... 🎶

Figuring out that the cause of it is a (pre-created) frozen cgroup is not
very obvious (to say the least), and neither Ctrl-C nor Ctrl-Z work.

The fix is to add a check that refuses to exec in a paused container.
A (very bad) alternative to that would be to thaw the cgroup.

Implement the fix, add a test case.

Update: it has been noted that exec in a paused container might be
a legitimate use case. Added --ignore-paused flag, documented, tested.
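
For illustration, the exec-side check boils down to something like the
sketch below (hedged: ignorePaused stands for the value of the new
--ignore-paused flag, and the exact code in exec.go may differ):

	status, err := container.Status()
	if err != nil {
		return -1, err
	}
	// Refuse to enter a frozen cgroup unless explicitly requested.
	if status == libcontainer.Paused && !ignorePaused {
		return -1, fmt.Errorf("cannot exec in a paused container (use --ignore-paused to override)")
	}

With the flag set (e.g. runc exec --ignore-paused <container-id> <cmd>),
the check is skipped and exec behaves as before.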

II. runc run/create: warn about non-empty cgroup

Currently runc allows multiple containers to share the same cgroup (for
example, by having the same cgroupPath in config.json). While such
shared configuration might be OK, there are some issues:

  • When each container has its own resource limits, the order in which
    the containers start determines whose limits are effectively applied.

  • When one of the containers is paused, all the others are paused, too.

  • When a container is paused, any attempt to do runc create/run/exec
    ends up with runc init stuck inside a frozen cgroup.

  • When the systemd cgroup manager is used, this becomes even worse: a
    stop (or even a failed start) of any container results in a
    "stopTransientUnit" command being sent to systemd, so (depending
    on unit properties) other containers can receive SIGTERM, be killed
    after a timeout, etc.

All this may lead to various hard-to-debug situations in production
(runc init stuck, cgroup removal errors, wrong resource limits, container
init not working as a zombie reaper, etc.).

One obvious solution is to refuse a non-empty cgroup when starting a new
container. This would be a breaking change though, so let's do it in
steps, with the first step being to issue a warning and a deprecation
notice about a non-empty cgroup.

Later (in runc 1.2) we will replace this warning with an error.
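
For reference, "non-empty" means the container's cgroup already has member
processes. A minimal sketch of such a check, assuming cgroup v2 and a
direct filesystem path (runc itself goes through its cgroup manager; the
function name and path below are made up for illustration):

	package main

	import (
		"bytes"
		"fmt"
		"os"
		"path/filepath"
	)

	// cgroupIsEmpty reports whether the cgroup at cgPath has no member
	// processes. A cgroup that does not exist yet counts as empty.
	func cgroupIsEmpty(cgPath string) (bool, error) {
		data, err := os.ReadFile(filepath.Join(cgPath, "cgroup.procs"))
		if err != nil {
			if os.IsNotExist(err) {
				return true, nil
			}
			return false, err
		}
		return len(bytes.TrimSpace(data)) == 0, nil
	}

	func main() {
		// Hypothetical path; the real one is derived from cgroupsPath
		// in config.json.
		empty, err := cgroupIsEmpty("/sys/fs/cgroup/mycontainer")
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		if !empty {
			fmt.Println("warning: container's cgroup is not empty")
		}
	}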

III. runc run: refuse cgroup if frozen

Sometimes a container cgroup already exists but is frozen.
When this happens, runc init hangs, and it's not clear what is going on.

Refuse to run in a frozen cgroup; add a test case.
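
Whether an existing cgroup is frozen can be read from the freezer
interface files: on cgroup v2, cgroup.freeze contains "1" when frozen; on
v1, freezer.state contains "FROZEN". A hedged sketch for the v2 case
(illustrative only, not runc's actual code):

	package main

	import (
		"bytes"
		"fmt"
		"os"
		"path/filepath"
	)

	// isFrozenV2 reports whether a cgroup v2 directory is frozen.
	func isFrozenV2(cgPath string) (bool, error) {
		data, err := os.ReadFile(filepath.Join(cgPath, "cgroup.freeze"))
		if err != nil {
			if os.IsNotExist(err) {
				return false, nil // no freezer file means not frozen
			}
			return false, err
		}
		return bytes.Equal(bytes.TrimSpace(data), []byte("1")), nil
	}

	func main() {
		// Hypothetical path, for illustration only.
		frozen, err := isFrozenV2("/sys/fs/cgroup/mycontainer")
		if err == nil && frozen {
			fmt.Println("refusing to run: container's cgroup is frozen")
		}
	}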

Proposed changelog entry

Bugfixes:
* runc run/start/exec now refuses a frozen cgroup (a paused container in the case of exec) (#3132, #3223)
* runc run/start now warns if a new container cgroup is non-empty or frozen;
  this warning will become an error in runc 1.2 (#3132, #3223)

Fixes: #3132

@kolyshkin kolyshkin added this to the 1.1.0 milestone Sep 23, 2021
@kolyshkin kolyshkin changed the title Refuse bad cgroup run create/run/exec: refuse bad cgroup Sep 23, 2021
@AkihiroSuda (Member)

runc run: refuse non-empty cgroup

  • This commit may potentially break some downstream projects?
  • Can we update the runtime spec too?

@kolyshkin (Contributor Author)

may potentially break some downstream projects?

That's right, although I'm not sure which ones. Sharing a cgroup between a few containers is almost always a bad idea, since the code universally assumes a 1:1 container-to-cgroup mapping.

Can we update the runtime spec too?

Good idea! PTAL opencontainers/runtime-spec#1125

@kolyshkin kolyshkin mentioned this pull request Sep 27, 2021
@cyphar (Member) commented Sep 28, 2021

This commit may potentially break some downstream projects?

This was my worry too. I know we've discussed this before; there was a time when there were Docker PRs to add support for sharing cgroups between containers (showing that some people apparently wanted this feature), but it seems those features were never merged. IMHO there are much better ways of solving the problems that sharing a cgroup would solve (for instance, placing the container in a nested cgroup and controlling the group of containers through the common ancestor cgroup), which I would've expected folks to be using instead. Also, this "feature" is so broken in various ways that I feel that if anyone was using it, we would've heard from them by now.

I guess we'll have to put it in -rc1 and see who screams. 😬

exec.go Outdated
Comment on lines 117 to 118
if status == libcontainer.Stopped || status == libcontainer.Paused {
	return -1, fmt.Errorf("cannot exec in a %s container", status)
}

Member

I understand the argument for blocking this, but I do wonder if it's really runc's job to say that trying to execute a program in a paused container won't work until you unpause it? I can imagine a few hypothetical scenarios where you might want to exec a program inside a container in a frozen state and then unfreeze it, though I don't know how widely used this feature is.

Contributor Author

This comes from my own experience using runc. Execute runc exec and nothing happens. Do Ctrl-C and nothing happens. Ctrl-Z does not help either. You have no idea what's going on, and that's super confusing even for me.

I do understand, though, that runc is not an end-user tool, and the scenario with "runc exec" followed by "runc resume" might be legit.

Not sure what to do here. Replace the rejection with a warning? This might mess with the output that a user expects. Add something like an --ignore-paused flag? That does not look good either.

Drop this check entirely?


A (separate) check for .Paused which tells the user to use the "--ignore-paused" flag to force running sounds fine to me. Frozen invocations which do not accept Ctrl-C/-Z are really annoying, even to experienced users.

Contributor Author

Not that bad a solution, I guess (except that an old runc will refuse this option, and a new one will require it for the frozen cgroup case).

Implemented, PTAL

Contributor Author

except that old runc will refuse this option, and a new one will require it for the frozen cgroup case

A solution to that would be to use an environment variable, rather than a CLI option.
Something like RUNC_EXEC_IGNORE_PAUSED, which, if set, makes runc work as before.

Not sure it's all worth it.


That would be a problem only in some kind of automated system which tries to run things in frozen containers with random runc versions. But I think that "--ignore-paused" is needed only(?) when debugging something manually, in which case a CLI option is fine.

@kolyshkin kolyshkin force-pushed the refuse-bad-cgroup branch 3 times, most recently from 2e67957 to d04ed72 Compare October 1, 2021 01:20
@kolyshkin kolyshkin requested a review from cyphar October 1, 2021 17:15
@kolyshkin (Contributor Author)

Rebased to resolve multiple conflicts due to merged #3059; @opencontainers/runc-maintainers PTAL

thaJeztah previously approved these changes Oct 8, 2021

@thaJeztah (Member) left a comment

SGTM

I was trying to come up with possible scenarios where the "non-empty cgroup" could be a (valid? existing?) case, thinking of the docker run --pid=container:<foo> etc. situations. But even in those cases, I don't think that's relevant. Just to be sure nothing horribly broke, I opened moby/moby#42914, which didn't explode, so I think we should be good to go.

If anyone can come up with scenarios, or something we overlooked, please comment though!

@@ -303,3 +303,47 @@ function setup() {
# check that the cgroups v2 path is the same for both processes
[[ "$run_cgroup" == "$exec_cgroup" ]]
}

@test "runc exec should refuse a paused container" {
if [[ "$ROOTLESS" -ne 0 ]]; then

Member

minor nit, but it looks like we're mixing bash and posix approaches here ([[ foo == bar ]] / [ foo = bar ]); at a glance, not all of those require the bash variant; perhaps we should do a round of cleaning up in a follow-up


Requiring an interactive shell (= Bash) in automation scripts does sound wrong...

Member

Well, it's BATS, so it assumes Bash; https://github.com/sstephenson/bats

Contributor Author

Alas, there are a lot of such things in the tests. Here I am using the style that's already used elsewhere in this file.

thaJeztah previously approved these changes Oct 11, 2021

@thaJeztah (Member) left a comment

Looks like the last push was just a rebase?

still SGTM

@kolyshkin (Contributor Author)

Looks like the last push was just a rebase?

Yes, just rebasing to make sure everything is still peachy.

@AkihiroSuda (Member)

I'm still not sure whether we can safely have the "runc run/create: refuse non-empty cgroup" commit in v1.1.

Can we exclude the commit from this PR and revisit it after releasing v1.1?

I'm fine with deprecating run/create with a non-empty cgroup in v1.1, though.

thaJeztah previously approved these changes Oct 22, 2021

@thaJeztah (Member) left a comment

LGTM

AkihiroSuda previously approved these changes Oct 25, 2021

@AkihiroSuda (Member)

@cyphar LGTY?

@AkihiroSuda (Member)

Still LGTM, but could you rebase to retrigger CI with the current master?

@kolyshkin kolyshkin dismissed stale reviews from AkihiroSuda and thaJeztah via e809955 October 29, 2021 21:52
@kolyshkin (Contributor Author)

Still LGTM, but could you rebase to retrigger CI with the current master?

Done. @AkihiroSuda @thaJeztah @cyphar @mrunalp PTAL

AkihiroSuda previously approved these changes Oct 30, 2021
thaJeztah previously approved these changes Nov 1, 2021

@thaJeztah (Member) left a comment

still LGTM. @cyphar ptal

@kolyshkin (Contributor Author)

#3261 is merged; closing/reopening to kick CI

@kolyshkin kolyshkin closed this Nov 4, 2021
@kolyshkin kolyshkin reopened this Nov 4, 2021
Currently, if a container is paused (i.e. its cgroup is frozen), runc exec
just hangs, and it is not obvious why.

Refuse to exec in a paused container. Add a test case.

In case runc exec in a paused container is a legit use case,
add an --ignore-paused option to override the check. Document it,
and add a test case.

Signed-off-by: Kir Kolyshkin <[email protected]>
Currently runc allows multiple containers to share the same cgroup (for
example, by having the same cgroupPath in config.json). While such
shared configuration might be OK, there are some issues:

 - When each container has its own resource limits, the order in which
   the containers start determines whose limits are effectively applied.

 - When one of the containers is paused, all the others are paused, too.

 - When a container is paused, any attempt to do runc create/run/exec
   ends up with runc init stuck inside a frozen cgroup.

 - When the systemd cgroup manager is used, this becomes even worse: a
   stop (or even a failed start) of any container results in a
   "stopTransientUnit" command being sent to systemd, so (depending on
   unit properties) other containers can receive SIGTERM, be killed after
   a timeout, etc.

Any of the above may lead to various hard-to-debug situations in production
(runc init stuck, cgroup removal errors, wrong resource limits, init not
reaping zombies, etc.).

One obvious solution is to refuse a non-empty cgroup when starting a new
container. This would be a breaking change though, so let's do it in
steps, with the first step being to issue a warning and a deprecation
notice about a non-empty cgroup.

Later (in runc 1.2) we will replace this warning with an error.

Signed-off-by: Kir Kolyshkin <[email protected]>
Sometimes a container cgroup already exists but is frozen.
When this happens, runc init hangs, and it's not clear what is going on.

Refuse to run in a frozen cgroup; add a test case.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin dismissed stale reviews from thaJeztah and AkihiroSuda via 68c2b6a November 4, 2021 03:36
@kolyshkin (Contributor Author)

OK looks like this had to be rebased on top of #3261. Done.

@AkihiroSuda @cyphar @thaJeztah @eero-t @mrunalp @opencontainers/runc-maintainers PTAL

@thaJeztah (Member) left a comment

LGTM

Successfully merging this pull request may close these issues:

Multiple containers in the same cgroup (#3132)