run create/run/exec: refuse bad cgroup #3223
Conversation
That's right, although I'm not sure which ones. Sharing a cgroup between a few containers is almost always a bad idea, since the code universally assumes 1:1 container to cgroup mapping.
Good idea! PTAL opencontainers/runtime-spec#1125
Force-pushed eac6e28 to 544da45.
This was my worry too. I know we've discussed this before, but there was a time when there were Docker PRs to add support for sharing cgroups between containers (showing that some people apparently wanted this feature) but it seems those features were never merged. IMHO there are much better ways of solving the problems that sharing a cgroup would solve (for instance, placing the container in a nested cgroup and controlling the group of containers through the common ancestor cgroup) which I would've expected folks to be using instead. Also this "feature" is so broken in various ways that I feel that if anyone was using it, we would've heard from them by now. I guess we'll have to put it in -rc1 and see who screams. 😬
exec.go (Outdated)

    if status == libcontainer.Stopped || status == libcontainer.Paused {
        return -1, fmt.Errorf("cannot exec in a %s container", status)
I understand the argument for blocking this, but I do wonder if it's really runc's job to say that trying to execute a program in a paused container won't work until you unpause it? I can imagine a few hypothetical scenarios where you might want to exec a program inside a container in a frozen state and then unfreeze it, though I don't know how widely used this feature is.
This comes from my own experience using runc. You execute runc exec and nothing happens. You do Ctrl-C and nothing happens. Ctrl-Z does not help either. You have no idea what's going on, and that's super confusing even for me.
I do understand, though, that runc is not an end-user tool, and the scenario of "runc exec" followed by "runc resume" might be legit.
Not sure what to do here. Replace the rejection with a warning? That might mess with the output a user expects. Add something like an --ignore-paused flag? Does not look good either.
Drop this check entirely?
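The options above can be sketched in one place. This is a minimal standalone Go sketch, not runc's actual code: `checkExecable` and the simplified `Status` type are hypothetical stand-ins for libcontainer's status handling, and the `ignorePaused` parameter models the override being discussed.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a simplified stand-in for libcontainer's container status.
type Status int

const (
	Running Status = iota
	Stopped
	Paused
)

func (s Status) String() string {
	return [...]string{"running", "stopped", "paused"}[s]
}

// checkExecable is a hypothetical helper (not runc's implementation) showing
// the shape of the check: always refuse stopped containers, and refuse
// paused ones unless the caller explicitly opted in via ignorePaused.
func checkExecable(status Status, ignorePaused bool) error {
	if status == Stopped {
		return fmt.Errorf("cannot exec in a %s container", status)
	}
	if status == Paused && !ignorePaused {
		return errors.New("cannot exec in a paused container (try the override flag)")
	}
	return nil
}

func main() {
	// Refused by default; allowed when the override is set.
	fmt.Println(checkExecable(Paused, false))
	fmt.Println(checkExecable(Paused, true))
}
```

The design question in this thread is only about the Paused branch; the Stopped branch is uncontroversial since exec can never succeed there.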
A (separate) check for .Paused which tells the user to use the "--ignore-paused" flag to force running sounds fine to me. Frozen invocations that do not accept Ctrl-C/-Z are really annoying even to experienced users.
Not that bad a solution, I guess (except that an old runc will refuse this option, and a new one will require it for the frozen-cgroup case).
Implemented, PTAL
> except that old runc will refuse this option, and a new one will require it for the frozen cgroup case

A solution to that would be to use an environment variable rather than a CLI option. Something like RUNC_EXEC_IGNORE_PAUSED, which, if set, makes runc work as before.
Not sure it's all worth it.
That would be a problem only in some kind of automated system which tries to run things in frozen containers with random runc versions. But I think that "--ignore-paused" is needed only(?) when debugging something manually, in which case a CLI option is fine.
Force-pushed 2e67957 to d04ed72, then d04ed72 to 642fac6.
Rebased to resolve multiple conflicts due to merged #3059; @opencontainers/runc-maintainers PTAL
SGTM
I was trying to come up with possible scenarios where the "non-empty cgroup" could be a (valid? existing?) case, thinking of the docker run --pid=container:<foo> etc. situations. But even in those cases, I don't think that's relevant. Just to be sure nothing horribly broke, I opened moby/moby#42914, which didn't explode, so I think we should be good to go.
If anyone can come up with scenarios, or something we overlooked, please comment though!
@@ -303,3 +303,47 @@ function setup() {
      # check that the cgroups v2 path is the same for both processes
      [[ "$run_cgroup" == "$exec_cgroup" ]]
    }

    @test "runc exec should refuse a paused container" {
      if [[ "$ROOTLESS" -ne 0 ]]; then
Minor nit, but it looks like we're mixing bash and POSIX approaches here ([[ foo == bar ]] / [ foo = bar ]); at a glance, not all of those require the bash variant. Perhaps we should do a round of cleanup in a follow-up.
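The two styles being mixed look like this — a standalone sketch with made-up variable values, not lines from the test file:

```shell
#!/usr/bin/env bash
run_cgroup="/sys/fs/cgroup/a"
exec_cgroup="/sys/fs/cgroup/a"

# Bash-only [[ ]]: supports ==, pattern matching, and does not word-split
# unquoted variables. Requires bash (which BATS assumes anyway).
[[ "$run_cgroup" == "$exec_cgroup" ]] && echo "bash-style check passed"

# POSIX [ ]: portable to any /bin/sh (dash, ash, ...); use a single '='
# for string equality.
[ "$run_cgroup" = "$exec_cgroup" ] && echo "posix-style check passed"
```

Since BATS runs under bash regardless, the choice is purely stylistic; consistency within a file is the real goal.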
Requiring an interactive shell (= Bash) in automation scripts does sound wrong...
Well, it's BATS, so it assumes Bash; https://github.com/sstephenson/bats
Alas, there are a lot of such things in the tests. Here I am using the style that's already used elsewhere in this file.
Force-pushed 642fac6 to 383bd68.
looks like last push was just a rebase?
still SGTM
Yes, just rebasing to make sure everything is still peachy.
I'm still not sure whether we can safely have the … Can we exclude the commit from this PR and revisit it after releasing v1.1? I'm fine to deprecate …
Force-pushed 3bf4e44 to 5d0ab3c.
LGTM
@cyphar LGTY?
Still LGTM, but could you rebase to retrigger CI with the current master?
Force-pushed 5d0ab3c to e809955.
Done. @AkihiroSuda @thaJeztah @cyphar @mrunalp PTAL
still LGTM. @cyphar ptal
#3261 is merged; closing/reopening to kick CI
Currently, if a container is paused (i.e. its cgroup is frozen), runc exec just hangs, and it is not obvious why. Refuse to exec in a paused container. Add a test case. In case runc exec in a paused container is a legit use case, add an --ignore-paused option to override the check. Document it, add a test case. Signed-off-by: Kir Kolyshkin <[email protected]>
Currently runc allows multiple containers to share the same cgroup (for example, by having the same cgroupPath in config.json). While such a shared configuration might be OK, there are some issues:
- When each container has its own resource limits, the order in which containers start determines whose limits are effectively applied.
- When one of the containers is paused, all others are paused, too.
- When a container is paused, any attempt to do runc create/run/exec ends up with runc init stuck inside a frozen cgroup.
- When the systemd cgroup manager is used, this becomes even worse: stopping (or even a failed start of) any container results in a "stopTransientUnit" command being sent to systemd, so (depending on unit properties) other containers can receive SIGTERM, be killed after a timeout, etc.
Any of the above may lead to various hard-to-debug situations in production (runc init stuck, cgroup removal errors, wrong resource limits, init not reaping zombies, etc.). One obvious solution is to refuse a non-empty cgroup when starting a new container. This would be a breaking change though, so let's do it in steps, the first step being to issue a warning and a deprecation notice about a non-empty cgroup. Later (in runc 1.2) we will replace this warning with an error. Signed-off-by: Kir Kolyshkin <[email protected]>
Sometimes a container cgroup already exists but is frozen. When this happens, runc init hangs, and it's not clear what is going on. Refuse to run in a frozen cgroup; add a test case. Signed-off-by: Kir Kolyshkin <[email protected]>
Force-pushed e809955 to 68c2b6a.
OK, looks like this had to be rebased on top of #3261. Done. @AkihiroSuda @cyphar @thaJeztah @eero-t @mrunalp @opencontainers/runc-maintainers PTAL
LGTM
This is the final part of changes from #3131, depending on (already merged) #3214, #3215, #3216, #3217.
I. runc exec: refuse paused container
This bugged me a few times during runc development. A new container is
run, and suddenly runc init is stuck, 🎶 and nothing ever happens, and
I wonder... 🎶
Figuring out that the cause of it is a (pre-created) frozen cgroup is not very obvious (to say the least), and neither Ctrl-C nor Ctrl-Z work.
The fix is to add a check that refuses to exec in a paused container.
A (very bad) alternative to that would be to thaw the cgroup.
Implement the fix, add a test case.
Update: it has been noted that exec in a paused container might be a legitimate use case. Added an --ignore-paused flag; documented, tested.

II. runc run/create: warn about non-empty cgroup
Currently runc allows multiple containers to share the same cgroup (for example, by having the same cgroupPath in config.json). While such a shared configuration might be OK, there are some issues:
- When each container has its own resource limits, the order in which containers start determines whose limits are effectively applied.
- When one of the containers is paused, all others are paused, too.
- When a container is paused, any attempt to do runc create/run/exec ends up with runc init stuck inside a frozen cgroup.
- When the systemd cgroup manager is used, this becomes even worse: stopping (or even a failed start of) any container results in a "stopTransientUnit" command being sent to systemd, so (depending on unit properties) other containers can receive SIGTERM, be killed after a timeout, etc.
All this may lead to various hard-to-debug situations in production (runc init stuck, cgroup removal errors, wrong resource limits, container init not working as a zombie reaper, etc.).
One obvious solution is to refuse a non-empty cgroup when starting a new container. This would be a breaking change though, so let's do it in steps, the first step being to issue a warning and a deprecation notice about a non-empty cgroup.
Later (in runc 1.2) we will replace this warning with an error.
III. runc run: refuse cgroup if frozen
Sometimes a container cgroup already exists but is frozen.
When this happens, runc init hangs, and it's not clear what is going on.
Refuse to run in a frozen cgroup; add a test case.
Proposed changelog entry
Fixes: #3132