Thaw a paused container in cgroup v1 when it is forcibly deleted. #1204

cyyzero · 2022-09-21T10:03:59Z

If the force option is not given, cgroups is v1, and the container is frozen, display an error saying that the container could not be killed because it is frozen. If the force option is given, cgroups is v1, and the container is frozen, thaw it and send the kill signal. If cgroups is v2, nothing special needs to be done.

Fix: #1129

Signed-off-by: Chen Yiyang [email protected]
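The three cases in the description can be read as a small decision table. Below is a minimal sketch of that table, not the code from this PR: `CgroupVersion`, `KillPlan` and `plan_kill` are hypothetical names introduced only for illustration.

```rust
/// Cgroup hierarchy version the container runs under (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CgroupVersion {
    V1,
    V2,
}

/// What the delete path should do before sending the kill signal.
/// Hypothetical enum, not part of youki.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KillPlan {
    /// Report an error: the container is frozen and force was not given.
    RefuseFrozen,
    /// Thaw the cgroup v1 freezer first, then send the signal.
    ThawThenSignal,
    /// Send the signal directly; nothing special is needed.
    Signal,
}

/// Maps (cgroup version, frozen?, force?) to the action described in the PR text.
pub fn plan_kill(version: CgroupVersion, frozen: bool, force: bool) -> KillPlan {
    match (version, frozen, force) {
        (CgroupVersion::V1, true, false) => KillPlan::RefuseFrozen,
        (CgroupVersion::V1, true, true) => KillPlan::ThawThenSignal,
        // cgroup v2, or a container that is not frozen: no special handling.
        _ => KillPlan::Signal,
    }
}
```

Keeping the decision separate from the side effects (writing the freezer state, sending the signal) makes each case easy to cover with a unit test, which may help with the coverage drop reported below.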

codecov-commenter · 2022-09-21T10:06:55Z

Codecov Report

Merging #1204 (2a711e9) into main (281c0a9) will decrease coverage by 0.52%.
The diff coverage is 0.00%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1204      +/-   ##
==========================================
- Coverage   69.29%   68.77%   -0.53%     
==========================================
  Files         118      119       +1     
  Lines       12474    12595     +121     
==========================================
+ Hits         8644     8662      +18     
- Misses       3830     3933     +103

YJDoc2 · 2022-09-29T07:17:35Z

Hey @cyyzero, I was going through runc's implementation of this, and one thing I noticed is that, whether they signal a single process or all of them (https://github.com/opencontainers/runc/blob/main/libcontainer/container_linux.go#L370), they send the kill signal while the cgroup is still frozen and only thaw it afterwards. In the "all" case they even seem to freeze the cgroup themselves before sending the kill signal (https://github.com/opencontainers/runc/blob/1c3b8dbaf440d16653d834b612258bbb28268730/libcontainer/init_linux.go#L530).

In the PR you seem to thaw the cgroup before sending the kill signal. One issue with this is that, between the time you get all processes from the cgroup and the time you send the signal, additional processes might be spawned because the cgroup is not frozen. runc's approach seems to avoid this.
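To make the ordering concrete, here is a rough sketch of the freeze, signal, thaw sequence described above. It is an illustration under assumptions, not runc's or youki's code: the `Cgroup` trait and `kill_all` are hypothetical, and the actual signal delivery is passed in as a closure so the sketch stays std-only.

```rust
/// Hypothetical abstraction over a container's cgroup, for illustration only.
pub trait Cgroup {
    fn freeze(&mut self) -> Result<(), String>;
    fn thaw(&mut self) -> Result<(), String>;
    /// Lists the PIDs currently in the cgroup.
    fn pids(&self) -> Result<Vec<i32>, String>;
}

/// Signals every process in the cgroup while it is frozen, then thaws it.
/// `send_signal` stands in for the real signal call.
pub fn kill_all<C, F>(cgroup: &mut C, mut send_signal: F) -> Result<(), String>
where
    C: Cgroup,
    F: FnMut(i32) -> Result<(), String>,
{
    // Freeze first, so no new process can be spawned between listing the
    // PIDs and signalling them.
    cgroup.freeze()?;

    let signal_result = cgroup.pids().and_then(|pids| {
        for pid in pids {
            send_signal(pid)?;
        }
        Ok(())
    });

    // Always thaw, even if signalling failed: frozen processes cannot act
    // on the pending signal until they are thawed.
    let thaw_result = cgroup.thaw();

    // Report the signalling error first, otherwise the thaw result.
    signal_result.and(thaw_result)
}
```

With this ordering the set of PIDs cannot change while signals are being sent, and the signals are acted on once the cgroup is thawed.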

Also, once you are done with the initial implementation in its current state, could you add the --all option to the command-line interface? It is a pretty small change, and its value will be passed as the all argument to kill. That will help us pass one of the containerd tests.

Thank you!

cyyzero · 2022-10-05T16:36:46Z

I might know why the integration test TestContainerKillInitPidHost fails. I logged the command line of each youki invocation:

runc --root /run/containerd/runc/testing --log /run/containerd-test/io.containerd.runtime.v2.task/testing/TestContainerKillInitPidHost/log.json --log-format json create --bundle /run/containerd-test/io.containerd.runtime.v2.task/testing/TestContainerKillInitPidHost --pid-file /run/containerd-test/io.containerd.runtime.v2.task/testing/TestContainerKillInitPidHost/init.pid TestContainerKillInitPidHost
runc --root /run/containerd/runc/testing --log /run/containerd-test/io.containerd.runtime.v2.task/testing/TestContainerKillInitPidHost/log.json --log-format json start TestContainerKillInitPidHost
runc --root /run/containerd/runc/testing --log /run/containerd-test/io.containerd.runtime.v2.task/testing/TestContainerKillInitPidHost/log.json --log-format json kill TestContainerKillInitPidHost 9
runc --root /run/containerd/runc/testing --log /run/containerd-test/io.containerd.runtime.v2.task/testing/TestContainerKillInitPidHost/log.json --log-format json kill --all TestContainerKillInitPidHost 9
runc --root /run/containerd/runc/testing --log /run/containerd-test/io.containerd.runtime.v2.task/testing/TestContainerKillInitPidHost/log.json --log-format json kill --all TestContainerKillInitPidHost 9
runc --root /run/containerd/runc/testing --log /run/containerd-test/io.containerd.runtime.v2.task/testing/TestContainerKillInitPidHost/log.json --log-format json delete TestContainerKillInitPidHost
runc --root /run/containerd/runc/testing --log /run/containerd-test/io.containerd.runtime.v2.task/testing/TestContainerKillInitPidHost/log.json --log-format json delete --force TestContainerKillInitPidHost

TestContainerKillInitPidHost runs the container without creating a PID namespace, so the child process does not exit automatically after the init process is killed. It is therefore necessary to use kill --all.

https://github.com/containerd/containerd/blob/31f9d13f0cc3a7d60e009cb8e76ccb188d4a76d7/integration/client/container_linux_test.go#L1120

I haven't read the containerd or shim-v2 source code, so I'm not quite clear on why the kill command is called multiple times. It seems the first kill has no options. In the previous implementation, the container status becomes stopped after kill, and the subsequent kill --all calls can't pass the can_kill check, so the child process could not be killed correctly.

In fact, the runc source code doesn't do a status check for the --all option.
https://github.com/opencontainers/runc/blob/1102f3fc9d5208961f19d70618c3be251466ee3f/libcontainer/container_linux.go#L361-L371
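One way to picture the relaxed check is the sketch below. It is an assumption-laden illustration, not youki's actual can_kill: the `Status` enum and `may_signal` are hypothetical names, and the set of states is simplified.

```rust
/// Simplified container lifecycle states (illustrative, not youki's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Creating,
    Created,
    Running,
    Stopped,
}

/// Decides whether a kill request may proceed.
/// With `all` set, a stopped container is still accepted, because processes
/// that share the host PID namespace can outlive the init process.
pub fn may_signal(status: Status, all: bool) -> bool {
    match status {
        Status::Created | Status::Running => true,
        // Init has exited, but leftover children may remain in the cgroup.
        Status::Stopped => all,
        Status::Creating => false,
    }
}
```

Under this sketch, the sequence from the log above (a plain kill followed by kill --all) would succeed on the second and third calls even though the first call already moved the container to the stopped state.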

cyyzero · 2022-10-05T16:39:02Z

The latest commit seems to be able to pass TestContainerKillInitPidHost and TestContainerKillInitKillsChildWhenNotHostPid.

cc @YJDoc2

YJDoc2 · 2022-10-06T05:27:31Z

Hey @cyyzero Thanks a lot for taking time and updating the PR! It is great that more integration tests are passing now!
I have been running into the same issues with the runc implementation while making some other tests pass, where it does not exactly align with the "standard spec". However, as runc is quite famous and is used a lot, it has become a sort of "standard" itself 😅

I'll take a look at the code around weekend, and get back here. Thanks :)

YJDoc2

Hey, great work! I have left some comments, please take a look.

crates/libcontainer/src/container/container_kill.rs

YJDoc2 · 2022-10-11T10:15:19Z

crates/libcontainer/src/container/container_kill.rs

+            }
+        }
+        self.set_status(ContainerStatus::Stopped).save()?;
+        std::process::exit(0);


I'm not sure why we had exit here in the original implementation... libcontainer is a library and should not call exit except under extreme conditions. Ideally we should let users of the library decide what to do after calling this function. It would be better to return Ok(()) from here. @Furisto if you get time, can you check and let us know if there was any particular reasoning behind exiting from here?

cyyzero · 2022-10-15

I agree. Returning Ok(()) is better than exiting here.

YJDoc2 · 2022-10-25

Can you make that change, where we return the result instead of exiting? Thanks

cyyzero · 2022-10-26

@YJDoc2 Sure😄
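The point of the thread above can be sketched as follows: the library function reports its outcome, and only the binary turns that outcome into a process exit code. This is a simplified illustration, not the code in container_kill.rs; `ContainerState`, `Container`, and `exit_code` are hypothetical names, and signal delivery is elided.

```rust
/// Simplified container state (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContainerState {
    Running,
    Stopped,
}

/// Stand-in for the library's container type.
pub struct Container {
    pub state: ContainerState,
}

impl Container {
    /// Library side: update the state and return a result.
    /// It never calls std::process::exit, so callers stay in control.
    pub fn kill(&mut self) -> Result<(), String> {
        if self.state == ContainerState::Stopped {
            return Err("container is already stopped".to_string());
        }
        // Signal delivery is elided in this sketch.
        self.state = ContainerState::Stopped;
        Ok(())
    }
}

/// Binary side: the CLI decides which exit code a result maps to.
pub fn exit_code(result: &Result<(), String>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(_) => 1,
    }
}
```

A caller embedding the library can then log the error, retry, or continue with cleanup, none of which is possible if the library exits the process itself.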

YJDoc2 · 2022-10-25T05:41:50Z

@cyyzero Apologies for the delay in my response. I have updated the comments, and if you can change the process exit to return a result and add the --all option to the CLI part (liboci-cli and the youki code that actually invokes the command), this PR is fine. Can you make that change and un-draft this PR, so we can approve and merge it? Thanks a lot!

@utam0k, as this does change the behavior of kill slightly, and adds an --all flag, should we bump the version of youki, liboci-cli and libcontainer?

Again, apologies to both for the delay on my side 😅

utam0k · 2022-10-25T12:19:04Z

@utam0k, as this does change the behavior of kill slightly, and adds an --all flag, should we bump the version of youki, liboci-cli and libcontainer?

I'm planning to release v0.0.4 soon if the containerd tests pass with youki after merging this PR, so there is no problem. Thanks for your concern ;)

To pass containerd integration test TestContainerKillInitPidHost, we need to allow a container to be killed --all again when its status is stopped. Fix: youki-dev#1225 Signed-off-by: Chen Yiyang <[email protected]>

cyyzero · 2022-10-29T11:02:16Z

https://github.com/containers/youki/blob/6d05dd2f60198aaecb7c93d08c08d7db4fbc6600/crates/liboci-cli/src/kill.rs#L5-L11
@YJDoc2 Thanks for your review! It seems that the --all option had already been added to liboci-cli, so I have only changed some exit(0) calls to return Ok(()).

YJDoc2

lgtm 👍

YJDoc2 · 2022-10-31T05:29:53Z

Hey @cyyzero, thanks for your contribution and efforts! :)

cyyzero marked this pull request as draft September 21, 2022 10:04

This was referenced Sep 29, 2022

Support for containerd #531

Closed

containerd: TestContainerKillInitPidHost #1225

Closed

YJDoc2 reviewed Oct 11, 2022

Allow kill --all even when container is stopped

2a711e9

cyyzero force-pushed the fix_kill branch from d4088a9 to 2a711e9 Compare October 29, 2022 10:54

cyyzero marked this pull request as ready for review October 29, 2022 11:03

utam0k requested a review from YJDoc2 October 29, 2022 12:50

YJDoc2 approved these changes Oct 31, 2022

YJDoc2 merged commit 0b90f8f into youki-dev:main Oct 31, 2022
