Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix clone(2) with double fork #217

Merged
merged 5 commits into from
Aug 24, 2021
Merged

Fix clone(2) with double fork #217

merged 5 commits into from
Aug 24, 2021

Conversation

yihuaf
Copy link
Collaborator

@yihuaf yihuaf commented Aug 22, 2021

Fix #185

This PR replaces clone(2) with double fork to create the container process. Double fork was the original implementation, and we wisely thought that we could improve it by using a single clone call. Turns out jokes on us that we can't use a single clone call. With clone, if we enter into a new pid namespace, we are OK. However, if we need to set_ns into an existing pid namespace, we are out of luck. So the right path forward needs to be double fork. For a more detailed discussion, see #185 why it doesn't work.

Also, turns out, this is a bigger change than I intended. So I apologize up front for the big change.

@yihuaf yihuaf force-pushed the yihuaf/185 branch 2 times, most recently from ad939ee to 6a483bd Compare August 22, 2021 22:39
@yihuaf yihuaf marked this pull request as draft August 22, 2021 22:54
@yihuaf yihuaf force-pushed the yihuaf/185 branch 2 times, most recently from 5e4497e to d243eef Compare August 23, 2021 02:48
@yihuaf yihuaf marked this pull request as ready for review August 23, 2021 03:25
@yihuaf yihuaf requested a review from utam0k August 23, 2021 03:25
.github/workflows/main.yml Outdated Show resolved Hide resolved
@utam0k
Copy link
Member

utam0k commented Aug 23, 2021

@yihuaf
Could you give me an example command that I can use to check that this has been fixed correctly?

@utam0k
Copy link
Member

utam0k commented Aug 23, 2021

Maybe, but if we don't use rootless, do we need to do double-fork?

@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 23, 2021

@yihuaf
Could you give me an example command that I can use to check that this has been fixed correctly?

Try

docker run -it --rm --runtime youki --name youki busybox

and then

docker exec -it youki sh

In the shell using docker exec, check /proc/self and /proc/2 are set up correctly. Before, because exec didn't enter the correct pid namespace, /proc/<pid> was not configured correctly. In the main shell by docker run, you should also be able to see /proc/2 for the docker exec shell.
Pid 1 -> docker run shell
Pid 2 -> docker exec shell

@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 23, 2021

Maybe, but if we don't use rootless, do we need to do double-fork?

There are 2 cases where we don't need to double fork.

  1. If we don't use rootless and/or don't enter into an existing user namespace, we can enter into Pid namespace on the Youki main process and then fork the init process.
  2. If we don't need to enter an existing Pid namespace through set_ns, we can use clone to enter user and pid namespace together and create the init process directly.

@yihuaf yihuaf requested a review from utam0k August 23, 2021 21:12
@utam0k
Copy link
Member

utam0k commented Aug 24, 2021

@yihuaf
Could you give me an example command that I can use to check that this has been fixed correctly?

Try

docker run -it --rm --runtime youki --name youki busybox

and then

docker exec -it youki sh

In the shell using docker exec, check /proc/self and /proc/2 are set up correctly. Before, because exec didn't enter the correct pid namespace, /proc/<pid> was not configured correctly. In the main shell by docker run, you should also be able to see /proc/2 for the docker exec shell.
Pid 1 -> docker run shell
Pid 2 -> docker exec shell

I was able to confirm the change. Thank you very much.

main

$ docker exec -it youki sh
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 sh
    4 root      0:00 ps

this pr

$ docker exec -it youki sh
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 sh
    2 root      0:00 sh
    3 root      0:00 ps
/ #

@utam0k
Copy link
Member

utam0k commented Aug 24, 2021

There are 2 cases where we don't need to double fork.

  1. If we don't use rootless and/or don't enter into an existing user namespace, we can enter into Pid namespace on the Youki main process and then fork the init process.
  2. If we don't need to enter an existing Pid namespace through set_ns, we can use clone to enter user and pid namespace together and create the init process directly.

Does this mean that if uid_mapping and gid_mapping are not needed, double-fork is not needed?
I thought it might be a good idea to improve the case where double-fork is not needed. However, once this PR is done, it's good.

@utam0k
Copy link
Member

utam0k commented Aug 24, 2021

I was hoping these PRs would pass with this PR fix, but it looks like not.
https://github.com/containers/youki/blob/9acc899239ee0b9d58ea1a81b69884d13ba561fe/integration_test.sh#L37-L40

@utam0k
Copy link
Member

utam0k commented Aug 24, 2021

I will not point out any details in this PR. We will not point out details in this PR, because this PR is very important and will be very hard to merge because of the large number of changes. Those corrections will be made in the PR little by little. I am looking forward to your reviews.
If you respond to some of my comments, I'll try to merge them.
Finally, it would be helpful if you could leave a brief summary of this issue on issue or on this PR.

@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 24, 2021

I was hoping these PRs would pass with this PR fix, but it looks like not.
https://github.com/containers/youki/blob/9acc899239ee0b9d58ea1a81b69884d13ba561fe/integration_test.sh#L37-L40

I will take a look at these tests right after this. #221 filed.

@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 24, 2021

There are 2 cases where we don't need to double fork.

  1. If we don't use rootless and/or don't enter into an existing user namespace, we can enter into Pid namespace on the Youki main process and then fork the init process.
  2. If we don't need to enter an existing Pid namespace through set_ns, we can use clone to enter user and pid namespace together and create the init process directly.

Does this mean that if uid_mapping and gid_mapping are not needed, double-fork is not needed?
I thought it might be a good idea to improve the case where double-fork is not needed. However, once this PR is done, it's good.

Actually no. Without uid/gid mapping, we would still require double fork unless we are in one of two cases mentioned above. I think we can potentially improve this by differentiating which case to use, but it would take some effort to get all the corner cases right. The double fork implementation would cover all the cases and should be good enough for the moment.

@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 24, 2021

I will not point out any details in this PR. We will not point out details in this PR, because this PR is very important and will be very hard to merge because of the large number of changes. Those corrections will be made in the PR little by little. I am looking forward to your reviews.
If you respond to some of my comments, I'll try to merge them.
Finally, it would be helpful if you could leave a brief summary of this issue on issue or on this PR.

That's probably a good idea, and I am happy to continue to fix all the details. I will summarize the issue here again, and #185 contains a history of how we got here.

Before this PR, the implementation was to use clone to create the container init process, joining new user and new pid namespaces together. However, in the case of docker exec, the container process will join existing namespaces user and pid namespaces. The set_ns to pid namespaces requires an extra fork to enter into the correct pid namespace. As a result, the container process didn't join the right pid namespaces. Many of the pid related operations were failing, such as looking at /proc/self and /proc/<pid>. The solution in this PR is to always double fork, one to enter user namespace and another fork to enter into pid namespace. We can't enter user and pid namespace together because entering into pid namespace requires CAP_SYSADMIN so we have to enter into user namespace first.

Copy link
Member

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superb!

@utam0k utam0k merged commit 99e5216 into youki-dev:main Aug 24, 2021
@yihuaf yihuaf deleted the yihuaf/185 branch August 24, 2021 04:29
@utam0k utam0k mentioned this pull request Aug 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

youki exec did not join the correct pid namespace
2 participants