-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A container can join namespaces of another container #105
Conversation
@vishvananda This is related to #101 |
namespaces[i]); | ||
fds[i] = open(ns, O_RDONLY); | ||
if (fds[i] == -1) { | ||
savedErr = errno; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can call pr_perror and then close descritores. In this case you will not need to save errno in a temporary variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, i think we also discussed before about this but i forgot to move the patch over. Since we exit anyway, it's ok to not having to close the fds IIRC, right ?
Tests fixed now. They were failing because of #109 |
@@ -779,3 +817,40 @@ func (c *linuxContainer) currentState() (*State, error) { | |||
} | |||
return state, nil | |||
} | |||
|
|||
// orderNamespacePaths sorts that namespace paths into a list of paths that we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/that/the
09636e3
to
5359301
Compare
667fb58
to
652a368
Compare
@avagin added your suggestion on setns all custom namespaces first, then clone new namespaces after . This is currently done only when we have custom user namespace for now ( we probably can also refactor further to unify both custom and default i.e., also write uid/gid mapping in C instead of Go ). Can you take a look if the current approach looks right ? @vishvananda can you help testing this patch with your custom network namespace ? It's different from your approach but i think it's simpler. |
// otherwise do it inside nsexec by passing the clone flags because we dont | ||
// have to perform any additional setup when start a new process. | ||
if c.config.Namespaces.PathOf(configs.NEWUSER) == "" { | ||
cmd.SysProcAttr.Cloneflags = cloneFlags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need to create a new user namespace and attach to a precreated pid namespace, you need to enter into the pid namespace and only then create a user namespace, so the user namespace should be created from nsexec.c, should not it?
It is confusing to be using cloneflags in the go process for new userns which puts it at the beginning of the list, but then setns into it at the end of the list if you specify a path. I suggest pulling in my patches that explicitly set the uid_map and gid_map in our code instead of relying on the go exec to do it. Alternatively, we could always fork and set the uid_map and gid_map in the c code, using a pipe to sync between the parent and child as is suggested here: https://lwn.net/Articles/532593/ |
Finally having internet at home, so i can have some cycle to work on this and other issues again 🐳 Given that we want to create a new process with clone flags in C always (meant that we cant depend on uid/gid_map file creation in Go, i think it's much more convenient to pass all the data we need into C layer, do work there and pass back to Go when we are finally done. I'm thinking of:
Thoughts @mrunalp @avagin @vishvananda ? |
I am not sure that we need json here, maybe a handmade binary format would be enough. I think it will be obvious when you show code. |
@avagin hmm yes a binary format also works. JSON is just an implementation detail since it's easy to generate from Go, and easy enough to parse from C with a library. Not requiring external dependency is a big win though. |
+1 Yeah, it will be good if we can avoid a dependency. |
For now a hack for the 1.6.1-userns branch (modifying vendored code) to enable user namespace join on exec. This will be more thorougly corrected (for other use cases) in a PR for opencontainers/runc#105. Docker-DCO-1.1-Signed-off-by: Phil Estes <[email protected]>
652a368
to
2142fbc
Compare
This adds `configs.IsNamespaceSupported(nsType)` to check if the host supports a namespace type. Signed-off-by: Daniel, Dao Quang Minh <[email protected]>
Signed-off-by: Daniel, Dao Quang Minh <[email protected]>
Signed-off-by: Daniel, Dao Quang Minh <[email protected]>
This adds orderNamespacePaths to get correct order of namespaces for the bootstrap program to join. Signed-off-by: Daniel, Dao Quang Minh <[email protected]>
An init process can join other namespaces (pidns, ipc etc.). This leverages C code defined in nsenter package to spawn a process with correct namespaces and clone if necessary. This moves all setns and cloneflags related code to nsenter layer, which mean that we dont use Go os/exec to create process with cloneflags and set uid/gid_map or setgroups anymore. The necessary data is passed from Go to C using a netlink binary-encoding format. With this change, setns and init processes are almost the same, which brings some opportunity for refactoring. Signed-off-by: Daniel, Dao Quang Minh <[email protected]>
we dont have a need for this method, since all namespaces are joined/created in nsenter Signed-off-by: Daniel, Dao Quang Minh <[email protected]>
setsid is called in nsenter, so we dont have to call it a second time. Signed-off-by: Daniel, Dao Quang Minh <[email protected]>
Signed-off-by: Daniel, Dao Quang Minh <[email protected]>
24e1066
to
df6b74b
Compare
rebased against master. |
@dqminh are you still interested in this PR? Right now it is kinda hard to review. @LK4D4 and I were look at this today and maybe we can get this merged quicker if we split up the PR. Maybe start by a PR adding netlink for communication instead of json over pipe. What do you think? |
@crosbymichael @LK4D4 yes, i would still like to finish it. I think i can spend some more time on this in the next few days. I think the behavior should be more or less finalized pending the netlink changes. I need to take a look at the netlink structure again Not sure about replacing json over pipe with netlink. We are still going to do the json over pipe to send config over. Netlink messages is just to send the necessary data structure to C-land so it can setup the namespace and clone properly. Maybe it's possible to split the netlink part into a separate PR so we can replace environment variable usage first, and then change the way we setup namespaces later. I will see how much cleaner this is. |
@dqminh but json config will be passed further to Golang, right? |
@LK4D4 yes, the same pipe is used to pass netlink data to C-land first, and then later on json data to Go-land. |
+1 |
It would be awesome to get PID namespace sharing into Docker sometime soon. It looks like #340 was merged. Anything else holding this up? |
I think this can be closed now that #454 is merged? |
Yes, thanks |
Have you tested it for joining userns of another container?
|
@ashahab-altiscale yes, it works correctly. Working on the changes/PR to update Docker's vendor-in to do more testing, but changes to systemd/cgroup mounts on the libcontainer side are breaking userns at the moment.. |
Original PR and discussion: docker-archive/libcontainer#609
Enable moby/moby#13453 and moby/moby#10163
This supports setting existing namespaces for a container's init process. It leverages the same C code infrastructure for execin to clone a new init process with the wanted namespaces and cloneflags. Now all namespaces related operation is performed in nsenter C layer, with data sent from Go layer using a simple binary format.