A container can join namespaces of another container #105

dqminh · 2015-07-09T12:10:24Z

Original PR and discussion: docker-archive/libcontainer#609

Enable moby/moby#13453 and moby/moby#10163

This supports setting existing namespaces for a container's init process. It leverages the same C code infrastructure used for execin to clone a new init process with the wanted namespaces and cloneflags. All namespace-related operations are now performed in the nsenter C layer, with data sent from the Go layer using a simple binary format.

dqminh · 2015-07-09T12:13:24Z

@vishvananda This is related to #101
IIRC joining another user namespace still doesn't work for the init process. I didn't have time to look into the errors further. There is also some discussion related to that in docker-archive/libcontainer#609

avagin · 2015-07-09T12:55:09Z

libcontainer/nsenter/nsexec.c

-				  namespaces[i]);
+		fds[i] = open(ns, O_RDONLY);
+		if (fds[i] == -1) {
+			savedErr = errno;


You can call pr_perror and then close the descriptors. In that case you will not need to save errno in a temporary variable.

dqminh · 2015-07-10

Right, I think we discussed this before, but I forgot to move the patch over. Since we exit anyway, it's OK not to close the fds IIRC, right?

mrunalp · 2015-07-09T23:19:43Z

Tests fixed now. They were failing because of #109

mrunalp · 2015-07-09T23:29:58Z

libcontainer/container_linux.go

@@ -779,3 +817,40 @@ func (c *linuxContainer) currentState() (*State, error) {
 	}
 	return state, nil
 }
+
+// orderNamespacePaths sorts that namespace paths into a list of paths that we


vishvananda · 2015-07-11T17:08:01Z

@LK4D4 @mrunalp Updated code to remove cloneflags and refactor process: dqminh#1

dqminh · 2015-07-13T05:32:19Z

@avagin added your suggestion to setns into all custom namespaces first, then clone the new namespaces afterwards. This is currently done only when we have a custom user namespace (we can probably refactor further to unify the custom and default cases, i.e. also write the uid/gid mapping in C instead of Go). Can you take a look and check whether the current approach looks right?

@vishvananda can you help test this patch with your custom network namespace? It's different from your approach, but I think it's simpler.
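
To make the "setns first, clone afterwards" split concrete, here is a minimal Go sketch. It assumes a hypothetical `Namespace` struct and helper `splitNamespaces` (neither is libcontainer's actual API): entries with a path are joined, entries without one contribute a clone flag. The flag values are the Linux `CLONE_NEW*` constants, hard-coded so the sketch builds on any platform.

```go
package main

// Namespace is an illustrative config entry, not libcontainer's type.
type Namespace struct {
	Type string // "pid", "net", "user", ...
	Path string // non-empty means: join the existing namespace at this path
}

// cloneFlagFor maps a namespace type to its Linux clone(2) flag value.
var cloneFlagFor = map[string]uintptr{
	"mnt":  0x00020000, // CLONE_NEWNS
	"uts":  0x04000000, // CLONE_NEWUTS
	"ipc":  0x08000000, // CLONE_NEWIPC
	"user": 0x10000000, // CLONE_NEWUSER
	"pid":  0x20000000, // CLONE_NEWPID
	"net":  0x40000000, // CLONE_NEWNET
}

// splitNamespaces returns the paths to setns into (in config order) and the
// combined clone flags for the namespaces that have to be created fresh.
// Unknown types without a path contribute no flag in this sketch.
func splitNamespaces(nss []Namespace) (joinPaths []string, cloneFlags uintptr) {
	for _, ns := range nss {
		if ns.Path != "" {
			joinPaths = append(joinPaths, ns.Path)
			continue
		}
		cloneFlags |= cloneFlagFor[ns.Type]
	}
	return joinPaths, cloneFlags
}
```

With a config that joins an existing pid namespace and creates new net and ipc namespaces, `joinPaths` holds the single pid path and `cloneFlags` is `CLONE_NEWNET|CLONE_NEWIPC`.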

avagin · 2015-07-13T07:49:27Z

libcontainer/container_linux.go

+	// otherwise do it inside nsexec by passing the clone flags because we dont
+	// have to perform any additional setup when start a new process.
+	if c.config.Namespaces.PathOf(configs.NEWUSER) == "" {
+		cmd.SysProcAttr.Cloneflags = cloneFlags


If you need to create a new user namespace and attach to a pre-created pid namespace, you need to enter the pid namespace and only then create the user namespace, so the user namespace should be created from nsexec.c, shouldn't it?

vishvananda · 2015-07-13T17:46:04Z

It is confusing to use cloneflags in the Go process for a new userns, which puts it at the beginning of the list, but then setns into it at the end of the list if you specify a path. I suggest pulling in my patches that explicitly set the uid_map and gid_map in our code instead of relying on the Go exec to do it. Alternatively, we could always fork and set the uid_map and gid_map in the C code, using a pipe to sync between the parent and child, as suggested here: https://lwn.net/Articles/532593/

dqminh · 2015-08-24T09:07:16Z

Finally have internet at home, so I have some cycles to work on this and other issues again 🐳

Given that we always want to create a new process with clone flags in C (meaning that we can't depend on uid/gid_map file creation in Go), I think it's much more convenient to pass all the data we need into the C layer, do the work there, and pass control back to Go when we are finally done. I'm thinking of:

- Passing a JSON data structure from Go to C, which includes all values we currently specify as environment variables, plus more data such as the content of the uid_map/gid_map files, the list of namespace paths to be joined, setgroups, etc. Just enough for the C layer to join namespaces, then create a new process with the correct clone flags, then write uid_map/gid_map/setgroups if necessary. Then it returns to the Go layer. This gives us more flexibility and avoids passing control signals back and forth many times.
- Linking to libyajl to process JSON. If you have a better alternative, please let me know.

Thoughts @mrunalp @avagin @vishvananda ?
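
As an illustration of the "content of the uid_map/gid_map files" mentioned in this comment, the following Go sketch renders ID mappings in the three-column text format the kernel expects (ID inside the namespace, ID outside, range length). The `IDMap` type and `formatIDMap` helper are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// IDMap is an illustrative mapping entry, not libcontainer's type.
type IDMap struct {
	ContainerID int // first ID inside the user namespace
	HostID      int // first ID outside the user namespace
	Size        int // number of consecutive IDs mapped
}

// formatIDMap renders the text that would be written to
// /proc/<pid>/uid_map or /proc/<pid>/gid_map, one mapping per line.
func formatIDMap(maps []IDMap) string {
	var b strings.Builder
	for _, m := range maps {
		fmt.Fprintf(&b, "%d %d %d\n", m.ContainerID, m.HostID, m.Size)
	}
	return b.String()
}
```

Whichever layer ends up writing the file (Go or C), the payload passed along is just this short string.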

avagin · 2015-08-25T19:45:59Z

I am not sure that we need json here, maybe a handmade binary format would be enough. I think it will be obvious when you show code.

dqminh · 2015-08-25T21:21:53Z

@avagin hmm, yes, a binary format also works. JSON is just an implementation detail, since it's easy to generate from Go and easy enough to parse from C with a library. Not requiring an external dependency is a big win though.

mrunalp · 2015-08-25T21:31:03Z

+1 Yeah, it will be good if we can avoid a dependency.

For now a hack for the 1.6.1-userns branch (modifying vendored code) to enable user namespace join on exec. This will be more thoroughly corrected (for other use cases) in a PR for opencontainers/runc#105. Docker-DCO-1.1-Signed-off-by: Phil Estes <[email protected]>

dqminh · 2015-09-14T01:00:24Z

This depends on #43 now for new netlink library.

@avagin updated with changes to use a netlink message to send data from the Go layer to the C layer; this also includes some fixes from the last round of review. PTAL.
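
For readers unfamiliar with the netlink attribute layout, here is a minimal Go sketch of encoding and decoding attributes in that style: a 4-byte header (`nla_len`, `nla_type`) followed by the payload, padded to a 4-byte boundary. The attribute type numbers are hypothetical (the values used by this PR are not shown in the thread), and little-endian byte order is assumed for simplicity, whereas real netlink uses host byte order.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical attribute types, chosen only for this example.
const (
	attrCloneFlags uint16 = 1
	attrNsPaths    uint16 = 2
)

const nlaHeaderLen = 4 // struct nlattr: u16 nla_len + u16 nla_type

// align4 rounds n up to the next multiple of 4.
func align4(n int) int { return (n + 3) &^ 3 }

// encodeAttr serializes one attribute: length, type, payload, zero padding.
func encodeAttr(typ uint16, payload []byte) []byte {
	total := nlaHeaderLen + len(payload)
	buf := make([]byte, align4(total))
	binary.LittleEndian.PutUint16(buf[0:2], uint16(total))
	binary.LittleEndian.PutUint16(buf[2:4], typ)
	copy(buf[nlaHeaderLen:], payload)
	return buf
}

// encodeUint32Attr is a convenience wrapper for 32-bit values.
func encodeUint32Attr(typ uint16, v uint32) []byte {
	p := make([]byte, 4)
	binary.LittleEndian.PutUint32(p, v)
	return encodeAttr(typ, p)
}

// decodeAttrs walks a buffer of attributes, as the receiving side would,
// and returns the payload of each attribute keyed by its type.
func decodeAttrs(b []byte) (map[uint16][]byte, error) {
	out := map[uint16][]byte{}
	for len(b) > 0 {
		if len(b) < nlaHeaderLen {
			return nil, fmt.Errorf("truncated attribute header")
		}
		l := int(binary.LittleEndian.Uint16(b[0:2]))
		t := binary.LittleEndian.Uint16(b[2:4])
		if l < nlaHeaderLen || l > len(b) {
			return nil, fmt.Errorf("bad attribute length %d", l)
		}
		out[t] = b[nlaHeaderLen:l]
		step := align4(l)
		if step > len(b) {
			step = len(b) // tolerate missing padding on the last attribute
		}
		b = b[step:]
	}
	return out, nil
}
```

The appeal of this layout over JSON, as discussed above, is that the C side can parse it with a few pointer operations and no external library.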

An init process can join other namespaces (pidns, ipc etc.). This leverages C code defined in the nsenter package to spawn a process with the correct namespaces and clone if necessary. This moves all setns and cloneflags related code to the nsenter layer, which means that we don't use Go os/exec to create a process with cloneflags and set uid/gid_map or setgroups anymore. The necessary data is passed from Go to C using a netlink binary-encoding format. With this change, setns and init processes are almost the same, which brings some opportunity for refactoring. Signed-off-by: Daniel, Dao Quang Minh <[email protected]>

dqminh · 2015-09-14T23:04:14Z

rebased against master.

crosbymichael · 2015-10-12T17:50:15Z

@dqminh are you still interested in this PR?

Right now it is kinda hard to review. @LK4D4 and I were looking at this today, and maybe we can get this merged quicker if we split up the PR.

Maybe start with a PR adding netlink for communication instead of JSON over a pipe.
Then do one with the refactoring needed, etc.

What do you think?

dqminh · 2015-10-14T13:48:35Z

@crosbymichael @LK4D4 yes, I would still like to finish it. I think I can spend some more time on this in the next few days. The behavior should be more or less finalized pending the netlink changes. I need to take a look at the netlink structure again.

Not sure about replacing JSON over pipe with netlink. We are still going to use JSON over the pipe to send the config over. The netlink message is just for sending the necessary data structure to C-land so it can set up the namespaces and clone properly.

Maybe it's possible to split the netlink part into a separate PR so we can replace the environment variable usage first, and then change the way we set up namespaces later. I will see how much cleaner this is.

LK4D4 · 2015-10-14T16:21:32Z

@dqminh but the JSON config will be passed further to Golang, right?
Replacing env vars with netlink first sounds good to me. It may also make sense to start splitting the C code into different files.

dqminh · 2015-10-14T16:24:52Z

@LK4D4 yes, the same pipe is used to pass netlink data to C-land first, and then later on json data to Go-land.
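
The two-stage use of a single pipe described here can be sketched as follows. This is only an illustration under stated assumptions: `writeBootstrap`, `readNetlinkPart` and `roundTrip` are hypothetical helpers, a `bytes.Buffer` stands in for the real pipe, only the `nlmsg_len` field of the 16-byte netlink header is filled in, and little-endian byte order is assumed. The key point is that the first reader consumes exactly `nlmsg_len` bytes, so the JSON that follows is left intact for the second reader.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
	"io"
)

const nlmsgHdrLen = 16 // size of struct nlmsghdr

// writeBootstrap writes a netlink-framed payload followed by a JSON config.
func writeBootstrap(w io.Writer, payload []byte, config interface{}) error {
	hdr := make([]byte, nlmsgHdrLen)
	// nlmsg_len covers header plus payload; other header fields stay zero here.
	binary.LittleEndian.PutUint32(hdr[0:4], uint32(nlmsgHdrLen+len(payload)))
	if _, err := w.Write(append(hdr, payload...)); err != nil {
		return err
	}
	return json.NewEncoder(w).Encode(config)
}

// readNetlinkPart plays the role of the C side: it reads exactly one
// netlink message and leaves everything after it unread.
func readNetlinkPart(r io.Reader) ([]byte, error) {
	hdr := make([]byte, nlmsgHdrLen)
	if _, err := io.ReadFull(r, hdr); err != nil {
		return nil, err
	}
	total := int(binary.LittleEndian.Uint32(hdr[0:4]))
	if total < nlmsgHdrLen {
		return nil, fmt.Errorf("bad nlmsg_len %d", total)
	}
	payload := make([]byte, total-nlmsgHdrLen)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// roundTrip sends both parts through one stream and reads them back in order.
func roundTrip(payload []byte, config map[string]string) ([]byte, map[string]string, error) {
	var pipe bytes.Buffer // stand-in for the real pipe
	if err := writeBootstrap(&pipe, payload, config); err != nil {
		return nil, nil, err
	}
	got, err := readNetlinkPart(&pipe)
	if err != nil {
		return nil, nil, err
	}
	var cfg map[string]string
	if err := json.NewDecoder(&pipe).Decode(&cfg); err != nil {
		return nil, nil, err
	}
	return got, cfg, nil
}
```

Reading with an exact length rather than a buffered reader matters here: a reader that pulls in more bytes than the netlink message would swallow part of the JSON meant for the next stage.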

rootfs · 2015-12-02T18:13:37Z

+1
what's the status/plan for this?

dqminh · 2015-12-03T18:10:11Z

@rootfs we are trying to get the main refactoring reviewed at #340; after that, this change will be much simpler.

ianlewis · 2016-02-02T06:15:28Z

It would be awesome to get PID namespace sharing into Docker sometime soon. It looks like #340 was merged. Anything else holding this up?

estesp · 2016-03-01T01:13:18Z

I think this can be closed now that #454 is merged?

crosbymichael · 2016-03-01T01:14:17Z

Yes, thanks

ashahab-altiscale · 2016-03-01T01:14:56Z

Have you tested it for joining userns of another container?

estesp · 2016-03-01T14:54:59Z

@ashahab-altiscale yes, it works correctly. I'm working on the changes/PR to update Docker's vendor-in to do more testing, but changes to systemd/cgroup mounts on the libcontainer side are breaking userns at the moment.

avagin reviewed Jul 9, 2015

mrunalp reviewed Jul 9, 2015

dqminh force-pushed the libcontainer-pidns branch 2 times, most recently from 09636e3 to 5359301 Compare July 12, 2015 22:31

GordonTheTurtle added the dco/no label Jul 13, 2015

dqminh force-pushed the libcontainer-pidns branch from 667fb58 to 652a368 Compare July 13, 2015 05:27

GordonTheTurtle removed the dco/no label Jul 13, 2015

avagin reviewed Jul 13, 2015

mrunalp mentioned this pull request Jul 27, 2015

signal: Fix leak #154

Merged

This was referenced Aug 10, 2015

Enter existing user namespace if present #187

Closed

Allow shared PID namespaces moby/moby#10163

Closed

This was referenced Aug 26, 2015

Phase 1 implementation of user namespaces as a remapped container root moby/moby#12648

Merged

User namespaces - Phase 1 moby/moby#15187

Closed

dqminh force-pushed the libcontainer-pidns branch from 652a368 to 2142fbc Compare September 1, 2015 00:36

dqminh added 8 commits September 14, 2015 22:57

check if a namespace is supported

2370111

This adds `configs.IsNamespaceSupported(nsType)` to check if the host supports a namespace type. Signed-off-by: Daniel, Dao Quang Minh <[email protected]>
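
A plausible way to implement such a check is to test whether the corresponding file exists under `/proc/self/ns`. The sketch below assumes that approach; the function names, the `nsFile` table and the injectable `exists` callback are illustrative and not taken from the commit.

```go
package main

import "os"

// nsFile maps a namespace type name to its file under /proc/self/ns.
// The type names mirror the NEW* naming used in this thread.
var nsFile = map[string]string{
	"NEWNET":  "net",
	"NEWNS":   "mnt",
	"NEWPID":  "pid",
	"NEWIPC":  "ipc",
	"NEWUSER": "user",
	"NEWUTS":  "uts",
}

// isSupported reports whether the namespace file for nsType exists,
// using the supplied existence check so the logic can be exercised
// without touching the real /proc.
func isSupported(nsType string, exists func(path string) bool) bool {
	name, ok := nsFile[nsType]
	if !ok {
		return false // unknown namespace type
	}
	return exists("/proc/self/ns/" + name)
}

// pathExists is the real existence check based on os.Stat.
func pathExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

// isNamespaceSupportedSketch checks the running host.
func isNamespaceSupportedSketch(nsType string) bool {
	return isSupported(nsType, pathExists)
}
```

On a kernel built without user namespaces, for example, `/proc/self/ns/user` is absent and the check for `NEWUSER` returns false.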

do not override the specified userns path

7903604

Signed-off-by: Daniel, Dao Quang Minh <[email protected]>

integration tests for joining namespaces

420f4bb

Signed-off-by: Daniel, Dao Quang Minh <[email protected]>

orderNamespacePaths gets correct order of ns

1f9dbd7

This adds orderNamespacePaths to get correct order of namespaces for the bootstrap program to join. Signed-off-by: Daniel, Dao Quang Minh <[email protected]>
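
The idea of emitting namespace paths in a fixed join order can be sketched as below. The function name `orderPathsSketch` and, in particular, the order in `joinOrder` are assumptions made for illustration; the thread does not state the order the commit uses.

```go
package main

import "fmt"

// joinOrder is an assumed order for this example only.
var joinOrder = []string{"user", "ipc", "uts", "net", "pid", "mnt"}

// orderPathsSketch takes a map of namespace type to path and returns the
// non-empty paths in joinOrder. Unknown types are rejected.
func orderPathsSketch(paths map[string]string) ([]string, error) {
	known := make(map[string]bool, len(joinOrder))
	for _, t := range joinOrder {
		known[t] = true
	}
	for t := range paths {
		if !known[t] {
			return nil, fmt.Errorf("unknown namespace type %q", t)
		}
	}
	var ordered []string
	for _, t := range joinOrder {
		if p := paths[t]; p != "" {
			ordered = append(ordered, p)
		}
	}
	return ordered, nil
}
```

Because Go map iteration order is random, deriving the result from a fixed slice rather than from the map is what makes the output deterministic.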

remove joinExistingNamespaces

6955f14

We don't need this method, since all namespaces are joined/created in nsenter. Signed-off-by: Daniel, Dao Quang Minh <[email protected]>

remove redundant setsid call in init's go layer

4a60a3f

setsid is called in nsenter, so we don't have to call it a second time. Signed-off-by: Daniel, Dao Quang Minh <[email protected]>

reorder and remove unused imports in nsexec.c

df6b74b

Signed-off-by: Daniel, Dao Quang Minh <[email protected]>

dqminh force-pushed the libcontainer-pidns branch from 24e1066 to df6b74b Compare September 14, 2015 23:03

mrunalp mentioned this pull request Sep 21, 2015

Allow user namespace in nsexec. #281

Closed

mlaventure mentioned this pull request Jan 5, 2016

Move setns within nsexec #454

Merged

wking mentioned this pull request Jan 13, 2016

Separate container sandbox lifecycle from that of the processes inside it opencontainers/runtime-spec#299

Open

crosbymichael modified the milestone: 0.0.9 Feb 10, 2016

nickethier mentioned this pull request Feb 23, 2016

Shared PID and UTS namespaces kubernetes/kubernetes#1615

Closed

crosbymichael closed this Mar 1, 2016

rajasec mentioned this pull request Mar 22, 2016

handling error for userns #672

Merged
