Properly setuid/setgid after entering userns #606

estesp · 2016-03-01T18:22:46Z

The re-work of namespace entering lost the setuid/setgid that was part
of the Go-routine based process exec in the prior code. A side issue was
found with setting oom_score_adj before execve() in a userns that is
also solved here.

Docker-DCO-1.1-Signed-off-by: Phil Estes [email protected] (github: estesp)

estesp · 2016-03-01T18:31:24Z

strace details on the "0555" perms:

stat("/home/estesp/runc-test/rootfs.uid2000/sys",  <unfinished ...>
<... stat resumed> {st_dev=makedev(252, 1), st_ino=11013277, st_mode=S_IFDIR|0775, 
                   st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=4096,
                   st_atime=2016/03/01-12:57:26, st_mtime=2016/03/01-12:57:24, st_ctime=2016/03/01-12:57:24}) = 0
mount("sysfs", "/home/estesp/runc-test/rootfs.uid2000/sys", "sysfs", 
                   MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = 0
stat("/home/estesp/runc-test/rootfs.uid2000/sys/fs/cgroup", {st_dev=makedev(0, 74), st_ino=3, 
                    st_mode=S_IFDIR|0555, st_nlink=2, st_uid=65534, st_gid=65534, st_blksize=4096, 
                    st_blocks=0, st_size=0, st_atime=2016/03/01-13:14:58, st_mtime=2016/03/01-13:14:58, st_ctime=2016/03/01-13:14:58}) = 0
mount("tmpfs", "/home/estesp/runc-test/rootfs.uid2000/sys/fs/cgroup", "tmpfs", 
                       MS_NOSUID|MS_NODEV|MS_NOEXEC, "mode=755") = 0
fchmodat(AT_FDCWD, "/home/estesp/runc-test/rootfs.uid2000/sys/fs/cgroup", 0555)    = 0

The stat, mount, fchmodat relate to the code highlighted: https://github.com/opencontainers/runc/blob/master/libcontainer/rootfs_linux.go#L146-L160

estesp · 2016-03-01T18:43:01Z

Also, for further detail, this is git bisect to the first commit which exhibits mount/mkdir failure:

42d5d0480107a83300a59683a73b11d17538f3dc is the first bad commit
commit 42d5d0480107a83300a59683a73b11d17538f3dc
Author: Daniel, Dao Quang Minh <[email protected]>
Date:   Mon Sep 14 00:40:43 2015 +0000

    Sets custom namespaces for init processes

mrunalp · 2016-03-01T20:01:18Z

Interesting. I wonder if it is related to the umask which is somehow different now.. (just a theory).

estesp · 2016-03-01T21:56:46Z

@mrunalp: @crosbymichael thought something similar (and so did I initially) but I can't find anything different about umask before/after this commit.

What does seem different in a full strace comparison is the way Go process startup used to handle setuid/setgid after setting up the mappings. In fact, in testing after a local vendoring-in to Docker, I now have to explicitly set up the user string to "0:0" to not end up defaulted as nobody (65534:65534) with no perms. Using runc I don't have this problem as 0:0 already happens in the spec -> libcontainer.Process setup, but it does seem to reveal something different is happening with uid/gid given the modified flow of clone/child/parent actions happening in C in the new code?

UPDATE: However.. as a counterpoint, sprinking setuid()/setgid() into the C code where I think it might need to be changes nothing about the weird 0555 stat response.

codido · 2016-03-02T01:42:32Z

@estesp Are you sure the mode is the root cause here? Isn't 0555 actually valid?

$ unshare -rfmn
$ mount -t sysfs none /sys
$ ls -ld /sys/fs/cgroup
dr-xr-xr-x. 2 nfsnobody nfsnobody 0 Mar  1 20:40 /sys/fs/cgroup

However, I did notice that prior to the recent changes, setuid(0) was called before setting up rootfs, and this is no longer the case. As a quick hack, moving setgid/setuid before setupRootfs() seems to fix the issue.

estesp · 2016-03-02T14:58:13Z

@codido you are correct that 0555 is definitely valid, the only problem is that it prevents sub-mounts from being bound underneath that path--and my wild guess is that it has to do with not being perceived as being "really root" like it was prior to the changes.

Where did you put the setuid/setgid to resolve the issue? I tried several things yesterday without any luck.

codido · 2016-03-02T15:01:27Z

@estesp added that just before calling setupRootfs(). It's not a valid patch, just a hack to verify that this is indeed the culprit.

estesp · 2016-03-02T15:03:22Z

@codido understand, but was curious because I tried the same but have no change in behavior. Just tried again to confirm--still no change. :(

codido · 2016-03-02T15:10:10Z

@estesp Here's the hack I used to test this:

diff --git a/libcontainer/standard_init_linux.go b/libcontainer/standard_init_linux.go
index 935b2ee..fbfb886 100644
--- a/libcontainer/standard_init_linux.go
+++ b/libcontainer/standard_init_linux.go
@@ -82,6 +82,12 @@ func (l *linuxStandardInit) Init() error {
        label.Init()
        // InitializeMountNamespace() can be executed only for a new mount namespace
        if l.config.Config.Namespaces.Contains(configs.NEWNS) {
+               if err := system.Setgid(0); err != nil {
+                       return err
+               }
+               if err := system.Setuid(0); err != nil {
+                       return err
+               }
                if err := setupRootfs(l.config.Config, console, l.pipe); err != nil {
                        return err
                }

Before applying it, and using a more or less default spec with userns, I got this:

Timestamp: 2016-03-02 10:05:05.730733987 -0500 EST
Code: System error

Message: mkdir /data/fs/matrix/fs/sys/fs/cgroup/systemd: permission denied

Frames:
---
0: setupRootfs
Package: github.com/opencontainers/runc/libcontainer
File: rootfs_linux.go@43
---
1: Init
Package: github.com/opencontainers/runc/libcontainer.(*linuxStandardInit)
File: standard_init_linux.go@85
---
2: StartInitialization
Package: github.com/opencontainers/runc/libcontainer.(*LinuxFactory)
File: factory_linux.go@239
---
3: 1
Package: main.init
File: start.go@84
---
4: init
Package: main
File: utils.go@350
---
5: main
Package: runtime
File: proc.go@100
---
6: goexit
Package: runtime
File: asm_amd64.s@1721
WARN[0000] exit status 1                                
FATA[0000] Container start failed: [10] System error: could not synchronise with container process

With this hack applied the container started successfully.

estesp · 2016-03-02T15:20:04Z

Ok, that is quite interesting that using Golang pkg/syscall versus libcontainer's system package has different behavior. The system package is calling RawSyscall under the covers; will have to see what Golang pkg/syscall does in their Set{u,g}id implementation that's different.

UPDATE: Lesson learned on not checking errors even with a quick hack; from pkg/syscall/syscall_linux.go:

// On linux Setuid and Setgid only affects the current thread, not the process.
// This does not match what most callers expect so we must return an error
// here rather than letting the caller think that the call succeeded.

estesp · 2016-03-02T20:26:35Z

I guess in the end, root cause is this fact:

cur uid: 65534, gid: 65534

We are not root in the user-namespaced process by the time we are running the Init() back in the Go standard container initialization code. This is not the case before #454 and seems to be corrected later in setupUser() code which actually uses the config.User value to lookup (if necessary) the uid/gid pair, and then use system.Set{u,g}id to set the current thread to that pair before the execve of the container command.

mrunalp · 2016-03-02T21:23:42Z

@estesp Previously the setuid/setgid was done after mappings were written in go standard library. We could make the calls in the C code after setsid in nsexec.c

mrunalp · 2016-03-02T21:24:49Z

@estesp Also, they are safe to make as we always require a mapping for 0.

estesp · 2016-03-02T22:13:01Z

@mrunalp I think I'm only slightly losing my mind :)

If I put setuid/setgid in nsexec.c all the rest of the code flow works except writing to /proc/self/oom_score_adj. If I skip this and echo {somevalue} > /proc/self/oom_score_adj after starting a shell, it works fine!
If I put system.Setuid/system.Setgid in libcontainer/standard_init_linux.go everything works.

I have no idea why.

mrunalp · 2016-03-02T23:11:42Z

@estesp I noticed that as well but haven't root caused it :)

codido · 2016-03-02T23:47:40Z

@estesp @mrunalp Perhaps that's because at this point in time /proc is still the original host one?

mrunalp · 2016-03-03T00:25:21Z

@codido Even if we move setuid/setgid to beginning to Init (before the oomscoreadj) we see the same issue. What is a bit puzzling here is that previously also we were root at this point of time in the code.

mrunalp · 2016-03-03T00:25:58Z

@codido Also, the new proc mount and pivot root only happen in setupRootfs so that shouldn't be an issue as it was the same earlier, too.

mrunalp · 2016-03-03T00:30:44Z

I tried moving setOomScoreAdj after setupRootfs, but still see the same error.

codido · 2016-03-03T00:49:08Z

@mrunalp Right. It appears that a process that calls setuid() cannot access it's own oom_score_adj until it calls re-executes. Not sure if that's intended though:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

static void open_oom_score_adj(void)
{
    int fd;

    fd = open("/proc/self/oom_score_adj", O_WRONLY);
    if (fd < 0)
        perror("Open failed");
    else
        printf("Open returned %d\n", fd);

    close(fd);
}

int main(int argc, char *argv[])
{
    int fd;

    if (getuid() == 1000) {
        printf("After execve\n");
        open_oom_score_adj();
        return 0;
    }

    if (setuid(1000) < 0) {
        perror("setuid");
        return 1;
    }

    printf("Before execve\n");
    open_oom_score_adj();

    execv(argv[0], argv);

    return 0;
}

mrunalp · 2016-03-03T00:58:16Z

Hmm, we could try adjusting it from the parent on procReady.

Sent from my iPhone

On Mar 2, 2016, at 4:49 PM, Ido Yariv [email protected] wrote:

@mrunalp Right. It appears that a process that calls setuid() cannot access it's own oom_score_adj until it calls re-executes. Not sure if that's intended though:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

static void open_oom_score_adj(void)
{
int fd;
fd = open("/proc/self/oom_score_adj", O_WRONLY);
if (fd < 0)
    perror("Open failed");
else
    printf("Open returned %d\n", fd);

close(fd);
}

int main(int argc, char *argv[])
{
int fd;
if (getuid() == 1000) {
    printf("After execve\n");
    open_oom_score_adj();
    return 0;
}

if (setuid(1000) < 0) {
    perror("setuid");
    return 1;
}

printf("Before execve\n");
open_oom_score_adj();

execv(argv[0], argv);

return 0;
}
—
Reply to this email directly or view it on GitHub.

estesp · 2016-03-03T14:41:12Z

@mrunalp this has been reworked after your idea about setting oom_score_adj in the parent. PTAL

@crosbymichael hopefully this is the more appropriate fix so that I can vendor libcontainer into Docker and get shared namespaces.

mrunalp · 2016-03-03T14:50:54Z

@estesp Nice! I will test it out once before acking.

mlaventure · 2016-03-03T16:12:31Z

In that case it may work by moving setting the oom score in start_child within the nsenter package before writing the sync byte, wdyt?

mrunalp · 2016-03-03T16:46:20Z

Yes another option is to do it in C code but will have to pass additional config that we can avoid by setting it from parent.

Sent from my iPhone

On Mar 3, 2016, at 8:12 AM, Kenfe-Mickaël Laventure [email protected] wrote:

In that case it may work by moving setting the oom score in start_child within the nsenter package before writing the sync byte, wdyt?

—
Reply to this email directly or view it on GitHub.

estesp · 2016-03-03T17:02:20Z

@mrunalp @mlaventure fixed properly now. PTAL. I even tested :)

mrunalp · 2016-03-03T17:43:56Z

@estesp Thanks! LGTM.

mlaventure · 2016-03-03T17:44:49Z

nice, looks good!

The re-work of namespace entering lost the setuid/setgid that was part of the Go-routine based process exec in the prior code. A side issue was found with setting oom_score_adj before execve() in a userns that is also solved here. Docker-DCO-1.1-Signed-off-by: Phil Estes <[email protected]> (github: estesp)

estesp · 2016-03-04T16:17:14Z

rebased. @crosbymichael: I'd like to vendor this into Docker as well as another commit from earlier soon; PTAL when you have a chance

LK4D4 · 2016-03-04T16:38:21Z

btw according to http://www.openwall.com/lists/kernel-hardening/2016/01/20/14 you need to set uid and gid BEFORE you're joining namespace :)
But this LGTM

Properly setuid/setgid after entering userns

estesp · 2016-03-04T16:40:31Z

@LK4D4 it's on my todo list :)

mrunalp · 2016-03-04T16:44:58Z

Yeah we could do that now we are not bound to the go standard library :)

Sent from my iPhone

On Mar 4, 2016, at 8:38 AM, Alexander Morozov [email protected] wrote:

btw according to http://www.openwall.com/lists/kernel-hardening/2016/01/20/14 you need to set uid and gid BEFORE you're joining namespace :)
But this LGTM

—
Reply to this email directly or view it on GitHub.

mrunalp · 2016-03-04T16:48:04Z

Also if we make runc truly unprivileged for userns then we don't need to even that. I have a branch with a poc.

Sent from my iPhone

On Mar 4, 2016, at 8:44 AM, Mrunal Patel [email protected] wrote:

Yeah we could do that now we are not bound to the go standard library :)

Sent from my iPhone

On Mar 4, 2016, at 8:38 AM, Alexander Morozov [email protected] wrote:

btw according to http://www.openwall.com/lists/kernel-hardening/2016/01/20/14 you need to set uid and gid BEFORE you're joining namespace :)
But this LGTM

—
Reply to this email directly or view it on GitHub.

estesp force-pushed the mount-perm-userns branch from f36b31b to ce5be72 Compare March 3, 2016 14:37

estesp changed the title ~~Retain owner execute perm on cgroup mount~~ Properly setuid/setgid after entering userns Mar 3, 2016

estesp force-pushed the mount-perm-userns branch from ce5be72 to 820abf1 Compare March 3, 2016 14:39

estesp force-pushed the mount-perm-userns branch from 820abf1 to 50a4667 Compare March 3, 2016 17:00

estesp force-pushed the mount-perm-userns branch from 50a4667 to 178bad5 Compare March 4, 2016 16:12

LK4D4 added a commit that referenced this pull request Mar 4, 2016

Merge pull request #606 from estesp/mount-perm-userns

ad2c520

Properly setuid/setgid after entering userns

LK4D4 merged commit ad2c520 into opencontainers:master Mar 4, 2016

estesp deleted the mount-perm-userns branch March 4, 2016 16:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Properly setuid/setgid after entering userns #606

Properly setuid/setgid after entering userns #606

estesp commented Mar 1, 2016

estesp commented Mar 1, 2016

estesp commented Mar 1, 2016

mrunalp commented Mar 1, 2016

estesp commented Mar 1, 2016

codido commented Mar 2, 2016

estesp commented Mar 2, 2016

codido commented Mar 2, 2016

estesp commented Mar 2, 2016

codido commented Mar 2, 2016

estesp commented Mar 2, 2016

estesp commented Mar 2, 2016

mrunalp commented Mar 2, 2016

mrunalp commented Mar 2, 2016

estesp commented Mar 2, 2016

mrunalp commented Mar 2, 2016

codido commented Mar 2, 2016

mrunalp commented Mar 3, 2016

mrunalp commented Mar 3, 2016

mrunalp commented Mar 3, 2016

codido commented Mar 3, 2016

mrunalp commented Mar 3, 2016

estesp commented Mar 3, 2016

mrunalp commented Mar 3, 2016

mlaventure commented Mar 3, 2016

mrunalp commented Mar 3, 2016

estesp commented Mar 3, 2016

mrunalp commented Mar 3, 2016

mlaventure commented Mar 3, 2016

estesp commented Mar 4, 2016

LK4D4 commented Mar 4, 2016

estesp commented Mar 4, 2016

mrunalp commented Mar 4, 2016

mrunalp commented Mar 4, 2016

Properly setuid/setgid after entering userns #606

Properly setuid/setgid after entering userns #606

Conversation

estesp commented Mar 1, 2016

estesp commented Mar 1, 2016

estesp commented Mar 1, 2016

mrunalp commented Mar 1, 2016

estesp commented Mar 1, 2016

codido commented Mar 2, 2016

estesp commented Mar 2, 2016

codido commented Mar 2, 2016

estesp commented Mar 2, 2016

codido commented Mar 2, 2016

estesp commented Mar 2, 2016

estesp commented Mar 2, 2016

mrunalp commented Mar 2, 2016

mrunalp commented Mar 2, 2016

estesp commented Mar 2, 2016

mrunalp commented Mar 2, 2016

codido commented Mar 2, 2016

mrunalp commented Mar 3, 2016

mrunalp commented Mar 3, 2016

mrunalp commented Mar 3, 2016

codido commented Mar 3, 2016

mrunalp commented Mar 3, 2016

estesp commented Mar 3, 2016

mrunalp commented Mar 3, 2016

mlaventure commented Mar 3, 2016

mrunalp commented Mar 3, 2016

estesp commented Mar 3, 2016

mrunalp commented Mar 3, 2016

mlaventure commented Mar 3, 2016

estesp commented Mar 4, 2016

LK4D4 commented Mar 4, 2016

estesp commented Mar 4, 2016

mrunalp commented Mar 4, 2016

mrunalp commented Mar 4, 2016