starting container process caused 'process_linux.go:245: running exec setns process for init caused "exit status 6"' #1130

hkjn · 2016-10-20T06:20:38Z

Hi OCI folks,

We are seeing a failure to start Docker containers through runc, seemingly from this line:

https://github.com/opencontainers/runc/blob/master/libcontainer/process_linux.go#L245

This might well be a config or system issue (we're on somewhat old kernel versions because CentOS...), but the logs don't give much to go on here.

The man page for setns defines the error codes it should return:

http://man7.org/linux/man-pages/man2/setns.2.html

But if the following page can be trusted, exit status 6 should be ENXIO, which is not mentioned in the man pages:

http://www.virtsync.com/c-error-codes-include-errno

Any suggestions for how to debug further or what to check would be appreciated, thanks in advance!

Logs

/bin/docker: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 6\\\"\"\n".

System info

# uname -a
Linux ip-10-226-24-78 3.10.0-327.28.2.el7.x86_64 #1 SMP Wed Aug 3 11:11:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

# docker info
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 6
Server Version: 1.12.2
Storage Driver: overlay
 Backing Filesystem: xfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.28.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.389 GiB
Name: ip-10-226-24-78
ID: TNS5:V674:K6Y4:CSIT:ROPR:XJMI:LDSR:KTC3:DZS7:G7RD:426H:DFRN
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7566         207         453           5        6904        4230
Swap:          2047         463        1584

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               2400.082
BogoMIPS:              4800.16
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0,1

cyphar · 2016-10-20T07:53:40Z

The "exit status 6" is a pretty ugly hack I added that lets us figure out where inside nsexec.c the failure happened. An error from process_linux.go with "exit status 6" means that the 6th bail in libcontainer/nsenter/nsexec.c was executed (in the version of runC you're running).

To cut a long story short, this is the code that is failing:

    /*
     * We must fork to actually enter the PID namespace, and use
     * CLONE_PARENT so that the child init can have the right parent
     * (the bootstrap process). Also so we don't need to forward the
     * child's exit code or resend its death signal.
     */
    childpid = clone_parent(env, config->cloneflags);
    if (childpid < 0)
        bail("unable to fork"); /* this is where exit status 6 comes from */

So, the big question is -- does your system support all of the namespaces that you're trying to use? What is the output of ls -la /proc/self/ns?
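For anyone who would rather script the check than eyeball the `ls` output, here is a minimal sketch. The function names and the `required` default (the five namespaces a default `runc spec` config asks for) are assumptions for illustration, not anything runc ships.

```python
import os


def list_namespaces(ns_dir="/proc/self/ns"):
    """Return {name: link target} for every namespace link under ns_dir."""
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        path = os.path.join(ns_dir, name)
        # Each entry is a symlink such as "pid -> pid:[4026531836]".
        if os.path.islink(path):
            result[name] = os.readlink(path)
    return result


def missing_namespaces(ns_dir="/proc/self/ns",
                       required=("ipc", "mnt", "net", "pid", "uts")):
    """Return the required namespace names the kernel does not expose.

    The default `required` tuple is an assumption; adjust it to match the
    namespaces listed in your own config.json.
    """
    present = list_namespaces(ns_dir)
    return [name for name in required if name not in present]
```

Calling `missing_namespaces()` on the affected host and getting an empty list back would suggest the kernel exposes everything the container asks for, which points the investigation elsewhere.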

hkjn · 2016-10-20T09:15:40Z

Ah, that helps explain the exit status, cheers.

What's odd here is that the failure was not consistent. Sometimes the docker run command would work fine when we ran it manually, even if it failed under systemd; later it seemed to fail with this symptom consistently.

The node degraded further and won't even let me ssh in now, so it's unfortunately hard to get more diagnostics from it. Another node, which should be identically configured, gives the following output:

# ls -la /proc/self/ns
total 0
dr-x--x--x. 2 root root 0 Oct 20 09:13 .
dr-xr-xr-x. 9 root root 0 Oct 20 09:13 ..
lrwxrwxrwx. 1 root root 0 Oct 20 09:13 ipc -> ipc:[4026531839]
lrwxrwxrwx. 1 root root 0 Oct 20 09:13 mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 root root 0 Oct 20 09:13 net -> net:[4026532028]
lrwxrwxrwx. 1 root root 0 Oct 20 09:13 pid -> pid:[4026531836]
lrwxrwxrwx. 1 root root 0 Oct 20 09:13 user -> user:[4026531837]
lrwxrwxrwx. 1 root root 0 Oct 20 09:13 uts -> uts:[4026531838]

But that node does not seem to hit the same issue as the first one; all services seem to have their containers start up fine.

If the issue pops up again, I'll attach the /proc/self/ns info from an affected node. Feel free to close this bug or leave it open for others to chip in if they hit the same symptom (I couldn't find anything on Google by searching for the symptoms myself); your call.

cyphar · 2016-10-20T10:56:14Z

@hkjn Actually, the best thing would be for you to attach an strace -f of runc when the issue occurs. Though, since you're using Docker this might prove difficult (and it will have very large performance effects that aren't favourable). If you can reproduce having a node like that again, please try running any runC container set up (without Docker) on that machine with strace -f runc run ... to see what breaks. Thanks.

rajasec · 2016-10-20T18:33:22Z

@cyphar
When I run nested runc (runc inside runc), I get the error below:
nsenter: unable to fork: Operation not permitted
container_linux.go:247: starting container process caused "process_linux.go:245: running exec setns process for init caused "exit status 6""
This may not be the right use case; I just thought I'd test it out.

cyphar · 2016-10-21T00:07:32Z

@rajasec That's because you're trying to unshare namespaces you don't have the right to unshare. You'll have to take a look at the kernel code to figure out precisely what's happening (if you're trying to run runc from inside a chroot it's not going to work, for example).

jaredbroad · 2016-12-01T18:18:04Z

+1, I have this error and don't use runC for anything (though it might be used inside Mono). It also happens intermittently, but mostly when the machine is tight on resources / overloaded.

Any other tips for debugging the root cause if I'm not using runC?

jamiethermo · 2016-12-14T02:33:37Z

I have this error with docker (I assume docker-runc?). Not sure how I would debug it. Give me something to type and I'll type it?

cyphar · 2016-12-14T02:45:09Z

Some information that would be useful from anyone else who comments on this issue:

Are you running Docker with user namespaces enabled?
Is SELinux enabled on your host and/or container?
Can you use runc by itself -- outside of Docker? Read the README for information on how to start up a simple container.
What kernel version / distribution are you using?
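Two of these answers can be collected mechanically. The sketch below is only an illustration: `selinux_mode` and `host_report` are hypothetical helper names, and treating a missing `/sys/fs/selinux/enforce` file as "disabled" is an assumption (it can also mean selinuxfs is simply not mounted).

```python
import platform


def selinux_mode(enforce_path="/sys/fs/selinux/enforce"):
    """Return "enforcing", "permissive", or "disabled" for the host.

    The enforce file holds "1" when SELinux is enforcing and "0" when it
    is permissive. A missing file is reported as "disabled" (assumption).
    """
    try:
        with open(enforce_path) as f:
            value = f.read().strip()
    except OSError:
        return "disabled"
    return "enforcing" if value == "1" else "permissive"


def host_report(enforce_path="/sys/fs/selinux/enforce"):
    """Collect the kernel release and SELinux mode into one dict."""
    return {
        "kernel": platform.release(),
        "selinux": selinux_mode(enforce_path),
    }
```

Pasting the output of `host_report()` alongside `docker info` covers the SELinux and kernel questions; the user namespace and standalone runc questions still need to be answered by hand.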

jamiethermo · 2016-12-14T03:09:55Z

No user namespaces.
SELinux is enabled & permissive
Don't have "runc". I have "docker-runc" which says it's 1.0.0-rc2. Is that runc?
CentOS 7.2: 3.10.0-327.36.2.el7.x86_64 #1 SMP Mon Oct 10 23:08:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I'll have to tool around with it. I don't get a container following the runc readme. Doing something daft I expect.

cyphar · 2016-12-14T03:27:01Z

@jamiethermo docker-runc is just what Docker calls its packaged version of runc.

You can create a container like this:

% mkdir -p bundle/rootfs
% docker create --name a_new_rootfs alpine:latest true
% docker export a_new_rootfs | tar xvfC - bundle/rootfs
% runc spec -b bundle
% runc run -b bundle container
/ # # This is inside the container now.

Does that help?

jamiethermo · 2016-12-14T03:30:04Z

Ok. That works.

cyphar · 2016-12-14T03:44:17Z

Alright, it would help to know what config.json the container is being started with (under Docker). Unfortunately Docker won't save the config.json if the container creation fails. You could try doing something like this:

% cat >/tmp/dodgy-runtime.sh <<EOF
#!/bin/sh

cat config.json >>/tmp/dodgy-runtime.log
exit 1
EOF
% chmod +x /tmp/dodgy-runtime.sh
% docker daemon --add-runtime="dodgy=/tmp/dodgy-runtime.sh" --default-runtime=dodgy

Then try to start a container. It will fail, but you should be able to get the config.json from /tmp/dodgy-runtime.log. You can then modify it so that the rootfs entry is equal to the string "rootfs" and then replace bundle/config.json in my previous comment with the old file.

Then runC should fail to start. Paste the config you got here.
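The rootfs edit can be scripted. This is a rough sketch, assuming the captured log holds one or more config.json documents back to back (the wrapper script appends with `>>`) and that the rootfs entry is the `root.path` field of the OCI config. The function name `rewrite_rootfs` is made up for this example.

```python
import json


def rewrite_rootfs(log_text, rootfs="rootfs"):
    """Take the first config.json in log_text and point root.path at rootfs.

    Returns the modified config as a JSON string, ready to be written to
    bundle/config.json.
    """
    decoder = json.JSONDecoder()
    # raw_decode parses one JSON document and ignores whatever follows,
    # which matters because the log may contain several appended configs.
    config, _end = decoder.raw_decode(log_text.lstrip())
    config.setdefault("root", {})["path"] = rootfs
    return json.dumps(config, indent=4)
```

Reading `/tmp/dodgy-runtime.log`, passing its contents through `rewrite_rootfs`, and writing the result to `bundle/config.json` gives a bundle that can be tried with `runc run -b bundle container` as in the earlier comment.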

jamiethermo · 2016-12-14T03:56:56Z

Ok. Can't do that right now. But since it seems arbitrary what is running and what is failing (the same docker image will run one minute and not the next), here's a config file that did get created. Don't know if that'll help. Will try the hack, above, tomorrow. Thanks!
config.json.zip

hqhq · 2016-12-14T11:24:53Z

For people who get "exit status x", you can get the runc code you are using, then:

# cd libcontainer/nsenter
# gcc -E nsexec.c -o nsexec.i

Then you can find out which bail you hit from nsexec.i.

It's ugly though, we should improve it someday.

cyphar · 2016-12-14T12:23:34Z

@hqhq Or you can count from the start of the file (which is what I do). Vim even has a shortcut for it. But yes, the bail(...) code was a hack to get around the fact that we aren't writing our errors to the error pipe in nsexec -- the only information we get is the return code. :P
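Counting by hand is easy to get wrong, so here is a small sketch of the "count from the start of the file" approach. It assumes, as described in this thread, that the exit status is the ordinal of the `bail(...)` call that fired. It is a plain text scan, so a `bail(` inside a comment or string would be miscounted; the preprocessor route above is more reliable. `find_nth_bail` is a hypothetical helper name.

```python
import re

# Matches a call site such as `bail("unable to fork")`.
BAIL_CALL = re.compile(r"\bbail\s*\(")


def find_nth_bail(source, n):
    """Return (line_number, stripped_line) of the n-th bail(...) call.

    Both n and the returned line number are 1-based.
    """
    seen = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        # The macro definition itself is not a call site, so skip it.
        if line.lstrip().startswith("#define"):
            continue
        for _ in BAIL_CALL.finditer(line):
            seen += 1
            if seen == n:
                return lineno, line.strip()
    raise ValueError(f"only {seen} bail() calls found, wanted #{n}")
```

Run it against the nsexec.c from the exact runc version you have installed, since the numbering shifts whenever a bail call is added or removed.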

jamiethermo · 2016-12-14T20:18:02Z

@cyphar Could I replace docker-runc with a bash script that saves off the config.json somewhere if it crashes? Could we make runc do that by default?

cyphar · 2016-12-15T04:50:55Z

Could I replace docker-runc with a bash script that saves off the config.json somewhere if it crashes?

You could try that. By the way, if you haven't created an upstream bug report (in Docker) please do so.

Could we make runc do that by default?

I don't want to, mainly because it'd only be helpful for debugging things in certain cases under Docker. And runC is not just used inside Docker.

jamesongithub · 2017-02-01T23:24:52Z

ECS team thinks this issue is causing their agent to disconnect at times. Referenced aws/amazon-ecs-agent#658 (comment)

jaredbroad · 2017-02-01T23:40:40Z

I "fixed" by upgrading from Ubuntu 15.04 -> 16.04. It might be a bug in an old version that is no longer maintained.

jamesongithub · 2017-02-02T01:07:46Z

hm might have to try that

jamesongithub · 2017-02-06T19:14:59Z

@cyphar is there a workaround for this? besides upgrading to ubuntu 16?

cyphar · 2017-03-04T17:20:58Z

@jamesongithub It's likely that issues of this form are kernel issues (and since Ubuntu has interesting kernel policies, upgrading might be your only option), unless you have some very odd configurations. As I mentioned above, the error only tells us what line inside libcontainer/nsenter/nsexec.c failed (and unshare can fail for a wide variety of reasons).

freefood89 · 2017-03-08T03:08:31Z

I've been having this issue with RHEL 7.3 too
SELINUX=enforcing
SELINUXTYPE=targeted

Besides being inexperienced with stuff like ns and runc, I'm struggling to figure out what's going on because it's intermittent as mentioned by @jamesongithub

ls -la /proc/self/ns shows the same results as @hkjn

frezbo · 2018-01-10T08:44:40Z

@cyphar @rhatdan Same issue on RHEL 7.4, but the exit status is 40. User namespaces are enabled as per this doc: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/getting_started_with_containers/get_started_with_docker_formatted_container_images#user_namespaces_options.

On latest available kernel.

frezbo · 2018-01-10T08:47:06Z

For anyone having issues with RHEL: enable only the namespace.unpriv_enable=1 option, and not user_namespace.enable=1. Having both in the kernel cmdline causes issues:

[ec2-user@ip-10-16-1-55 mycontainer]$ cat /proc/cmdline | grep "namespace.unpriv_enable=1"
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=de4def96-ff72-4eb9-ad5e-0847257d1866 ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 namespace.unpriv_enable=1
[ec2-user@ip-10-16-1-55 mycontainer]$ runc --root /tmp/runc run --no-pivot --no-new-keyring mycontainerid
/ #
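A quick way to spot the combination described above is to scan the kernel command line for both flags. This sketch takes the report in this comment at face value; whether the two options really conflict has not been verified here, and `conflicting_userns_flags` is a name invented for the example.

```python
def conflicting_userns_flags(cmdline):
    """Return True if both user-namespace boot options are present.

    `cmdline` is the text of /proc/cmdline; parameters are separated by
    whitespace, so an exact token match is enough.
    """
    params = cmdline.split()
    unpriv = "namespace.unpriv_enable=1" in params
    legacy = "user_namespace.enable=1" in params
    return unpriv and legacy
```

On a live host, pass in the contents of `/proc/cmdline`, for example `conflicting_userns_flags(open("/proc/cmdline").read())`.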

chadfurman · 2018-08-10T04:00:58Z

I came here from Google for a similar error. Turns out, I was trying to use the VOLUME directive in my Dockerfile like this:

VOLUME . /src
thinking I could mount the current directory from the host as a volume like that, but that's not how it works.

You have to, instead, do this:

VOLUME /src
followed by
docker run -v /absolute/path/to/directory/on/host:/src <rest of your docker run command>

Note also (and somewhat unrelated) that I was getting similar errors on Fedora simply related to SELinux. And while I don't recommend doing the following for security reasons (see: http://stopdisablingselinux.com/), it did work for me:

sudo setenforce 0
sudo systemctl restart docker
docker build -t image .
docker run image

smileusd · 2018-08-21T08:24:47Z

I hit the same problem when I build and start an image.

Sending build context to Docker daemon   220 MB
Step 1 : FROM warpdrive:tos-release-1-5
 ---> 769306738d96
Step 2 : COPY . /go/src/github.com/transwarp/warpdrive/
 ---> 07c99697b16e
Removing intermediate container 127c0e71a84b
Successfully built 07c99697b16e
/usr/bin/docker: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 6\\\"\"\n".
FATA[0301] exit status 125                              
make: *** [build] Error 1

Then I cleaned up a lot of images and containers and freed the caches, and the problem disappeared. But I don't think it is a cache problem, because the change in cache size was tiny.

meirwah · 2019-02-14T17:59:33Z

seems related to:
https://forums.docker.com/t/centos7-docker-hello-world-fails/68941/3

yipingxx · 2019-03-26T02:16:34Z

It is a kernel bug (3.10.0-327); try updating your kernel version.

richardpen mentioned this issue Jan 6, 2017

oci runtime error: container_linux.go when trying to start ecs task aws/amazon-ecs-agent#658

Closed

teddyking mentioned this issue Jan 18, 2017

running exec setns process for init caused \"exit status 26\" #1281

Closed

CarltonSemple mentioned this issue Feb 10, 2017

Docker containers failing in /dev/.lxc/proc directory canonical/lxd#2825

Closed
