oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:359: container init caused... #951

KensoDev · 2017-08-25T20:10:37Z

Summary

When trying to run a container with mounted volumes under ECS, we got this error. This is a very detailed report, since we hit the error many times and debugged it to resolution.

Description

When trying to run a container with mounted volumes under ECS, we got this error:

017-08-24T21:55:25Z [INFO] TaskHandler, Sending container change: ContainerChange: arn:aws:ecs:us-west-2:308798440167:task/0b670bdf-d449-4678-870f-d6670dfb4823 heracles -> STOPPED, Reason CannotStartContainerError: API error (500): oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"rootfs_linux.go:54: mounting \\\"/opt/globality/nginx_conf/heracles\\\" to rootfs \\\"/var/lib/docker/overlay/9fe448624c1af3020353aa8e56ed126fa247ef997fdd56daf582e41ccd383cf3/merged\\\" at \\\"/var/lib/docker/overlay/9fe448624c1af3020353aa8e56ed126fa247ef997fdd56daf582e41ccd383cf3/merged/etc/nginx/sites-enabled\\\" caused \\\"mkdir /var/lib/docker/overlay/9fe448624c1af3020353aa8e56ed126fa247ef997fdd56daf582e41ccd383cf3/merged/etc/nginx/sites-enabled: read-only file system\\\"\""

Now, this has two symptoms, which differ between us-east-1 and us-west-2.

us-east-1
After running the container manually on one of the instances in the cluster, the scheduler was able to recover and schedule containers across the cluster.

us-west-2
Even when running the container manually, it was not able to recover.

uname -a

Linux ip-10-50-72-174 4.4.0-1020-aws #29-Ubuntu SMP Wed Jun 14 15:54:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

uname -a

Linux ip-10-70-85-28 4.4.0-1020-aws #29-Ubuntu SMP Wed Jun 14 15:54:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Container configuration

nginx_conf
/opt/globality/nginx_conf/heracles

generic_nginx_conf
/opt/globality/nginx_conf/frontend/heracles

haproxy_conf
/opt/globality/haproxy

ecs_conf
/etc/ecs/ecs.config

mount points

      "mountPoints": [
        {
          "containerPath": "/etc/nginx/sites-enabled",
          "sourceVolume": "nginx_conf",
          "readOnly": true
        },
        {
          "containerPath": "/etc/nginx",
          "sourceVolume": "generic_nginx_conf",
          "readOnly": true
        }
      ],

As you can see, one of the mount points is nested inside another: the task mounts both /etc/nginx and /etc/nginx/sites-enabled.

Running the same container manually with docker run caused no issues, on either west or east, which is why this took a long time to debug and understand. The failure only happened through the scheduler.

Also tracked: #658 and under runc.

I started looking into the code. It might be helpful to warn at scheduling time and output an error saying that child paths cannot be mounted.
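The kind of warning suggested above could be sketched roughly as follows. This is only an illustration in Python, not code from the ECS agent: the function name `find_nested_mounts` is hypothetical, and the input is assumed to be the `mountPoints` list exactly as it appears in the task definition.

```python
from pathlib import PurePosixPath


def find_nested_mounts(mount_points):
    """Return (parent, child) containerPath pairs where child lies inside parent."""
    paths = [PurePosixPath(mp["containerPath"]) for mp in mount_points]
    nested = []
    for parent in paths:
        for child in paths:
            # A path counts as nested when another mount path is one of its ancestors.
            if parent != child and parent in child.parents:
                nested.append((str(parent), str(child)))
    return sorted(nested)


# The mount points from this report.
mount_points = [
    {"containerPath": "/etc/nginx/sites-enabled", "sourceVolume": "nginx_conf", "readOnly": True},
    {"containerPath": "/etc/nginx", "sourceVolume": "generic_nginx_conf", "readOnly": True},
]

for parent, child in find_nested_mounts(mount_points):
    print(f"warning: mount {child} is nested under mount {parent}")
```

Comparing whole path components (rather than string prefixes) avoids flagging unrelated siblings such as /etc/nginx and /etc/nginx2.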

More data points.

In the initial version of this, the mount point came from an EFS-mounted drive.
Even when the mount is completely local, this still did not work.
On us-east-1, after starting the container manually one time, it kept working with the scheduler.
On us-west-2, it did not recover until I changed the mount point so it was no longer in a child path.


richardpen · 2017-08-28T20:59:19Z

@KensoDev Can you provide more details regarding the environment: what are the docker version and the ecs-agent version?

I don't think this was caused by one mount point being inside another, as I tried the following task definition, which works fine for me on both the ECS Optimized AMI (17.03.1-ce) and Ubuntu (17.05.0-ce). Were you able to use the ecs-agent to launch other tasks, and can you also share a task definition that reproduces this problem?

{
    "family": "mount-volumes",
    "containerDefinitions": [
        {
        "name": "container",
        "image": "ubuntu",
        "cpu": 10,
        "memory": 100,
        "command": ["sh", "-c", "while [ true ]; do sleep 1s; date +%T; done"],
        "mountPoints":[
            {
                "sourceVolume": "volume1",
                "containerPath": "/ecs/mount",
                "readOnly": true
            },
            {
                "sourceVolume": "volume2",
                "containerPath": "/ecs/mount/child",
                "readOnly": true
            }
        ]
        }
    ],
    "volumes":[
        {
            "name": "volume1",
            "host":{
                "sourcePath": "/home/ec2-user/volume1"
            }
        },
        {
            "name": "volume2",
            "host": {
                "sourcePath": "/opt"
            }
        }
    ]
}

Can you also check the permissions on the directory /var/lib/docker/overlay? From the error message, the failure came from the mkdir command:

mkdir /var/lib/docker/overlay/9fe448624c1af3020353aa8e56ed126fa247ef997fdd56daf582e41ccd383cf3/merged/etc/nginx/sites-enabled: read-only file system
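One possible way to run that check is sketched below in Python. It is a rough illustration only: the helper names are hypothetical, and it reports the mode bits and the read-only flag of the filesystem as seen from wherever the script runs, which may differ from what the container's init process sees.

```python
import os


def is_read_only_filesystem(path):
    """Return True when the filesystem containing `path` is mounted read-only."""
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_RDONLY)


def describe_directory(path):
    """Summarise the permission bits and the read-only mount flag for a directory."""
    info = os.stat(path)
    return {
        "path": path,
        "mode": oct(info.st_mode & 0o777),
        "read_only_fs": is_read_only_filesystem(path),
    }


if __name__ == "__main__":
    # The directory named in the error message; inspecting it usually needs root.
    target = "/var/lib/docker/overlay"
    if os.path.isdir(target):
        print(describe_directory(target))
```

A "read-only file system" error with normal permission bits and `read_only_fs` false on the host would suggest looking at how the mounts are set up for the container rather than at directory permissions.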

Thanks,
Peng

KensoDev · 2017-08-28T22:16:42Z

So, I reproduced this again just now.

Here's the task definition. The only things marked REDACTED are names; other than that, it is our deployed definition verbatim.

The error:

docker logs a11374f7c06e
container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"rootfs_linux.go:54: mounting \\\"/opt/globality/nginx_conf/heracles\\\" to rootfs \\\"/var/lib/docker/overlay/db673a284ec0a66bfdc6f528384a0f298f3db8141890ff9a142ea0569ad61cdd/merged\\\" at \\\"/var/lib/docker/overlay/db673a284ec0a66bfdc6f528384a0f298f3db8141890ff9a142ea0569ad61cdd/merged/etc/nginx/sites-enabled\\\" caused \\\"mkdir /var/lib/docker/overlay/db673a284ec0a66bfdc6f528384a0f298f3db8141890ff9a142ea0569ad61cdd/merged/etc/nginx/sites-enabled: read-only file system\\\"\""

ll /var/lib/docker/overlay/db673a284ec0a66bfdc6f528384a0f298f3db8141890ff9a142ea0569ad61cdd/merged
total 8
drwx------ 2 root root 4096 Aug 28 22:12 ./
drwx------ 5 root root 4096 Aug 28 22:12 ../

AMI

ami-id: ami-4ec9d437

Task definition:

{
    "networkMode": "bridge",
    "taskRoleArn": "REDACTED",
    "containerDefinitions": [
        {
            "volumesFrom": [],
            "memory": 100,
            "extraHosts": null,
            "dnsServers": null,
            "disableNetworking": null,
            "dnsSearchDomains": null,
            "portMappings": [
                {
                    "hostPort": 0,
                    "containerPort": 80,
                    "protocol": "tcp"
                }
            ],
            "hostname": null,
            "essential": true,
            "entryPoint": null,
            "mountPoints": [
                {
                    "containerPath": "/etc/nginx/sites-enabled",
                    "sourceVolume": "nginx_conf",
                    "readOnly": true
                },
                {
                    "containerPath": "/etc/nginx",
                    "sourceVolume": "generic_nginx_conf",
                    "readOnly": true
                }
            ],
            "name": "NAME",
            "ulimits": null,
            "dockerSecurityOptions": null,
            "environment": [
                {
                    "name": "ENV",
                    "value": "beta"
                },
                {
                    "name": "NAME",
                    "value": "NAME"
                }
            ],
            "links": null,
            "workingDirectory": null,
            "readonlyRootFilesystem": null,
            "image": "REDACTED",
            "command": null,
            "user": null,
            "dockerLabels": null,
            "logConfiguration": {
                "logDriver": "json-file",
                "options": {
                    "max-size": "1M",
                    "max-file": "3"
                }
            },
            "cpu": 0,
            "privileged": null,
            "memoryReservation": null
        }
    ],
    "volumes": [
        {
            "host": {
                "sourcePath": "/opt/globality/nginx_conf/NAME"
            },
            "name": "nginx_conf"
        },
        {
            "host": {
                "sourcePath": "/opt/globality/nginx_conf/generic"
            },
            "name": "generic_nginx_conf"
        }
    ],
    "family": "NAME-beta",
    "placementConstraints": []
}

Cluster docker information:

/var/lib/docker/overlay
total 676
drwx------ 133 root root 147456 Aug 28 22:18 ./
drwx--x--x  12 root root   4096 Aug 24 22:40 ../
drwx------   5 root root   4096 Aug 28 22:14 05f29c68b85beb23beedca5dedac4da75704ce42aa02c928d88467b4b653e150/
drwx------   5 root root   4096 Aug 28 22:14 05f29c68b85beb23beedca5dedac4da75704ce42aa02c928d88467b4b653e150-init/
drwx------   5 root root   4096 Aug 25 22:46 07f50fdbb5f98a9437d1dd754ba40f7b8af5a3386aa1073c8caf15fc7caa224f/
drwx------   5 root root   4096 Aug 25 22:46 07f50fdbb5f98a9437d1dd754ba40f7b8af5a3386aa1073c8caf15fc7caa224f-init/
drwx------   3 root root   4096 Aug 24 22:41 09f67187398673d9519ad24e2a6287d7890699b60f339600d34afca7fd25aa56/
drwx------   5 root root   4096 Aug 24 22:43 0d8bde171d3f6f129c31a56f166315af1df8098b68a161caad5cc19dd452d785/
drwx------   5 root root   4096 Aug 24 22:43 0d8bde171d3f6f129c31a56f166315af1df8098b68a161caad5cc19dd452d785-init/
drwx------   5 root root   4096 Aug 28 22:17 11c04ad1baa9b6ee0dff3e9ee0bed39d16e1f742ee853489046610f1895e1c77/
REDACTED... (it's long)

richardpen · 2017-08-28T22:20:45Z

@KensoDev Thanks for providing the information, but I didn't see the task definition. Did you forget to paste it here, or did you send it somewhere else?

Edit: looks like there was some delay. I see it now; I will verify and get back to you. Thanks!

KensoDev · 2017-08-28T22:20:47Z

@richardpen I added all the information you asked for. Let me know if something else is unclear.

KensoDev · 2017-08-28T22:21:09Z

@richardpen task definition pasted in the comment

KensoDev · 2017-08-28T22:21:32Z

Here it is again

{
    "networkMode": "bridge",
    "taskRoleArn": "REDACTED",
    "containerDefinitions": [
        {
            "volumesFrom": [],
            "memory": 100,
            "extraHosts": null,
            "dnsServers": null,
            "disableNetworking": null,
            "dnsSearchDomains": null,
            "portMappings": [
                {
                    "hostPort": 0,
                    "containerPort": 80,
                    "protocol": "tcp"
                }
            ],
            "hostname": null,
            "essential": true,
            "entryPoint": null,
            "mountPoints": [
                {
                    "containerPath": "/etc/nginx/sites-enabled",
                    "sourceVolume": "nginx_conf",
                    "readOnly": true
                },
                {
                    "containerPath": "/etc/nginx",
                    "sourceVolume": "generic_nginx_conf",
                    "readOnly": true
                }
            ],
            "name": "NAME",
            "ulimits": null,
            "dockerSecurityOptions": null,
            "environment": [
                {
                    "name": "ENV",
                    "value": "beta"
                },
                {
                    "name": "NAME",
                    "value": "NAME"
                }
            ],
            "links": null,
            "workingDirectory": null,
            "readonlyRootFilesystem": null,
            "image": "REDACTED",
            "command": null,
            "user": null,
            "dockerLabels": null,
            "logConfiguration": {
                "logDriver": "json-file",
                "options": {
                    "max-size": "1M",
                    "max-file": "3"
                }
            },
            "cpu": 0,
            "privileged": null,
            "memoryReservation": null
        }
    ],
    "volumes": [
        {
            "host": {
                "sourcePath": "/opt/globality/nginx_conf/NAME"
            },
            "name": "nginx_conf"
        },
        {
            "host": {
                "sourcePath": "/opt/globality/nginx_conf/generic"
            },
            "name": "generic_nginx_conf"
        }
    ],
    "family": "NAME-beta",
    "placementConstraints": []
}

nmeyerhans · 2017-08-29T17:22:55Z

Strange that this would seem to have different symptoms based on region. The Ubuntu release and docker version are identical? There are no regional variances in the ECS Agent; the binary is identical for all regions.

KensoDev · 2017-08-29T17:24:59Z

@nmeyerhans This threw me off the most as well,
but it's 100% verified.

us-east - once I run the docker container manually one time, the scheduler picks it up and continues working.
us-west - the same thing led to a different result; the scheduler was not able to recover.

TBH, this is what led me to debug this more since I had a recovery path on the other env that I could easily "bypass" the bug with.

richardpen · 2017-09-12T22:50:33Z

@KensoDev Sorry for the late response. I can reproduce this issue with the provided task definition, but if you remove the line "readOnly": true from your container definition, it works.

The reason is that you have nested mount paths in your container: /etc/nginx/sites-enabled and /etc/nginx. If /etc/nginx is marked read-only, then the /etc/nginx/sites-enabled directory cannot be created inside it. So if you remove the read-only setting for /etc/nginx, the task definition should work.

Please let us know if that solves your problem, thanks.
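As a rough sketch of that workaround (not part of the ECS agent, and with a hypothetical helper name), the Python snippet below takes a task definition as a dict and clears readOnly on any mount point that has another mount point nested inside it. It sets the flag to false rather than deleting the key, on the assumption that the two are equivalent.

```python
import copy
from pathlib import PurePosixPath


def relax_read_only_parents(task_definition):
    """Return a copy in which mounts that contain another mount are made writable."""
    fixed = copy.deepcopy(task_definition)
    for container in fixed.get("containerDefinitions", []):
        mounts = container.get("mountPoints") or []
        paths = [PurePosixPath(m["containerPath"]) for m in mounts]
        for mount, path in zip(mounts, paths):
            # A mount is a "parent" when some other mount path sits underneath it.
            has_child = any(path in other.parents for other in paths)
            if has_child and mount.get("readOnly"):
                mount["readOnly"] = False
    return fixed


# Only the relevant part of the task definition from this thread.
task_definition = {
    "containerDefinitions": [
        {
            "name": "NAME",
            "mountPoints": [
                {"containerPath": "/etc/nginx/sites-enabled", "sourceVolume": "nginx_conf", "readOnly": True},
                {"containerPath": "/etc/nginx", "sourceVolume": "generic_nginx_conf", "readOnly": True},
            ],
        }
    ]
}

fixed = relax_read_only_parents(task_definition)
for mount in fixed["containerDefinitions"][0]["mountPoints"]:
    print(mount["containerPath"], "readOnly =", mount["readOnly"])
```

Only the parent mount is changed; the nested /etc/nginx/sites-enabled mount keeps its read-only setting, and the input dict is left untouched.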

KensoDev · 2017-09-12T23:13:33Z

I resolved it another way, by not nesting one mount under the other, but I would certainly imagine that this would solve the issue.

richardpen added the more info needed label Aug 28, 2017

KensoDev mentioned this issue Aug 28, 2017

Feature/debugging ecs agent volume runc integration #953

Closed

richardpen removed the more info needed label Aug 28, 2017

KensoDev closed this as completed Sep 12, 2017
