oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:359: container init caused... #951

Closed
KensoDev opened this issue Aug 25, 2017 · 10 comments

KensoDev commented Aug 25, 2017

Summary

When running a container with mounted volumes under ECS, we got the error below. This is a very detailed report, since we hit this error many times and debugged it to resolution.

Description

When trying to run a container with mounted volumes under ECS, we got this error:

2017-08-24T21:55:25Z [INFO] TaskHandler, Sending container change: ContainerChange: arn:aws:ecs:us-west-2:308798440167:task/0b670bdf-d449-4678-870f-d6670dfb4823 heracles -> STOPPED, Reason CannotStartContainerError: API error (500): oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"rootfs_linux.go:54: mounting \\\"/opt/globality/nginx_conf/heracles\\\" to rootfs \\\"/var/lib/docker/overlay/9fe448624c1af3020353aa8e56ed126fa247ef997fdd56daf582e41ccd383cf3/merged\\\" at \\\"/var/lib/docker/overlay/9fe448624c1af3020353aa8e56ed126fa247ef997fdd56daf582e41ccd383cf3/merged/etc/nginx/sites-enabled\\\" caused \\\"mkdir /var/lib/docker/overlay/9fe448624c1af3020353aa8e56ed126fa247ef997fdd56daf582e41ccd383cf3/merged/etc/nginx/sites-enabled: read-only file system\\\"\""

This showed two different symptoms, one in us-east-1 and one in us-west-2:

us-east-1
After running the container manually on one of the instances in the cluster, the scheduler was able to recover and schedule containers across the cluster.

us-west-2
Even after running the container manually, the scheduler was not able to recover.

uname -a

Linux ip-10-50-72-174 4.4.0-1020-aws #29-Ubuntu SMP Wed Jun 14 15:54:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

uname -a

Linux ip-10-70-85-28 4.4.0-1020-aws #29-Ubuntu SMP Wed Jun 14 15:54:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Container configuration

nginx_conf
/opt/globality/nginx_conf/heracles

generic_nginx_conf
/opt/globality/nginx_conf/frontend/heracles

haproxy_conf
/opt/globality/haproxy

ecs_conf
/etc/ecs/ecs.config

mount points

      "mountPoints": [
        {
          "containerPath": "/etc/nginx/sites-enabled",
          "sourceVolume": "nginx_conf",
          "readOnly": true
        },
        {
          "containerPath": "/etc/nginx",
          "sourceVolume": "generic_nginx_conf",
          "readOnly": true
        }
      ],

As you can see, one of the mount points is nested inside another: /etc/nginx/sites-enabled sits under /etc/nginx.

When doing this manually with docker run, there were no issues in either region, west or east, which is why this took a long time to debug and understand. It only happened through the scheduler.

Also tracked: #658 and under runc.

I started taking a look at the code. It might be helpful to warn from the scheduler with an error saying that child paths cannot be mounted.
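A minimal sketch of the kind of check suggested above, assuming the mount points are available as a list of dicts shaped like the "mountPoints" array in the task definition. This is not the agent's code; `find_nested_mounts` is a hypothetical helper name used only for illustration.

```python
from posixpath import normpath


def find_nested_mounts(mount_points):
    """Return (parent, child) pairs where child's containerPath lies inside parent's."""
    paths = [normpath(mp["containerPath"]) for mp in mount_points]
    pairs = []
    for parent in paths:
        # Compare against "parent/" so /etc/nginx does not match /etc/nginx2.
        prefix = parent.rstrip("/") + "/"
        for child in paths:
            if child != parent and child.startswith(prefix):
                pairs.append((parent, child))
    return pairs


# The mount points from this report.
mount_points = [
    {"containerPath": "/etc/nginx/sites-enabled", "sourceVolume": "nginx_conf", "readOnly": True},
    {"containerPath": "/etc/nginx", "sourceVolume": "generic_nginx_conf", "readOnly": True},
]

for parent, child in find_nested_mounts(mount_points):
    print(f"warning: {child} is nested under {parent}")
```

Run against the mount points in this report, it flags /etc/nginx/sites-enabled as nested under /etc/nginx.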

More data points.

  1. In the initial version, the mount point came from an EFS-mounted drive.
  2. Even when the mount was completely local, this still did not work.
  3. On us-east-1, after starting the container manually one time, it kept working with the scheduler.
  4. On us-west-2, it did not recover until I changed the mount point so that it was no longer in a child path.
@richardpen

@KensoDev Can you provide more details about the environment: what are the Docker and ecs-agent versions?

I don't think this was caused by one mount point being inside another, as the following task definition works fine for me on both the ECS Optimized AMI (17.03.1-ce) and Ubuntu (17.05.0-ce). Were you able to use the ecs-agent to launch other tasks? Can you also share a task definition that reproduces this problem?

{
    "family": "mount-volumes",
    "containerDefinitions": [
        {
        "name": "container",
        "image": "ubuntu",
        "cpu": 10,
        "memory": 100,
        "command": ["sh", "-c", "while [ true ]; do sleep 1s; date +%T; done"],
        "mountPoints":[
            {
                "sourceVolume": "volume1",
                "containerPath": "/ecs/mount",
                "readOnly": true
            },
            {
                "sourceVolume": "volume2",
                "containerPath": "/ecs/mount/child",
                "readOnly": true
            }
        ]
        }
    ],
    "volumes":[
        {
            "name": "volume1",
            "host":{
                "sourcePath": "/home/ec2-user/volume1"
            }
        },
        {
            "name": "volume2",
            "host": {
                "sourcePath": "/opt"
            }
        }
    ]
}

Can you also check the permissions on the directory /var/lib/docker/overlay? According to the error message, the failure came from the mkdir call:

mkdir /var/lib/docker/overlay/9fe448624c1af3020353aa8e56ed126fa247ef997fdd56daf582e41ccd383cf3/merged/etc/nginx/sites-enabled: read-only file system

Thanks,
Peng

@KensoDev
Author

KensoDev commented Aug 28, 2017

So, I reproduced this again just now.

Here's the task definition. The only things changed are names, which are marked REDACTED. Other than that, it is our deployed definition verbatim.

The error:

docker logs a11374f7c06e
container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"rootfs_linux.go:54: mounting \\\"/opt/globality/nginx_conf/heracles\\\" to rootfs \\\"/var/lib/docker/overlay/db673a284ec0a66bfdc6f528384a0f298f3db8141890ff9a142ea0569ad61cdd/merged\\\" at \\\"/var/lib/docker/overlay/db673a284ec0a66bfdc6f528384a0f298f3db8141890ff9a142ea0569ad61cdd/merged/etc/nginx/sites-enabled\\\" caused \\\"mkdir /var/lib/docker/overlay/db673a284ec0a66bfdc6f528384a0f298f3db8141890ff9a142ea0569ad61cdd/merged/etc/nginx/sites-enabled: read-only file system\\\"\""
ll /var/lib/docker/overlay/db673a284ec0a66bfdc6f528384a0f298f3db8141890ff9a142ea0569ad61cdd/merged
total 8
drwx------ 2 root root 4096 Aug 28 22:12 ./
drwx------ 5 root root 4096 Aug 28 22:12 ../

AMI

ami-id: ami-4ec9d437

Task definition:

{
    "networkMode": "bridge",
    "taskRoleArn": "REDACTED",
    "containerDefinitions": [
        {
            "volumesFrom": [],
            "memory": 100,
            "extraHosts": null,
            "dnsServers": null,
            "disableNetworking": null,
            "dnsSearchDomains": null,
            "portMappings": [
                {
                    "hostPort": 0,
                    "containerPort": 80,
                    "protocol": "tcp"
                }
            ],
            "hostname": null,
            "essential": true,
            "entryPoint": null,
            "mountPoints": [
                {
                    "containerPath": "/etc/nginx/sites-enabled",
                    "sourceVolume": "nginx_conf",
                    "readOnly": true
                },
                {
                    "containerPath": "/etc/nginx",
                    "sourceVolume": "generic_nginx_conf",
                    "readOnly": true
                }
            ],
            "name": "NAME",
            "ulimits": null,
            "dockerSecurityOptions": null,
            "environment": [
                {
                    "name": "ENV",
                    "value": "beta"
                },
                {
                    "name": "NAME",
                    "value": "NAME"
                }
            ],
            "links": null,
            "workingDirectory": null,
            "readonlyRootFilesystem": null,
            "image": "REDACTED",
            "command": null,
            "user": null,
            "dockerLabels": null,
            "logConfiguration": {
                "logDriver": "json-file",
                "options": {
                    "max-size": "1M",
                    "max-file": "3"
                }
            },
            "cpu": 0,
            "privileged": null,
            "memoryReservation": null
        }
    ],
    "volumes": [
        {
            "host": {
                "sourcePath": "/opt/globality/nginx_conf/NAME"
            },
            "name": "nginx_conf"
        },
        {
            "host": {
                "sourcePath": "/opt/globality/nginx_conf/generic"
            },
            "name": "generic_nginx_conf"
        }
    ],
    "family": "NAME-beta",
    "placementConstraints": []
}

Cluster docker information:

[image: cluster Docker information]

/var/lib/docker/overlay
total 676
drwx------ 133 root root 147456 Aug 28 22:18 ./
drwx--x--x  12 root root   4096 Aug 24 22:40 ../
drwx------   5 root root   4096 Aug 28 22:14 05f29c68b85beb23beedca5dedac4da75704ce42aa02c928d88467b4b653e150/
drwx------   5 root root   4096 Aug 28 22:14 05f29c68b85beb23beedca5dedac4da75704ce42aa02c928d88467b4b653e150-init/
drwx------   5 root root   4096 Aug 25 22:46 07f50fdbb5f98a9437d1dd754ba40f7b8af5a3386aa1073c8caf15fc7caa224f/
drwx------   5 root root   4096 Aug 25 22:46 07f50fdbb5f98a9437d1dd754ba40f7b8af5a3386aa1073c8caf15fc7caa224f-init/
drwx------   3 root root   4096 Aug 24 22:41 09f67187398673d9519ad24e2a6287d7890699b60f339600d34afca7fd25aa56/
drwx------   5 root root   4096 Aug 24 22:43 0d8bde171d3f6f129c31a56f166315af1df8098b68a161caad5cc19dd452d785/
drwx------   5 root root   4096 Aug 24 22:43 0d8bde171d3f6f129c31a56f166315af1df8098b68a161caad5cc19dd452d785-init/
drwx------   5 root root   4096 Aug 28 22:17 11c04ad1baa9b6ee0dff3e9ee0bed39d16e1f742ee853489046610f1895e1c77/
REDACTED... (it's long)

@richardpen

richardpen commented Aug 28, 2017

@KensoDev Thanks for providing the information, but I didn't see the task definition. Did you forget to paste it here, or did you send it somewhere else?

Edit: it looks like there was some delay. I see it now; I will verify and get back to you. Thanks!

@KensoDev
Author

@richardpen I added all the information you asked for. Let me know if something else is unclear.

@KensoDev
Author

@richardpen task definition pasted in the comment

@KensoDev
Author

Here it is again (the task definition is identical to the one in my Aug 28 comment above).

@nmeyerhans
Contributor

It is strange that this would show different symptoms depending on the region. Are the Ubuntu release and Docker version identical? There are no regional variances in the ECS Agent; the binary is identical for all regions.

@KensoDev
Author

@nmeyerhans This threw me off the most as well, but it's 100% verified.

us-east: once I ran the Docker container manually one time, the scheduler picked it up and continued working.
us-west: the same thing led to a different result; the scheduler was not able to recover.

TBH, this is what led me to debug this more since I had a recovery path on the other env that I could easily "bypass" the bug with.

@richardpen

@KensoDev Sorry for the late response. I can reproduce this issue with the provided task definition, but if you remove the line "readOnly": true from your container definition, it works.

The reason is that you have nested mount paths in your container: /etc/nginx/sites-enabled and /etc/nginx. If /etc/nginx is mounted read-only, the /etc/nginx/sites-enabled directory cannot be created inside it, which matches the "mkdir ... read-only file system" failure in the error message. So if you remove the read-only setting for /etc/nginx, the task definition should work.
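As a rough illustration of that workaround, here is a sketch that clears "readOnly" only on mount points that contain another mount point. It assumes the same list-of-dicts shape as the "mountPoints" array; `relax_read_only_parents` is a hypothetical helper name, not part of the agent or any AWS tooling.

```python
import copy
import json
from posixpath import normpath


def relax_read_only_parents(mount_points):
    """Return a copy in which read-only mounts that contain another mount are made writable."""
    result = copy.deepcopy(mount_points)  # leave the caller's list untouched
    paths = [normpath(mp["containerPath"]) for mp in result]
    for i, parent in enumerate(paths):
        prefix = parent.rstrip("/") + "/"
        has_child = any(p != parent and p.startswith(prefix) for p in paths)
        if has_child and result[i].get("readOnly"):
            result[i]["readOnly"] = False
    return result


# The mount points from the task definition in this issue.
mount_points = [
    {"containerPath": "/etc/nginx/sites-enabled", "sourceVolume": "nginx_conf", "readOnly": True},
    {"containerPath": "/etc/nginx", "sourceVolume": "generic_nginx_conf", "readOnly": True},
]

fixed = relax_read_only_parents(mount_points)
print(json.dumps(fixed, indent=2))
```

Note that this makes /etc/nginx writable from inside the container, which may not be what you want; the nested /etc/nginx/sites-enabled mount keeps its read-only flag.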

Please let us know if that solves your problem, thanks.

@KensoDev
Author

I resolved it another way, by not nesting one mount point under the other, but I would certainly imagine that this would solve the issue too.
