
oci runtime error: container_linux.go when trying to start ecs task #658

Closed
jamesongithub opened this issue Jan 4, 2017 · 18 comments

@jamesongithub

jamesongithub commented Jan 4, 2017

ecs-agent log

ecs-agent.log.2017-01-04-22:2017-01-04T22:32:36Z [INFO] Pulling container module="TaskEngine" task="my-task:57 arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c, Status: (NONE->RUNNING) Containers: [my-task (NONE->RUNNING),]" container="my-task(111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest) (NONE->RUNNING)"
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:36Z [INFO] Creating container module="TaskEngine" task="my-task:57 arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c, Status: (NONE->RUNNING) Containers: [my-task (PULLED->RUNNING),]" container="my-task(111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest) (PULLED->RUNNING)"
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:36Z [INFO] Created container name mapping for task my-task:57 arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c, Status: (NONE->RUNNING) Containers: [my-task (PULLED->RUNNING),] - my-task(111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest) (PULLED->RUNNING) -> ecs-my-task-57-my-task-a0b4d9e6c0fbe4fff001
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:37Z [INFO] Created docker container for task my-task:57 arn:aws:ecs:us-east-1:111111111111:task196421ad-9b30-41fd-b919-1a5bd591a34c, Status: (CREATED->RUNNING) Containers: [my-task (CREATED->RUNNING),]: my-task(111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest) (CREATED->RUNNING) -> b4b69da99da2e631d0c6b3eaffc17743aa8553d477f04d099f6de6cf3f228041
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:37Z [INFO] Starting container module="TaskEngine" task="my-task:57 arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c, Status: (CREATED->RUNNING) Containers: [my-task (CREATED->RUNNING),]" container="my-task(111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest) (CREATED->RUNNING)"
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:37Z [INFO] Redundant container state change for task my-task:57 arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c, Status: (CREATED->RUNNING) Containers: [my-task (CREATED->RUNNING),]: my-task(111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest) (CREATED->RUNNING) to CREATED, but already CREATED
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:37Z [INFO] Error transitioning container module="TaskEngine" task="my-task:57 arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c, Status: (CREATED->RUNNING) Containers: [my-task (CREATED->RUNNING),]" container="my-task(111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest) (CREATED->RUNNING)" state="RUNNING"
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:37Z [WARN] Error with docker; stopping container module="TaskEngine" task="my-task:57 arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c, Status: (CREATED->RUNNING) Containers: [my-task (RUNNING->RUNNING),]" container="my-task(111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest) (RUNNING->RUNNING)" err="API error (500): invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 6\\\"\"\n"
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:37Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c Status:STOPPED Reason: SentStatus:NONE}"
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:37Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c my-task -> STOPPED, Reason CannotStartContainerError: API error (500): invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 6\\\"\"\n"
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:37Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c -> STOPPED, Known Sent: NONE"
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:37Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c my-task -> STOPPED, Reason CannotStartContainerError: API error (500): invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 6\\\"\"\n"
ecs-agent.log.2017-01-04-22:, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c my-task -> STOPPED, Reason CannotStartContainerError: API error (500): invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 6\\\"\"\n"
ecs-agent.log.2017-01-04-22:2017-01-04T22:32:37Z [INFO] Sending task change module="eventhandler" event="TaskChange: arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c -> STOPPED, Known Sent: NONE" change="TaskChange: arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c -> STOPPED, Known Sent: NONE"
ecs-agent.log.2017-01-04-22:2017-01-04T22:33:37Z [INFO] Cleaning up task's containers and data module="TaskEngine" task="my-task:57 arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c, Status: (STOPPED->STOPPED) Containers: [my-task (STOPPED->STOPPED),]"
ecs-agent.log.2017-01-04-22:2017-01-04T22:33:37Z [INFO] Removing container module="TaskEngine" task="my-task:57 arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c, Status: (STOPPED->STOPPED) Containers: [my-task (STOPPED->STOPPED),]" container="my-task(111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest) (STOPPED->STOPPED)"

docker log

docker.err:time="2017-01-04T22:32:36.815457647Z" level=debug msg="form data: {\"Cmd\":[],\"CpuShares\":80,\"Entrypoint\":null,\"Env\":[\"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/v2/credentials/bc2f4779-750e-420d-a915-b156b020435c\"],\"HostConfig\":{\"Binds\":[\"/ecs-agent/logs/my-task:/my-task/logs\"],\"LogConfig\":{\"Type\":\"json-file\"},\"MemoryReservation\":5.36870912e+08,\"RestartPolicy\":{}},\"Hostname\":\"my-host\",\"Image\":\"111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest\",\"Labels\":{\"com.amazonaws.ecs.cluster\":\"my-cluster\",\"com.amazonaws.ecs.container-name\":\"my-task\",\"com.amazonaws.ecs.task-arn\":\"arn:aws:ecs:us-east-1:111111111111:task/196421ad-9b30-41fd-b919-1a5bd591a34c\",\"com.amazonaws.ecs.task-definition-family\":\"preprod-secrets-agent\",\"com.amazonaws.ecs.task-definition-version\":\"57\",\"env\":\"env\",\"service-version\":\"latest\",\"task-definition-version\":\"1483569098\"},\"Memory\":1.073741824e+09}"

docker info

Containers: 13
 Running: 9
 Paused: 0
 Stopped: 4
Images: 9
Server Version: 1.12.5
Storage Driver: devicemapper
 Pool Name: poolName
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file: 
 Metadata file: 
 Data Space Used: 8.027 GB
 Data Space Total: 439.8 GB
 Data Space Available: 431.8 GB
 Metadata Space Used: 10.54 MB
 Metadata Space Total: 5.499 GB
 Metadata Space Available: 5.488 GB
 Thin Pool Minimum Free Space: 43.98 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.77 (2012-10-15)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.13.0-96-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 14.93 GiB
Name: HOSTNAME
ID: EH3I:35DZ:XCTT:352Y:QJSC:KR7S:4GXF:T36I:Z7D2:SFT7:FVIO:G5TJ
Docker Root Dir: /root/dir/1.12.5-0
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 83
 Goroutines: 91
 System Time: 2017-01-04T22:58:32.68408782Z
 EventsListeners: 1
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

[screenshot attached]

This happens intermittently; we have to blow away the ECS instance to fix it.

@richardpen

@jamesongithub Thanks for reporting this; the issue seems to be related to opencontainers/runc#1130. Can you answer the questions in this comment on that issue to help debug the problem?

Thanks

@jamesongithub
Author

@richardpen

  1. Not using user namespaces.
  2. No SELinux, but using AppArmor.
  3. The docker-runc commands seem to work, and I can run a container.
  4. kernel version: Linux 3.13.0-96-generic x86_64
    distribution:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS"
NAME="Ubuntu"
VERSION="14.04.5 LTS, Trusty Tahr"

@aaithal
Contributor

aaithal commented Jan 24, 2017

@jamesongithub It does seem like this is an issue with docker/runc, as per opencontainers/runc#1130. If you're still seeing this, a quick way to verify would be to repeatedly launch containers outside of ECS (using a script on the instance that runs docker commands) and see if that reproduces the issue. Can you please let us know if that's feasible for you? Also, if you can report your findings in the runc thread, it might help the folks investigating that issue to work on a fix.
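Something like the following loop would do (the image and iteration count here are arbitrary choices for illustration):

    #!/bin/bash
    # Repeatedly start short-lived containers outside of ECS to check
    # whether the runc "exit status 6" failure reproduces.
    for i in $(seq 1 100); do
        if ! docker run --rm busybox true 2>> repro.err; then
            echo "run $i failed; see repro.err"
        fi
    done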

@jamesongithub
Author

@aaithal When the issue was created, that's what was happening roughly 3 out of 10 times. We haven't had to launch recently, but I'm pretty sure it's still an issue.

@jamesongithub
Author

[screenshot attached]

@jamesongithub
Author

[screenshot attached]

@jamesongithub
Author

@aaithal Is there a workaround for this? It's basically a showstopper; we have to kill the host to fix it.

@aaithal
Contributor

aaithal commented Apr 7, 2017

@jamesongithub Apologies for the delayed response here. Since the linked issue is still being worked on, I'm not sure we have a mitigation for you at this point. We'll let you know when we have more information.

@jamesongithub
Author

thanks

@samuelkarp
Contributor

@jamesongithub Just checking in here; the latest on opencontainers/runc#1130 seems to suggest upgrading to Ubuntu 16.04. Are you still on Ubuntu 14.04 and experiencing these problems?

@jamesongithub
Author

Yes, we're still on 14.04; we're in the process of upgrading to 16.04.

@adnxn
Contributor

adnxn commented Nov 27, 2017

@jamesongithub Did upgrading to Ubuntu 16.04 resolve this issue?

@jamesongithub
Author

Haven't seen it since, so most likely, yes. Closing for now.

@web3rover

@jamesongithub Still facing the same issue.

@jamesongithub
Author

@Narayanprusty Can you upload your diagnostic info?

jamesongithub reopened this May 9, 2019
@web3rover

@jamesongithub it's an issue with my container. I have fixed it.

@manchicken

> @jamesongithub it's an issue with my container. I have fixed it.

Any word on what was wrong with the container?

@ryanjdillon

ryanjdillon commented Sep 16, 2021

In my case, I am using CDK to define the JobDefinition, and passed a JobDefinitionContainer struct defining only the things I was interested in (logConfiguration). This caused the command parameter of the container struct to default to an empty command (i.e. []). My Dockerfile only defined a command, not an entrypoint, so the command got overridden with an empty one.

I changed my Dockerfile to define a generic entrypoint, python3, and passed the script to run in the command field of the JobDefinitionContainer struct.

e.g.

    const jobDefTreeId =
        new batch.JobDefinition(this, `JobDefinitionTreeId${props.stage}`, {
          container : {
            image : imageTreeId,
            vcpus: 1,
            command: ['run.py'],
            memoryLimitMiB: 1000,
            gpuCount: 0,
            volumes: [],
            logConfiguration : {
              logDriver : batch.LogDriver.AWSLOGS,
              options : {
                'awslogs-group' : batchLogGroup.logGroupName,
                'awslogs-stream-prefix' : 'treeid'
              }
            }
          }
        });
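
For reference, the Dockerfile side of the change would look roughly like this (a sketch; the actual Dockerfile isn't shown here):

    # Before: only a default command, which the job definition's empty
    # command list silently overrode.
    # CMD ["python3", "run.py"]

    # After: a generic entrypoint; the script to run now comes from the
    # job definition's command field.
    ENTRYPOINT ["python3"]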

The really difficult part of sorting this out was that the Reason the container was failing was truncated, with the important tail end of the trace excluded. Still haven't sorted THAT out, which is frustrating. The CLI, console, and a CloudWatch Batch API logging event all yielded the same truncated reason.
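
If anyone else hits the truncation: one idea, assuming you can get onto the underlying container instance, is to ask the Docker daemon for the untruncated error directly (the container ID below is a placeholder):

    # Find the failed container, then read the full error string from
    # its recorded state.
    docker ps -a --latest
    docker inspect --format '{{.State.Error}}' <container-id>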
