Our images now fail to run with OCI error #1024

Open
webmutation opened this issue Feb 4, 2020 · 20 comments
Labels
area/filesystems (For all bugs related to kaniko container filesystems, mounting issues etc.)
in progress
kind/bug (Something isn't working)
priority/p0 (Highest priority. Break user flow. We are actively looking at delivering it.)
regression

Comments

@webmutation

webmutation commented Feb 4, 2020

Actual behavior
The new images fail to start. docker run returns: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: "/usr/bin/java": stat /usr/bin/java: no such file or directory": unknown

Expected behavior
Previously the image ran without issue and without any OCI runtime error.

To Reproduce

  1. Build the image with Kaniko
  2. Try to run the image with docker run

Additional Information
FROM openjdk:8-jre-slim
 
# Expose ports to enable running the service
# Ports should be standardized to make it easier to debug
# Exposing two services in the same port can create conflicts
 
ENV PORT 8080
EXPOSE 8080
 
# List of ARGS input from Kaniko Build
ARG IMAGE_DATE
ARG VCS_REVISION
ARG VCS_SEMVER
ARG PKG_WORKDIR
 
# Labeling based on https://github.com/opencontainers/image-spec/blob/master/annotations.md
LABEL org.opencontainers.image.created="${IMAGE_DATE}"              \
      org.opencontainers.image.revision="${VCS_REVISION}"           \
      org.opencontainers.image.version="${VCS_SEMVER}"              \
      org.opencontainers.image.title="mytitle"                      \
      org.opencontainers.image.description="mydescription"          \
      org.opencontainers.image.authors="myauthors"                  \
      org.opencontainers.image.vendor="myvendor"                    \
      org.opencontainers.image.url="myurl"                          \
      org.opencontainers.image.documentation="mydocumentationlink"  \
      org.opencontainers.image.source="mygitrepourl"
 
# Copy of distribution/target folder artifacts
# In case additional Artifacts are required
 
# All containers should run in least privileged mode, meaning not ROOT.
# NOTE: On OpenShift there is a warning when you try to run as ROOT
RUN addgroup -g 1001 -S cc && \
    adduser -u 1001 -S -G cc cc && \
    chown -R 1001:0 /home/cc && \
    chmod -R g=u /home/cc
     
COPY --chown=1001:0 ${PKG_WORKDIR}/target/*.jar /home/cc/service.jar
 
USER 1001
# Command to initialize the service
CMD ["/usr/bin/java", "-jar", "home/cc/service.jar"]
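
One way to narrow down a `stat /usr/bin/java: no such file or directory` failure is to inspect the built image's filesystem without starting it. The sketch below is only an illustration and is not part of kaniko. It assumes the image's root filesystem has been exported to a tar archive (for example with `docker export`), and the helper names `index_rootfs`, `resolve`, and `command_exists` are hypothetical. It follows symlinks inside the archive, because a path such as `/usr/bin/java` may be a symlink whose target is what actually went missing.

```python
import posixpath
import tarfile


def index_rootfs(tar):
    """Map normalized absolute paths to the TarInfo entries of a rootfs tar."""
    return {
        posixpath.normpath("/" + member.name.lstrip("/")): member
        for member in tar.getmembers()
    }


def resolve(members, path, max_hops=40):
    """Walk `path` component by component, following symlinks inside the tar.

    Returns the TarInfo the path finally points to, or None if it is missing
    or the symlink chain is longer than `max_hops` (assumed loop).
    """
    pending = [part for part in path.split("/") if part]
    current = "/"
    hops = 0
    while pending:
        part = pending.pop(0)
        if part == ".":
            continue
        if part == "..":
            current = posixpath.dirname(current)
            continue
        candidate = posixpath.join(current, part)
        info = members.get(candidate)
        if info is not None and info.issym():
            hops += 1
            if hops > max_hops:
                return None
            target = info.linkname
            if target.startswith("/"):
                current = "/"  # absolute link: restart from the image root
            # Relative links continue from the directory holding the link.
            pending = [p for p in target.split("/") if p] + pending
            continue
        # Directories are not always listed explicitly in a tar, so a missing
        # intermediate entry is tolerated; the final lookup decides.
        current = candidate
    return members.get(current)


def command_exists(tar, path):
    """True if `path` resolves to a file with at least one execute bit set."""
    info = resolve(index_rootfs(tar), path)
    if info is None:
        return False
    return (info.isfile() or info.islnk()) and bool(info.mode & 0o111)
```

For example, after `docker create` and `docker export <container> -o rootfs.tar`, calling `command_exists(tarfile.open("rootfs.tar"), "/usr/bin/java")` on images built with 0.16 and with 0.17 would show whether the binary, or the target of its symlink, is absent from the newer image.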
@RemcodM

RemcodM commented Feb 4, 2020

I have similar problems, but they already occur when performing a docker pull on a freshly built image with kaniko debug-v0.17.0.

The build seems to go fine without any errors, but when pulling the image I get random errors related to missing files; the exact error depends on the image.

For now, I have gone back to debug-v0.16.0, which builds the same images fine. No errors when pulling or running.

@liemdo

liemdo commented Feb 4, 2020

We get a different error and cannot build the image in Google Cloud Build: error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/docker.sock: device or resource busy.

@tejal29
Contributor

tejal29 commented Feb 4, 2020

@liemdo Are you mounting docker.sock in your Kaniko build? Can you please share your Dockerfile and kaniko command?

@tejal29
Contributor

tejal29 commented Feb 4, 2020

@liemdo PR in progress. #1025

Patch fix coming soon.

@webmutation
Author

Indeed, the fix for us was to force 0.16 in the pipeline. Only a dozen microservices were affected.

@tejal29
Contributor

tejal29 commented Feb 4, 2020

@webmutation Unfortunately I am not able to reproduce your error. Would it be possible for you to give us -v=trace logs?

@tejal29
Contributor

tejal29 commented Feb 4, 2020

> We get a different error and cannot build the image in Google Cloud Build: error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/docker.sock: device or resource busy.

@liemdo That error should be fixed by the v0.17.1 release, thanks to #1025.

@tejal29
Contributor

tejal29 commented Feb 4, 2020

> I have similar problems, but they already occur when performing a docker pull on a freshly built image with kaniko debug-v0.17.0.
>
> The build seems to go fine without any errors, but when pulling the image I get random errors related to missing files; the exact error depends on the image.
>
> For now, I have gone back to debug-v0.16.0, which builds the same images fine. No errors when pulling or running.

@RemcodM Sorry to hear about that. Were your issues related to files in /var/run? If yes, we fixed that.
Please let me know if you are having other issues.

@webmutation
Author

webmutation commented Feb 5, 2020

@tejal29 I will update the pipeline to 0.17.1 and see if the problem is related and solved by the patch.

EDIT: Still the same issue. It only happens with FROM openjdk:8-jre-alpine; not sure why. For the time being we are stuck on 0.16.

@afirth

afirth commented Feb 5, 2020

> > I have similar problems, but they already occur when performing a docker pull on a freshly built image with kaniko debug-v0.17.0.
> > The build seems to go fine without any errors, but when pulling the image I get random errors related to missing files; the exact error depends on the image.
> > For now, I have gone back to debug-v0.16.0, which builds the same images fine. No errors when pulling or running.
>
> @RemcodM Sorry to hear about that. Were your issues related to files in /var/run? If yes, we fixed that.
> Please let me know if you are having other issues.

#1028 for this one I think

@RemcodM

RemcodM commented Feb 5, 2020

@afirth Indeed, #1028 seems like the problem I am experiencing with 0.17.0 (and thus also with 0.17.1). So if this issue is unrelated it can be closed.

@webmutation
Author

Bad image creation with OCI error continues with v0.17.1.

@cvgw
Contributor

cvgw commented Feb 8, 2020

I suspect this is related to #1039

cvgw added the area/filesystems, in progress, kind/bug, and priority/p0 labels on Feb 8, 2020
cvgw self-assigned this on Feb 8, 2020
@cvgw
Contributor

cvgw commented Feb 25, 2020

We've committed a change which I believe will fix this. If anyone feels like testing, tags a1af057 and debug-a1af057f997316bfb1c4d2d82719d78481a02a79 have the new code.

@bitsofinfo

Experiencing the same.

@bitsofinfo

Only reverting to 0.15.0 lets me actually get working images.

@tejal29
Contributor

tejal29 commented Mar 6, 2020

@webmutation I verified your Dockerfile on the latest master build.
I changed the base image to openjdk:8.

FROM openjdk:8

# Expose ports to enable running the service
# Ports should be standardized to make it easier to debug
# Exposing two services in the same port can create conflicts

ENV PORT 8080
EXPOSE 8080

# List of ARGS input from Kaniko Build
ARG IMAGE_DATE
ARG VCS_REVISION
ARG VCS_SEMVER
ARG PKG_WORKDIR

# Labeling based on https://github.com/opencontainers/image-spec/blob/master/annotations.md
LABEL org.opencontainers.image.created="${IMAGE_DATE}"              \
      org.opencontainers.image.revision="${VCS_REVISION}"           \
      org.opencontainers.image.version="${VCS_SEMVER}"              \
      org.opencontainers.image.title="mytitle"                      \
      org.opencontainers.image.description="mydescription"          \
      org.opencontainers.image.authors="myauthors"                  \
      org.opencontainers.image.vendor="myvendor"                    \
      org.opencontainers.image.url="myurl"                          \
      org.opencontainers.image.documentation="mydocumentationlink"  \
      org.opencontainers.image.source="mygitrepourl"

# Copy of distribution/target folder artifacts
# In case additional Artifacts are required

# All containers should run in least privileged mode, meaning not ROOT.
# NOTE: On OpenShift there is a warning when you try to run as ROOT
RUN addgroup -gid 1001 -system cc && \
    adduser cc -u 1001 -system -gid 1001 && \
    chown -R 1001:0 /home/cc && \
    chmod -R g=u /home/cc

COPY --chown=1001:0 target/*.jar /home/cc/service.jar

USER 1001
# Command to initialize the service
CMD ["/usr/bin/java", "-jar", "home/cc/service.jar"]

I ran the following command

 docker run -v /usr/local/google/home/tejaldesai/.config/gcloud:/root/.config/gcloud -v /usr/local/google/home/tejaldesai/workspace/kaniko/integration:/workspace gcr.io/tejal-test/executor:debug -f dockerfiles/Dockerfile1 --context=dir://workspace --destination=gcr.io/tejal-test/test_10241
INFO[0014] Applying label org.opencontainers.image.created= 
INFO[0014] Applying label org.opencontainers.image.revision= 
INFO[0014] Applying label org.opencontainers.image.version= 
INFO[0014] Applying label org.opencontainers.image.title=mytitle 
INFO[0014] Applying label org.opencontainers.image.description=mydescription 
INFO[0014] Applying label org.opencontainers.image.authors=myauthors 
INFO[0014] Applying label org.opencontainers.image.vendor=myvendor 
INFO[0014] Applying label org.opencontainers.image.url=myurl 
INFO[0014] Applying label org.opencontainers.image.documentation=mydocumentationlink 
INFO[0014] Applying label org.opencontainers.image.source=mygitrepourl 
INFO[0014] RUN addgroup -gid 1001 -system cc &&     adduser cc -u 1001 -system -gid 1001 &&     chown -R 1001:0 /home/cc &&     chmod -R g=u /home/cc 
INFO[0014] cmd: /bin/sh                                 
INFO[0014] args: [-c addgroup -gid 1001 -system cc &&     adduser cc -u 1001 -system -gid 1001 &&     chown -R 1001:0 /home/cc &&     chmod -R g=u /home/cc] 
Adding group `cc' (GID 1001) ...
Done.
Adding system user `cc' (UID 1001) ...
Adding new user `cc' (UID 1001) with group `cc' ...
Creating home directory `/home/cc' ...
INFO[0014] Taking snapshot of full filesystem...        
INFO[0014] Resolving paths                              
INFO[0016] Resolving srcs [target/*.jar]...             
INFO[0016] COPY --chown=1001:0 target/*.jar /home/cc/service.jar 
INFO[0016] Resolving srcs [target/*.jar]...             
INFO[0016] Resolving paths                              
INFO[0016] Taking snapshot of files...                  
INFO[0016] USER 1001                                    
INFO[0016] cmd: USER                                    
INFO[0016] CMD ["/usr/bin/java", "-jar", "home/cc/service.jar"] 

I then ran the new image with docker run:

tejaldesai@@kaniko (r-v0.18.0)$ docker run gcr.io/tejal-test/test_10241
no main manifest attribute, in home/cc/service.jar

As expected, it complains that there is no main attribute in the manifest.
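
The `no main manifest attribute` message is printed by the java launcher itself, so in this run `/usr/bin/java` was found and executed, and the remaining failure comes from the test jar rather than from the image filesystem. A minimal sketch of the check behind that message is shown below, assuming a standard jar layout with `META-INF/MANIFEST.MF`; the function name `main_class` is hypothetical and this is not how the JVM implements it.

```python
import zipfile


def main_class(jar):
    """Return the Main-Class value from a jar's manifest, or None if absent.

    `jar` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as zf:
        try:
            raw = zf.read("META-INF/MANIFEST.MF")
        except KeyError:
            return None  # no manifest at all

    text = raw.decode("utf-8", errors="replace")
    # Normalize line endings, then unfold continuation lines, which start
    # with a single space in the manifest format.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n ", "")

    for line in text.split("\n"):
        if not line.strip():
            break  # the main section ends at the first blank line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "main-class":
            return value.strip()
    return None
```

If `main_class("target/service.jar")` returns None, `java -jar` is expected to fail with the message above even though the container itself starts correctly.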


kwant-bot pushed a commit to kwant-project/kwant that referenced this issue Mar 20, 2020
Temporary fix for CI. Correspondent Kaniko issue:
GoogleContainerTools/kaniko#1024
@cvgw cvgw removed their assignment Mar 27, 2020
RafalSkolasinski pushed a commit to quantum-tinkerer/research-docker that referenced this issue Mar 30, 2020
@imranismail

This is still happening even post 1.0; we've been stuck at 0.16 for a while now.

@dabeeeenster

We're seeing something similar trying to use kaniko to build docker images as part of a gitlab-runner pipeline:

root@dev:~# docker --version
Docker version 20.10.0, build 7287ab3

root@dev:~# gitlab-runner --version
Version:      12.9.0
Git revision: 4c96e5ad
Git branch:   12-9-stable
GO version:   go1.13.8
Built:        2020-03-20T13:01:56+0000
OS/Arch:      linux/amd64

This is a fragment from our gitlab-ci.yml:

build-dockerhub:
  stage: build
  image:
    # TODO: use latest instead of debug once we get to the bottom of issue using latest tag
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  variables:
    DOCKER_HUB_AUTH: $DOCKER_HUB_AUTH
  script:
    - if [ "$CI_COMMIT_REF_NAME" == "master" ]; then IMAGE_TAG="latest"; else IMAGE_TAG=$CI_COMMIT_REF_SLUG; fi
    - echo $CI_COMMIT_REF_NAME > $CI_PROJECT_DIR/src/CI_COMMIT_REF_NAME
    - echo $CI_COMMIT_SHA > $CI_PROJECT_DIR/src/CI_COMMIT_SHA
    - echo $IMAGE_TAG > $CI_PROJECT_DIR/src/IMAGE_TAG
    - echo "{\"auths\":{\"https://index.docker.io/v1/\":{\"auth\":\"$DOCKER_HUB_AUTH\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/docker/Dockerfile --destination flagsmith/flagsmith-api:$IMAGE_TAG

For some reason, using debug in place of latest fixes the issue.
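
The two pieces of logic in the script fragment above (choosing the image tag and writing the registry credentials file) can be sketched outside the shell as follows. This is an illustrative restatement, not something that runs inside the kaniko image, which ships no Python; the function names `image_tag` and `docker_config` are hypothetical. Building the JSON with a serializer avoids the hand-escaped quotes in the `echo` line.

```python
import json


def image_tag(ref_name, ref_slug):
    """Mirror the shell conditional: master builds are tagged 'latest',
    every other ref uses its slug."""
    return "latest" if ref_name == "master" else ref_slug


def docker_config(auth, registry="https://index.docker.io/v1/"):
    """Return the JSON text that the script writes to
    /kaniko/.docker/config.json for a single registry."""
    return json.dumps({"auths": {registry: {"auth": auth}}})
```

Here `auth` stands for the value of the `DOCKER_HUB_AUTH` variable from the pipeline, and the default registry URL is the one used in the fragment.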

@TBG-FR

TBG-FR commented Oct 20, 2021

> We're seeing something similar trying to use kaniko to build docker images as part of a gitlab-runner pipeline:
>
> [...]
>
> For some reason, using debug in place of latest fixes the issue.

From https://github.com/GoogleContainerTools/kaniko#debug-image:

> The kaniko executor image is based on scratch and doesn't contain a shell. We provide gcr.io/kaniko-project/executor:debug, a debug image which consists of the kaniko executor image along with a busybox shell to enter.
>
> You can launch the debug image with a shell entrypoint:
>
> docker run -it --entrypoint=/busybox/sh gcr.io/kaniko-project/executor:debug

You need to use a debug-tagged image in order to achieve what you want with GitLab CI (so do I...).
If you need a specific version, you can use a tag of the form version-debug, for example v1.7.0-debug (complete list here: https://console.cloud.google.com/gcr/images/kaniko-project/GLOBAL/executor).


10 participants