
Kaniko builds fail in Cloud Build without --additional-whitelist=/var/run #1001

Closed
dinvlad opened this issue Jan 25, 2020 · 36 comments
Labels
area/filesystems For all bugs related to kaniko container filesystems (mounting issues etc) priority/p0 Highest priority. Break user flow. We are actively looking at delivering it.

Comments

@dinvlad

dinvlad commented Jan 25, 2020

Actual behavior
Kaniko builds fail in Cloud Build, in standard configuration.

Expected behavior
Kaniko builds work just fine, without the extra --additional-whitelist=/var/run flag.

To Reproduce
Steps to reproduce the behavior:

  1. Configure a Cloud Build job in standard debug configuration:
steps:
  - name: gcr.io/kaniko-project/executor:debug
    args: ["--dockerfile=<path to Dockerfile within the build context>",
           "--context=dir://<path to build context>",
           "--destination=<gcr.io/$PROJECT/$IMAGE:$TAG>"]
  2. Observe that the build fails with
error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/docker.sock: device or resource busy
  3. Add --additional-whitelist=/var/run to the build step, and it succeeds.

Additional Information
Any Dockerfile (unverified).

Triage Notes for the Maintainers

Initially reported in #903 (comment)

Description Yes/No
Please check if this a new feature you are proposing
Please check if the build works in docker but not in kaniko
Please check if this error is seen when you use --cache flag
Please check if your dockerfile is a multistage dockerfile
@ejose19
Contributor

ejose19 commented Jan 25, 2020

Getting this since latest debug image @sha256:d39c342cd6cbf7b85d2ca57dcd51f2a704d027b9b87cc6b7264e1def9949a633

@StepanKuksenko

I also have this issue on the latest debug image. Changed it to 0.15.0.

@atikhono

I've hit this as well. Hoping for a fix soon.

@ycreachcadec

Same problem here

@keylowgee

I ran into this issue this morning.
I was using the kaniko image image: gcr.io/kaniko-project/executor:debug
I changed my image to use an explicit version gcr.io/kaniko-project/executor:debug-v0.16.0

Appears to not have that /var/run issue with that version

@cvgw
Contributor

cvgw commented Jan 27, 2020

This is probably expected behavior. Couple of notes.

I highly recommend that you set an explicit kaniko version in Cloud Build, because the debug image is pushed on every commit to master. Use something like debug-v0.16.0.

The --additional-whitelist flag was added, and /var/run removed from the default whitelist, in PR #973, which was not part of the v0.16.0 release.

So specifying --additional-whitelist="/var/run" replicates the previous default behavior.

I think this is expected behavior because this is probably one of the original reasons /var/run was whitelisted. Mounting a unix socket into a container is gonna require you to whitelist the path.
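
To make the point above concrete, here is a minimal Go sketch of a prefix-based whitelist check. It is only an illustration: `isWhitelisted` and the sample whitelist entries are hypothetical and are not taken from kaniko's source. It shows that a socket mounted at /var/run/docker.sock is only skipped when /var/run (or the socket path itself) is on the list.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isWhitelisted reports whether p equals, or lies beneath, one of the
// whitelisted directories. Hypothetical helper for illustration only;
// this is not kaniko's actual implementation.
func isWhitelisted(p string, whitelist []string) bool {
	p = path.Clean(p)
	for _, w := range whitelist {
		w = path.Clean(w)
		if p == w || strings.HasPrefix(p, w+"/") {
			return true
		}
	}
	return false
}

func main() {
	sock := "/var/run/docker.sock"

	// Assumed example lists; the real defaults live in kaniko itself.
	withoutVarRun := []string{"/proc", "/sys", "/dev"}
	withVarRun := []string{"/proc", "/sys", "/dev", "/var/run"}

	fmt.Println(isWhitelisted(sock, withoutVarRun)) // false
	fmt.Println(isWhitelisted(sock, withVarRun))    // true
}
```

The check matches whole path components, so a sibling such as /var/runner is not treated as being under /var/run.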

Unfortunately it is a breaking change for some builds, which was discussed in #973

cc @tejal29

@cvgw cvgw added the area/filesystems For all bugs related to kaniko container filesystems (mounting issues etc) label Jan 27, 2020
@dinvlad
Author

dinvlad commented Jan 27, 2020

@cvgw In that case, could you update the README for Cloud Build to make it clearer that we have to set that flag explicitly? Thanks

@tejal29
Contributor

tejal29 commented Jan 27, 2020

@dinvlad, @keylowgee, @artichaulo, @atikhono, @wintersolutions,
The gcr.io/kaniko-project/executor:debug now points to gcr.io/kaniko-project/executor:debug-v0.16.0

tejaldesai@@kaniko (remove_debug_trigger)$ docker tag gcr.io/kaniko-project/executor:debug-v0.16.0 gcr.io/kaniko-project/executor:debug
tejaldesai@@kaniko (remove_debug_trigger)$ docker push gcr.io/kaniko-project/executor:debug

We will also add a fix to retain the default behavior which is whitelisting /var/run so that the current users are not affected.

@tejal29 tejal29 self-assigned this Jan 28, 2020
@tejal29 tejal29 added the priority/p0 Highest priority. Break user flow. We are actively looking at delivering it. label Jan 28, 2020
@marekaf

marekaf commented Jan 28, 2020

Now it might fail with --additional-whitelist due to #1006

@jdurzi

jdurzi commented Jan 28, 2020

> Now it might fail with --additional-whitelist due to #1006

Yeah, I could never get this flag to actually work. I guess I'll revert to 0.15.0 while this gets resolved in some way.

@douddle

douddle commented Jan 28, 2020

+1
I rolled back to 0.15.0 to solve the problem and got all my GitLab CI jobs back to normal.

@tejal29
Contributor

tejal29 commented Jan 30, 2020

Hey folks, if you want to allow /var/run in your images,

  1. Please wait for our release v0.17.0 scheduled on Friday.
  2. Use --whitelist-var-run=false with executor:v0.17.0 and executor:debug-v0.17.0

@tejal29
Contributor

tejal29 commented Feb 3, 2020

Hey folks, our v0.17.0 release is out!

Please use --whitelist-var-run=false to include /var/run in your destination image.

@tejal29 tejal29 closed this as completed Feb 3, 2020
@dinvlad
Author

dinvlad commented Feb 3, 2020

Could you add a note that we should use it in Cloud Build by default from now on? I.e. the default Cloud Build config suggested in README.md should be

steps:
  - name: gcr.io/kaniko-project/executor:latest
    args: ["--dockerfile=<path to Dockerfile within the build context>",
           "--context=dir://<path to build context>",
           "--destination=<gcr.io/$PROJECT/$IMAGE:$TAG>",
           "--whitelist-var-run=false"]

Otherwise, it seems like this solution is no different from where it started.

@dinvlad
Author

dinvlad commented Feb 3, 2020

Alternatively, would it be possible to make --whitelist-var-run=false the default instead, so that it preserves the previous argument lists in Cloud Build without having to explicitly specify --whitelist-var-run=false in every step?

Thanks

@ejose19
Contributor

ejose19 commented Feb 3, 2020

@tejal29 I'm still getting the error after v0.17.0. Here's my config

- name: "gcr.io/kaniko-project/executor:v0.17.0"
  args:
    [
      "--dockerfile=Dockerfile",
      "--destination=<url>",
      "--whitelist-var-run=false",
    ]

I get

Step #1: error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/docker.sock: device or resource busy

Am I missing something? Using v15 without the whitelist flag just works.

@dinvlad
Author

dinvlad commented Feb 3, 2020

Yes, and we also got that error in Cloud Build! I suspect it's because CB has those images cached (because 0.17.0 was re-tagged). Maybe we could try the full sha256-based image tag.

Would it be possible to detect the Cloud Build environment and set this option automatically instead, so that DevOps teams don't have to apply this fix retroactively? Thanks

@ejose19
Contributor

ejose19 commented Feb 3, 2020

@dinvlad Tried doing sha256 tag

Step #1: Already have image (with digest): gcr.io/kaniko-project/executor@sha256:c65c64d157bb6b1f15278e8ee28b02184e83e39340ddc25d346f18396c24da1d

but still get the error. I took the image from
https://console.cloud.google.com/gcr/images/kaniko-project/GLOBAL/executor

@dinvlad
Author

dinvlad commented Feb 3, 2020

Yep, same behavior. FWIW, I've just reverted to debug-v0.16.0 and that seems to work great without any extra options. We'll keep using that version for now, until this issue is resolved.

@dinvlad
Author

dinvlad commented Feb 3, 2020

To reiterate, what we'd ideally like to see is that Kaniko auto-detects the Cloud Build environment and automatically whitelists /var/run. If that's not possible, there should at least be an option that preserves the old behavior by default (i.e. no need to whitelist /var/run explicitly), and then those users who don't need to whitelist it (i.e. non-Cloud Build users) will be able to pass --whitelist-var-run=false to disable this whitelisting explicitly.

Otherwise, the current behavior breaks all Cloud Build jobs that are using executor:latest or executor:debug images, by requiring DevOps teams to adjust every build manually (to add the new argument). In addition, the current behavior does not seem to work at all in Cloud Build, with or without the new option (and for both true and false), so the only workaround atm is to use a pinned version <=0.16.0 (which, incidentally, also requires us to manually go and adjust all builds).

@CAFxX

CAFxX commented Feb 4, 2020

Just ran into this as well. Do you plan to roll out a fix soon, or should we roll back?

@afirth

afirth commented Feb 4, 2020

/reopen

@stevehipwell

This is broken when using the Kaniko debug image in GitLab CI.

@afirth

afirth commented Feb 4, 2020

Looks like #1021 is a dup of the bottom comments in here. I think 0.17.0 is broken for the non-debug image too, for everyone in Cloud Build.

@tejal29
Contributor

tejal29 commented Feb 4, 2020

> @tejal29 I'm still getting the error after v0.17.0. Here's my config
>
> - name: "gcr.io/kaniko-project/executor:v0.17.0"
>   args:
>     [
>       "--dockerfile=Dockerfile",
>       "--destination=<url>",
>       "--whitelist-var-run=false",
>     ]
>
> I get
>
> Step #1: error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/docker.sock: device or resource busy
>
> Am I missing something? Using v15 without the whitelist flag just works.

Hey folks, /var/run was always whitelisted in kaniko. If you were relying on the previous kaniko behaviour, in which /var/run is whitelisted, please do not use this flag.

@tejal29 tejal29 reopened this Feb 4, 2020
@tejal29
Contributor

tejal29 commented Feb 4, 2020

> To re-iterate, what we'd like to be able to see (ideally) is that Kaniko auto-detects Cloud Build environment, and automatically whitelists /var/run. If that's not possible, at least there should be (ideally) an option that preserves the old behavior by default (i.e. no need to whitelist /var/run explicitly), and then for those users who don't need to whitelist it (i.e. non-Cloud Build users), they will be able to --whitelist-var-run=false to disable this whitelisting explicitly.
>
> Otherwise, the current behavior breaks all Cloud Build jobs that are using executor:latest or executor:debug images, by requiring DevOps teams to adjust every build manually (to add the new argument). In addition, the current behavior does not seem to work at all in Cloud Build, with or without the new option (and both for true or false), so the only workaround atm is to use a pinned version <=0.16.0 (which incidentally, also requires us to manually go and adjust all builds.)

@dinvlad Previous versions of kaniko, e.g. 0.16.0, always whitelisted /var/run. We have preserved the default behavior.

RootCmd.PersistentFlags().BoolVarP(&opts.WhitelistVarRun, "whitelist-var-run", "", true, ...)

If you are relying on the previous behavior, your cloudbuild.yaml should work as is. No need to change anything.
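
For readers unfamiliar with how such a default plays out, the following Go sketch shows a boolean flag that defaults to true and is only switched off by an explicit `--whitelist-var-run=false`. It deliberately uses the standard library `flag` package rather than the cobra/pflag call quoted above, and `parseWhitelistVarRun`, the flag-set name, and the help text are assumptions made for this illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// parseWhitelistVarRun parses args with a single boolean flag whose default
// is true. Hypothetical helper built on the standard "flag" package; kaniko
// itself registers its flags through a different library.
func parseWhitelistVarRun(args []string) (bool, error) {
	fs := flag.NewFlagSet("executor-sketch", flag.ContinueOnError)
	whitelistVarRun := fs.Bool("whitelist-var-run", true,
		"illustrative help text: treat /var/run as whitelisted")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *whitelistVarRun, nil
}

func main() {
	// No flag given: the default (true) applies, so existing configs
	// that never mention the flag keep the whitelisting behavior.
	def, err := parseWhitelistVarRun(nil)
	fmt.Println("default:", def, err)

	// Explicit opt-out.
	off, err := parseWhitelistVarRun([]string{"--whitelist-var-run=false"})
	fmt.Println("explicit false:", off, err)
}
```

Under this reading, a cloudbuild.yaml that omits the flag and one that passes `--whitelist-var-run=true` behave the same.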

@higgs01

higgs01 commented Feb 4, 2020

@tejal29 Using the GitLab CI/CD Kubernetes runner, builds fail with debug-v0.17.0 whether I set --whitelist-var-run to true, set it to false, or don't include it. Everything works fine when using debug-v0.16.0.

@ejose19
Contributor

ejose19 commented Feb 4, 2020

> > @tejal29 I'm still getting the error after v0.17.0. Here's my config
> >
> > - name: "gcr.io/kaniko-project/executor:v0.17.0"
> >   args:
> >     [
> >       "--dockerfile=Dockerfile",
> >       "--destination=<url>",
> >       "--whitelist-var-run=false",
> >     ]
> >
> > I get
> >
> > Step #1: error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/docker.sock: device or resource busy
> >
> > Am I missing something? Using v15 without the whitelist flag just works.
>
> hey folks, /var/run was always whitelisted in kaniko. If you were relying on previous kaniko behaviour which is /var/run is whitelisted, please do not use this flag.

Yes, what I meant is that previous versions work without doing anything. As @higgs01 commented, version 0.17.0 fails whether you set the flag to true, set it to false, or don't include it.

@tejal29
Contributor

tejal29 commented Feb 4, 2020

@higgs01 and @ejose19 sorry for the breakage.

Can you please give us some debug logs with -v=debug?
I am also trying to reproduce this on my end using your GCB step:

steps:
  - name: gcr.io/kaniko-project/executor:debug
    args: ["--dockerfile=<path to Dockerfile within the build context>",
           "--context=dir://<path to build context>",
           "--destination=<gcr.io/$PROJECT/$IMAGE:$TAG>"]

@higgs01

higgs01 commented Feb 4, 2020

@tejal29 currently I can't access the system I've encountered the problem with (corporate network). But I will send the debug logs tomorrow if it's still necessary then.

@ejose19
Contributor

ejose19 commented Feb 4, 2020

@tejal29 here it is

Cloudbuild:

steps:
  - name: "gcr.io/kaniko-project/executor:debug"
    args:
      ["--dockerfile=Dockerfile", "--destination=<url>", "--verbosity=debug"]

Dockerfile:

FROM alpine
RUN apk add nano

Logs:

BUILD
Pulling image: gcr.io/kaniko-project/executor:debug
debug: Pulling from kaniko-project/executor
bfb70510d7c5: Pulling fs layer
dc2057c58a5b: Pulling fs layer
1fcacdcafaa9: Pulling fs layer
d06d96ef79d9: Pulling fs layer
a7090596b381: Pulling fs layer
e6f4337a185f: Pulling fs layer
4536006be0b7: Pulling fs layer
d06d96ef79d9: Waiting
a7090596b381: Waiting
e6f4337a185f: Waiting
4536006be0b7: Waiting
dc2057c58a5b: Verifying Checksum
dc2057c58a5b: Download complete
1fcacdcafaa9: Verifying Checksum
1fcacdcafaa9: Download complete
a7090596b381: Verifying Checksum
a7090596b381: Download complete
d06d96ef79d9: Verifying Checksum
d06d96ef79d9: Download complete
e6f4337a185f: Verifying Checksum
e6f4337a185f: Download complete
4536006be0b7: Verifying Checksum
4536006be0b7: Download complete
bfb70510d7c5: Verifying Checksum
bfb70510d7c5: Download complete
bfb70510d7c5: Pull complete
dc2057c58a5b: Pull complete
1fcacdcafaa9: Pull complete
d06d96ef79d9: Pull complete
a7090596b381: Pull complete
e6f4337a185f: Pull complete
4536006be0b7: Pull complete
Digest: sha256:53bf8a6d56fed34914676e8d930fd96c3969914d96082de08dc99bf31f09c636
Status: Downloaded newer image for gcr.io/kaniko-project/executor:debug
gcr.io/kaniko-project/executor:debug
DEBU[0000] Copying file /workspace/Dockerfile to /kaniko/Dockerfile 
DEBU[0000] Skip resolving path /kaniko/Dockerfile       
DEBU[0000] Skip resolving path /workspace/              
DEBU[0000] Skip resolving path /cache                   
DEBU[0000] Skip resolving path                          
DEBU[0000] Skip resolving path                          
DEBU[0000] Skip resolving path                          
INFO[0000] Resolved base name alpine to alpine          
INFO[0000] Resolved base name alpine to alpine          
INFO[0000] Retrieving image manifest alpine             
DEBU[0001] No file found for cache key sha256:ddba4d27a7ffc3f86dd6c2f92041af252a1f23a8e742c90e6e1297bfa1bc0c45 stat /cache/sha256:ddba4d27a7ffc3f86dd6c2f92041af252a1f23a8e742c90e6e1297bfa1bc0c45: no such file or directory 
DEBU[0001] Image alpine not found in cache              
INFO[0001] Retrieving image manifest alpine             
INFO[0002] Built cross stage deps: map[]                
INFO[0002] Retrieving image manifest alpine             
DEBU[0003] No file found for cache key sha256:ddba4d27a7ffc3f86dd6c2f92041af252a1f23a8e742c90e6e1297bfa1bc0c45 stat /cache/sha256:ddba4d27a7ffc3f86dd6c2f92041af252a1f23a8e742c90e6e1297bfa1bc0c45: no such file or directory 
DEBU[0003] Image alpine not found in cache              
INFO[0003] Retrieving image manifest alpine             
INFO[0004] Unpacking rootfs as cmd RUN apk add nano requires it. 
DEBU[0004] Mounted directories: [{/kaniko false} {/etc/mtab false} {/tmp/apt-key-gpghome true} {/proc false} {/dev false} {/dev/pts false} {/sys false} {/sys/fs/cgroup false} {/sys/fs/cgroup/systemd false} {/sys/fs/cgroup/hugetlb false} {/sys/fs/cgroup/rdma false} {/sys/fs/cgroup/net_cls,net_prio false} {/sys/fs/cgroup/blkio false} {/sys/fs/cgroup/memory false} {/sys/fs/cgroup/cpu,cpuacct false} {/sys/fs/cgroup/pids false} {/sys/fs/cgroup/devices false} {/sys/fs/cgroup/freezer false} {/sys/fs/cgroup/cpuset false} {/sys/fs/cgroup/perf_event false} {/dev/mqueue false} {/workspace false} {/busybox false} {/builder/home false} {/builder/outputs false} {/root/tokencache false} {/etc/resolv.conf false} {/etc/hostname false} {/etc/hosts false} {/dev/shm false} {/var/run/docker.sock false}] 
DEBU[0004] Not adding /dev because it is whitelisted    
DEBU[0004] Not adding /etc/hostname because it is whitelisted 
DEBU[0004] Not adding /etc/hosts because it is whitelisted 
DEBU[0004] Not adding /etc/mtab because it is whitelisted 
DEBU[0004] Not adding /proc because it is whitelisted   
DEBU[0004] Not adding /sys because it is whitelisted    
error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/docker.sock: device or resource busy
ERROR
ERROR: build step 0 "gcr.io/kaniko-project/executor:debug" failed: exit status 1

@claudioweiler

@tejal29

From a GitLab CI build:

DEBU[0000] Copying file /builds/base/ubuntu-base/Dockerfile to /kaniko/Dockerfile 
DEBU[0000] Skip resolving path /kaniko/Dockerfile       
DEBU[0000] Skip resolving path /builds/base/ubuntu-base 
DEBU[0000] Skip resolving path /cache                   
DEBU[0000] Skip resolving path                          
DEBU[0000] Skip resolving path                          
DEBU[0000] Skip resolving path                          
DEBU[0002] No file found for cache key sha256:bc025862c3e8ec4a8754ea4756e33da6c41cba38330d7e324abd25c8e0b93300 stat /cache/sha256:bc025862c3e8ec4a8754ea4756e33da6c41cba38330d7e324abd25c8e0b93300: no such file or directory 
DEBU[0002] Image ubuntu:18.04 not found in cache        
DEBU[0004] No file found for cache key sha256:bc025862c3e8ec4a8754ea4756e33da6c41cba38330d7e324abd25c8e0b93300 stat /cache/sha256:bc025862c3e8ec4a8754ea4756e33da6c41cba38330d7e324abd25c8e0b93300: no such file or directory 
DEBU[0004] Image ubuntu:18.04 not found in cache        
DEBU[0005] Mounted directories: [{/kaniko false} {/etc/mtab false} {/tmp/apt-key-gpghome true} {/proc false} {/dev false} {/dev/pts false} {/sys false} {/sys/fs/cgroup false} {/sys/fs/cgroup/systemd false} {/sys/fs/cgroup/net_prio,net_cls false} {/sys/fs/cgroup/cpuset false} {/sys/fs/cgroup/pids false} {/sys/fs/cgroup/blkio false} {/sys/fs/cgroup/memory false} {/sys/fs/cgroup/cpuacct,cpu false} {/sys/fs/cgroup/hugetlb false} {/sys/fs/cgroup/freezer false} {/sys/fs/cgroup/devices false} {/sys/fs/cgroup/perf_event false} {/dev/mqueue false} {/builds false} {/busybox false} {/dev/termination-log false} {/etc/resolv.conf false} {/etc/hostname false} {/etc/hosts false} {/dev/shm false} {/run/secrets false} {/var/run/secrets/kubernetes.io/serviceaccount false} {/proc/bus false} {/proc/fs false} {/proc/irq false} {/proc/sys false} {/proc/sysrq-trigger false} {/proc/acpi false} {/proc/kcore false} {/proc/keys false} {/proc/timer_list false} {/proc/timer_stats false} {/proc/sched_debug false} {/proc/scsi false} {/sys/firmware false}] 
DEBU[0006] Not adding /dev because it is whitelisted    
DEBU[0006] Not adding /etc/hostname because it is whitelisted 
DEBU[0006] Not adding /etc/hosts because it is whitelisted 
DEBU[0006] Not adding /etc/resolv.conf because it is whitelisted 
DEBU[0006] Not adding /proc because it is whitelisted   
DEBU[0006] Not adding /sys because it is whitelisted    
error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/secrets/kubernetes.io/serviceaccount/..2020_02_04_18_36_43.168969777: read-only file system
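
Both logs in this thread list a mount beneath /var/run in the "Mounted directories" line (/var/run/docker.sock in Cloud Build, /var/run/secrets/kubernetes.io/serviceaccount here), and in both the unlinkat error names a path under that directory. As a rough diagnostic aid, the Go sketch below picks out such mounts from a list of mount points. `mountsUnder` is a hypothetical helper, and the sample slices are abbreviated from the logs above rather than parsed from them.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mountsUnder returns the mount points that equal dir or lie beneath it.
// Hypothetical helper for illustration; it is not part of kaniko.
func mountsUnder(dir string, mounts []string) []string {
	dir = path.Clean(dir)
	var found []string
	for _, m := range mounts {
		m = path.Clean(m)
		if m == dir || strings.HasPrefix(m, dir+"/") {
			found = append(found, m)
		}
	}
	return found
}

func main() {
	// Abbreviated from the Cloud Build log earlier in the thread.
	cloudBuild := []string{"/kaniko", "/workspace", "/etc/hosts", "/var/run/docker.sock"}
	// Abbreviated from the GitLab CI log above.
	gitlab := []string{"/kaniko", "/builds", "/run/secrets",
		"/var/run/secrets/kubernetes.io/serviceaccount"}

	fmt.Println(mountsUnder("/var/run", cloudBuild)) // [/var/run/docker.sock]
	fmt.Println(mountsUnder("/var/run", gitlab))     // [/var/run/secrets/kubernetes.io/serviceaccount]
}
```

A non-empty result suggests that removing /var/run in that environment would have to remove a mounted path, which is consistent with the "device or resource busy" and "read-only file system" errors reported here.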

@tejal29
Contributor

tejal29 commented Feb 4, 2020

hey folks, PR in progress. #1025

Patch fix coming soon.

@ejose19
Contributor

ejose19 commented Feb 4, 2020

@tejal29 can confirm now it works with both v0.17.1 and debug-v0.17.1 without needing to add any additional flag (like it was before). Thanks!

@afirth

afirth commented Feb 5, 2020

Thanks for the quick fix @tejal29 ! Confirmed fixed for us too on cloudbuild with latest

@dinvlad
Author

dinvlad commented Feb 5, 2020

Thanks again @tejal29! I'm marking this as closed, anyone please feel free to re-open if you still have issues.
