Pulling from public gcr.io repositories fails when using Kaniko 1.8.0 #1984
Comments
Facing the same issue here while building an image with GitLab CI & Google Container Registry. Falling back to ...
We experienced a similar issue, in our case when pushing. We push from a Tekton pipeline within OpenShift into the project-related Docker image registry, using a Service Account with the correct permissions. After we upgraded to 1.8.0 we get this error:

After reverting back to 1.7.0, everything works again.
After upgrading to v1.9.0-debug I get the same issue when building images with GitLab CI and Google Container Registry, same as @martinezleoml. Going back to v1.6.0-debug solved the issue for me again.
Meanwhile, this issue is more than a year old. Are there any ideas about what causes this problem? For me, switching to the same (public) image hosted in the Docker registry worked around the issue.
We're also running into this issue.
By "running in GitLab CI" I mean in GitLab's shared CI runner. I believe that's in GCP, so we're starting to wonder if this is a Google+Google+Google issue — like something is taking a shortcut when it sees that it's in GCP talking to gcr.io and ends up tripping over a bug in that special-case code. The failures look like:
The "after 0 attempts" part seems sketchy. Is that an off-by-one error, or did it really fail without even trying? |
Could be fixed by adding the scope. EDIT: Maybe Kaniko uses the GCP SDK, which tries to log in through the GCP VM instance metadata and fails because the scope for the GCR API is not provided.
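One way to check that hypothesis (a sketch, not something from this thread): a throwaway GitLab CI job that asks the runner VM's GCE metadata server which OAuth scopes the default service account actually has. The job name and curl image are illustrative, and it only returns anything when the runner really is a GCE VM:

```yaml
# Hypothetical debug job: print the OAuth scopes granted to the runner VM's
# default service account, straight from the GCE metadata server.
check-metadata-scopes:
  image: curlimages/curl:latest
  script:
    # The Metadata-Flavor header is required by the GCE metadata server.
    - 'curl -s -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"'
```

If the output only lists logging/monitoring scopes, a token obtained from that metadata server cannot authorize gcr.io pulls, which would match the behavior described here.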
This is the shared gitlab.com runner, so I don't think we have this level of control over the VM.
Confirming this is an ongoing issue with GitLab CI / Kaniko / gcr.io distroless images. I replicated this in https://gitlab.com/mxmCherry/kaniko-gcr-io-debug, getting the same error as reported by the issue author:
I basically copied the official GitLab recipe for image publishing, just added verbosity=trace plus an after_script that wgets the errored URL (that works fine, sketched below). That's done for a Dockerfile with ...

You can see the full GitLab CI pipelines / job logs here: https://gitlab.com/mxmCherry/kaniko-gcr-io-debug/-/pipelines

I had successful previous builds for another (work) project, and the only/main difference I noticed between the successful job log and the failure is:

Last successful build (Aug 12, 2024):
First failed build (Sep 20, 2024 -- it could have started failing earlier; this is just the date we had to touch this project):
Note that the successful build ran on an older "green" instance, and the failing one on a newer "blue" instance.

I also tried downgrading the kaniko image for this (work) project by decrementing the minor version 1.23.2 -> ... -> 1.19.0 (got too bored to decrement further). All of these versions failed with the same or a very similar message. This makes me think it could be some GitLab upgrade issue, though I cannot prove it properly. It could also be some issue with Google transitioning GCR -> Artifact Registry -- I also cannot tie it to their announcements: from what I've read/googled, gcr.io should still work just fine 🤷

Btw, we also build other (public) images using the docker-in-docker approach and that still seems to work; the last successful build was on Sep 15: https://gitlab.com/bsm/docker/ffmpeg/-/pipelines/1454267223. That image is published to the docker.io registry, which Kaniko pulls without any problems. So Kaniko (or the https://github.com/google/go-containerregistry library it uses) pulls from gcr.io somewhat differently than docker-in-docker or plain old ...
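For reference, the debug setup described above is roughly this (a sketch assuming the standard GitLab kaniko recipe; the job name, executor tag, and destination are illustrative, and the after_script URL is a placeholder for whichever URL the kaniko error reports):

```yaml
build:
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: [""]
  script:
    # --verbosity=trace is the extra debug flag mentioned above.
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}"
      --verbosity=trace
  after_script:
    # Fetch the same URL kaniko failed on, to show it is reachable from the
    # runner without credentials (take the URL from the error output).
    - wget -O - "<URL from the kaniko error message>"
```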
Having the same problem: it is not a problem when using DinD to build the image, but switching to Kaniko causes this failure. The same failure occurs with gitlab.com shared runners and with a private Kubernetes runner hosted in GKE.
I am also having this issue. I have tried a few different ways to authenticate to Artifact Registry hoping that would resolve it, but no success.
We did some analysis in the other thread #3328 (comment), and we found that it is specifically this commit that started to break things: 633f555. It's a fix for #1856. It tries to fix implicit auth in GCR and it does it well; that's the problem. When running in gitlab.com's gitlab-runner, kaniko tries to get an OAuth token from the metadata server and succeeds! However, this token does not have permission to pull from GCR, hence the later failure. The token that we receive only has limited permissions, specifically these:

"scope": "https://www.googleapis.com/auth/monitoring.write https://www.googleapis.com/auth/logging.write"

GitLab has been informed and they don't consider it a security risk; however, they are considering removing those permissions anyway.

The best workaround is to disable GCR authentication; credits go to @jameshartig:

```yaml
variables:
  GOOGLE_APPLICATION_CREDENTIALS: /dev/null
```

There are multiple problems coming together in my opinion:
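To make that workaround a few lines up concrete, here is a sketch of how it might slot into a typical kaniko build job (only the GOOGLE_APPLICATION_CREDENTIALS line comes from the comment above; the job skeleton, executor tag, and destination are assumptions):

```yaml
build:
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: [""]
  variables:
    # Workaround from the thread: an unusable credentials file makes Google
    # auth fail fast, so kaniko does not pick up the limited-scope token from
    # the runner's metadata server and pulls public gcr.io images anonymously.
    GOOGLE_APPLICATION_CREDENTIALS: /dev/null
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}"
```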
Actual behavior
After the 1.8.0 release, builds for some of my container images started to fail with an authentication error during base image pull:
The Dockerfile uses crane as the base image, which is hosted on gcr.io. Normally the image can be pulled without any credentials, but the pull fails when using Kaniko 1.8.0. I wasn't able to reproduce the issue with Kaniko 1.7.0, so I've decided to revert to that for now.
Expected behavior
I expected Kaniko to be able to pull the crane image and continue with the image build process.
To Reproduce
Steps to reproduce the behavior:
Use gcr.io/go-containerregistry/crane:debug as the base image in a Dockerfile and build it with Kaniko 1.8.0.
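A minimal reproduction sketch for that step (assumed: a GitLab CI job that writes the one-line Dockerfile itself and builds without pushing; the job name and the use of --no-push are illustrative):

```yaml
# Hypothetical minimal reproduction with kaniko 1.8.0: the base image is
# public, so the pull should not need any credentials.
reproduce:
  image:
    name: gcr.io/kaniko-project/executor:v1.8.0-debug
    entrypoint: [""]
  script:
    - echo "FROM gcr.io/go-containerregistry/crane:debug" > Dockerfile
    - /kaniko/executor --context "${CI_PROJECT_DIR}" --dockerfile "${CI_PROJECT_DIR}/Dockerfile" --no-push
```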
Additional Information
Kaniko image used: registry.gitlab.com/lepovirta/dis/kaniko@sha256:558f105d3d4fe2cfbb94a851a203d5d4e87105fdfb662e98934a0bcf5f16b892
Triage Notes for the Maintainers
- Please check if this error is seen when you use the --cache flag