Pulling from public gcr.io repositories fails when using Kaniko 1.8.0 #1984

Open
jpallari opened this issue Mar 13, 2022 · 11 comments
Labels
area/registry For all bugs having to do with pushing/pulling into registries kind/bug Something isn't working priority/p1 Basic need feature compatibility with docker build. we should be working on this next. registry/gcr regression

Comments

@jpallari

Actual behavior
After the 1.8.0 release, builds for some of my container images started to fail with an authentication error during base image pull:

$ /kaniko/executor --context ${BUILD_CONTEXT} --dockerfile ${DOCKERFILE_PATH} --destination ${CI_APPLICATION_REPOSITORY}:$CI_COMMIT_SHA
INFO[0000] Retrieving image manifest gcr.io/go-containerregistry/crane:debug 
INFO[0000] Retrieving image gcr.io/go-containerregistry/crane:debug from registry gcr.io 
error building image: GET https://gcr.io/v2/token?scope=repository%3Ago-containerregistry%2Fcrane%3Apull&service=gcr.io: UNAUTHORIZED: failed authentication

The Dockerfile uses crane as the base image, which is hosted on gcr.io. The image can normally be pulled without any credentials, but the pull fails when using Kaniko 1.8.0. I wasn't able to reproduce the issue with Kaniko 1.7.0, so I've reverted to that version for now.

Expected behavior
I expected Kaniko to be able to pull the crane image and continue with the image build process.

To Reproduce
Steps to reproduce the behavior:

  1. Create a Dockerfile that uses gcr.io/go-containerregistry/crane:debug as the base image.
  2. Use Kaniko 1.8.0 to build an image from the Dockerfile

Additional Information

  • Dockerfile that fails to build
  • Build Context
  • Used custom Kaniko image (= latest Kaniko debug with auth script installed): registry.gitlab.com/lepovirta/dis/kaniko@sha256:558f105d3d4fe2cfbb94a851a203d5d4e87105fdfb662e98934a0bcf5f16b892
  • Failed build in GitLab CI

Triage Notes for the Maintainers

Description Yes/No
Please check if this is a new feature you are proposing
Please check if the build works in docker but not in kaniko
Please check if this error is seen when you use --cache flag
Please check if your dockerfile is a multistage dockerfile
@martinezleoml

Facing the same issue here while building an image with GitLab CI and Google Container Registry.

Falling back to gcr.io/kaniko-project/executor:v1.6.0-debug solved the issue (note that downgrading to v1.7.0 did not).

@mdeknowis

We experienced a similar issue when pushing.

We push from a Tekton pipeline within OpenShift into the project's Docker image registry, using a Service Account with the correct permissions.

After we upgraded to 1.8.0, we get this error:

error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "image-registry.openshift-image-registry.svc:5000/myproject/mydocker:0.1.0": POST https://image-registry.openshift-image-registry.svc:5000/v2/myproject/mydocker/blobs/uploads/: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:myproject/mydocker Type:repository] map[Action:push Class: Name:myproject/mydocker Type:repository]]

After we reverted to 1.7.0, everything worked again.

@Tamir5ht

After upgrading to v1.9.0-debug, I get the same issue when building images with GitLab CI and Google Container Registry, same as @martinezleoml.

Going back to v1.6.0-debug solved the issue for me again.

@markusheiden

markusheiden commented May 28, 2023

This issue is now more than a year old. Are there any ideas about what causes this problem?

For me, switching to the same (public) image hosted in the Docker registry worked around the issue.

@aaron-prindle added the regression, priority/p1, area/registry, and kind/bug labels on May 30, 2023
@xenomachina

We're also running into this issue.

  • We can pull images from gcr.io with jib running in GitLab CI.

  • We can pull images from docker hub with Kaniko running in GitLab CI.

  • We can pull images from gcr.io with Kaniko running locally.

  • We cannot pull images from gcr.io with Kaniko running in GitLab CI.

By "running in GitLab CI" I mean in GitLab's shared CI runner. I believe that's in GCP, so we're starting to wonder if this is a Google+Google+Google issue — like something is taking a shortcut when it sees that it's in GCP talking to gcr.io and ends up tripping over a bug in that special-case code.

The failures look like:

$ /kaniko/executor $EXTRA_ARGS --context=$KANIKO_CONTEXT --dockerfile=$DOCKERFILE --no-push
INFO[0000] Resolved base name golang:1.21 to build-env  
INFO[0000] Retrieving image manifest golang:1.21        
INFO[0000] Retrieving image golang:1.21 from registry index.docker.io 
INFO[0000] Retrieving image manifest gcr.io/distroless/static-debian12 
INFO[0000] Retrieving image gcr.io/distroless/static-debian12 from registry gcr.io 
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fstatic-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed

The "after 0 attempts" part seems sketchy. Is that an off-by-one error, or did it really fail without even trying?
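On the "after 0 attempts" question: one way such a message can arise is a retry helper that always makes the first try and then reports the configured retry count rather than the number of tries actually made. The Python sketch below illustrates that pattern only. `retry` and `always_unauthorized` are hypothetical names, and this is an assumption about how the wording could come about, not Kaniko's actual code.

```python
class RetryError(Exception):
    """Raised when the operation failed on the first try and on every retry."""


def retry(operation, retry_count):
    """Run operation once, plus up to retry_count extra times on failure."""
    last_error = None
    for _ in range(retry_count + 1):  # the first try is not counted as a retry
        try:
            return operation()
        except Exception as err:
            last_error = err
    # The message reports the configured retry count, not the tries made.
    raise RetryError(
        f"unable to complete operation after {retry_count} attempts, "
        f"last error: {last_error}"
    )


calls = []


def always_unauthorized():
    # Hypothetical stand-in for a registry call that is always rejected.
    calls.append(1)
    raise PermissionError("UNAUTHORIZED: authentication failed")


try:
    retry(always_unauthorized, retry_count=0)
except RetryError as err:
    message = str(err)
```

With `retry_count=0` the operation still runs exactly once, yet the message says "after 0 attempts". Under that reading the request was really made and rejected, and the wording is just misleading.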

@rvadim

rvadim commented Sep 5, 2024

This could be fixed by adding the scope https://www.googleapis.com/auth/devstorage.read_only to the VM instance where the GitLab runner is running:
https://cloud.google.com/sdk/gcloud/reference/beta/compute/instances/set-scopes
It seems like additional security on the GCP side.

EDIT: Maybe Kaniko uses the GCP SDK and tries to log in through the GCP VM instance metadata, which fails because the scope for the GCR API is not provided.
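To check this hypothesis from inside a job, one could ask the instance metadata server which scopes the default service account has. The following is a minimal sketch, assuming the job runs on a GCE VM that exposes the standard metadata endpoint. The helper names `build_metadata_request` and `fetch_default_scopes` are made up for this example.

```python
import urllib.request

# Standard GCE metadata server base URL; every request needs the
# "Metadata-Flavor: Google" header.
METADATA_BASE = "http://metadata.google.internal/computeMetadata/v1"
SCOPES_PATH = "/instance/service-accounts/default/scopes"


def build_metadata_request(path):
    """Build (but do not send) a request to the metadata server."""
    return urllib.request.Request(
        METADATA_BASE + path,
        headers={"Metadata-Flavor": "Google"},
    )


def fetch_default_scopes(timeout=2.0):
    """Return the list of scopes, or None if no metadata server is reachable."""
    try:
        request = build_metadata_request(SCOPES_PATH)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # The endpoint returns one scope per line.
            return response.read().decode().split()
    except OSError:  # covers URLError, DNS failures, and timeouts
        return None
```

Calling `fetch_default_scopes()` in a CI job returns `None` when no metadata server is exposed. If it returns a list without a storage scope, that would be consistent with the explanation above.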

@xenomachina

Could be fixed by adding scope https://www.googleapis.com/auth/devstorage.read_only to the vm instance where gitlab runner is working.

This is the shared gitlab.com runner, so I don't think we have this level of control over the vm.

@mxmCherry

mxmCherry commented Sep 20, 2024

Confirming this is an ongoing issue with Gitlab CI / Kaniko / gcr.io distroless images.

I replicated this in https://gitlab.com/mxmCherry/kaniko-gcr-io-debug , getting the same error as reported by the issue author:

INFO[0000] Retrieving image manifest gcr.io/distroless/base-debian12:nonroot 
INFO[0000] Retrieving image gcr.io/distroless/base-debian12:nonroot from registry gcr.io 
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed

I basically copied the official Gitlab recipe for image publishing, just added verbosity=trace + after_script with wget-ing the error-ed URL (it works fine).
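For anyone who wants to repeat the anonymous check, the failing URL can be rebuilt from the image name. This is a small Python sketch. `token_url` and `anonymous_status` are hypothetical helper names, and the query layout is simply copied from the error message above.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def token_url(registry, repository, action="pull"):
    """Rebuild the token endpoint URL that appears in the error message."""
    query = urlencode({
        "scope": f"repository:{repository}:{action}",
        "service": registry,
    })
    return f"https://{registry}/v2/token?{query}"


def anonymous_status(url, timeout=10.0):
    """Request the URL without any Authorization header; return the HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


url = token_url("gcr.io", "distroless/base-debian12")
```

Here `url` is character for character the URL from the log. If `anonymous_status(url)` returns 200, that matches the wget result: the endpoint accepts anonymous requests and only rejects the credentials Kaniko sends.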

That's done for Dockerfile with FROM gcr.io/distroless/base-debian12:nonroot.

You can see the full Gitlab CI pipelines / job logs here: https://gitlab.com/mxmCherry/kaniko-gcr-io-debug/-/pipelines


I had successful previous builds for another (work) project, and the only/main difference I noticed between successful job log vs failure is:

Last successful build (Aug 12, 2024):

Running with gitlab-runner 17.0.0~pre.88.g761ae5dd (761ae5dd)
  on green-1.saas-linux-small-amd64.runners-manager.gitlab.com/default <REDACTED>, system ID: <REDACTED>
...
$ /kaniko/executor <FLAGS_REDACTED>
ERRO[0000] Error while retrieving image from cache: gcr.io/distroless/base-debian12:nonroot unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed 
Cleaning up project directory and file based variables
Job succeeded

First failed build (Sep 20, 2024; it may have started failing earlier, this is just the date WE had to touch this project):

Running with gitlab-runner 17.4.0~pre.110.g27400594 (27400594)
  on blue-6.saas-linux-small-amd64.runners-manager.gitlab.com/default <REDACTED>, system ID: <REDACTED>
...
$ /kaniko/executor <FLAGS_REDACTED>
ERRO[0000] Error while retrieving image from cache: gcr.io/distroless/base-debian12:nonroot unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed 
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1

Note that the successful build ran on an older "green" instance and the failing one on a newer "blue" instance.

I also tried downgrading kaniko image for this (work) project by decrementing minor version 1.23.2 -> ... -> 1.19.0 (got way too bored to bother decrementing further). All of these versions failed with the same / very similar message.

This makes me think it could be some Gitlab upgrade issue, though I cannot prove it properly. Could also be some issue with Google transitioning GCR -> Artifact Registry -- also cannot tie it to their messages: from what I've read/googled, gcr.io should still work just fine 🤷


Btw, we also build other (public) images using docker-in-docker approach and it seems to still be working, had last successful build on Sep 15: https://gitlab.com/bsm/docker/ffmpeg/-/pipelines/1454267223

This image is published to docker.io registry, which Kaniko pulls without any problems. So Kaniko (or the used https://github.com/google/go-containerregistry) pulls gcr.io somewhat differently than docker-in-docker or plain old wget which still succeed.

@alm-pro

alm-pro commented Oct 10, 2024

Having the same problem. It does not occur when using DinD to build the image, but switching to kaniko causes this failure. The same failure occurs with gitlab.com shared runners and with a private Kubernetes runner hosted in GKE.

@jgsuess

jgsuess commented Oct 12, 2024

I am also having this issue. I have tried a few different ways to authenticate to Artifact Registry, hoping this would resolve it, but without success.

@mzihlmann

We did some analysis in the other thread, #3328 (comment), and found that it is specifically commit 633f555 that started to break things. It's the fix for #1856, "Fix implicit GCR auth".

It tries to fix implicit auth in GCR and it does it well, and that's the problem. When running in gitlab.com's gitlab-runner, kaniko tries to get an OAuth token from the metadata server and succeeds! However, this token does not have permission to pull from gcr.io, hence the later failure.

The token that we receive only has limited permissions, specifically these:

"scope": "https://www.googleapis.com/auth/monitoring.write https://www.googleapis.com/auth/logging.write"

GitLab has been informed. They don't consider it a security risk, but they are considering removing those permissions anyway.
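To make the mismatch concrete, here is a small sketch that checks a token's scope string for anything that would normally allow pulling from gcr.io. The `devstorage.read_only` scope is the one suggested earlier in this thread. Treating the broader storage scopes and `cloud-platform` as sufficient too is my assumption, and `can_pull_from_gcr` is a made-up name.

```python
# Scopes assumed to be enough for pulling from gcr.io. Only the first one
# was named in this thread; the others are an assumption of this sketch.
PULL_CAPABLE_SCOPES = {
    "https://www.googleapis.com/auth/devstorage.read_only",
    "https://www.googleapis.com/auth/devstorage.read_write",
    "https://www.googleapis.com/auth/devstorage.full_control",
    "https://www.googleapis.com/auth/cloud-platform",
}


def can_pull_from_gcr(scope_string):
    """Return True if any space-separated scope is in the pull-capable set."""
    return any(scope in PULL_CAPABLE_SCOPES for scope in scope_string.split())


# Scope string of the token handed out on the shared runner, as quoted above.
runner_scopes = (
    "https://www.googleapis.com/auth/monitoring.write "
    "https://www.googleapis.com/auth/logging.write"
)

runner_can_pull = can_pull_from_gcr(runner_scopes)  # False
```

So the metadata server hands out a perfectly valid token, just one that is useless for the registry.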

The best workaround is to disable GCR authentication, credits go to @jameshartig:

  variables:
    GOOGLE_APPLICATION_CREDENTIALS: /dev/null

There are multiple problems coming together in my opinion:

  • when we request the token we do so with cloud-platform scope and it works, later we switch to gcr registry scope and it fails
  • when token authentication fails we don't retry without authentication.
  • imo gitlab should change their infrastructure to not expose metadata server to the user at all (none of my runners do)
  • imo kaniko should not try to do things implicitly, as implicit is just another word for surprising. This one would be a hard ask though as it breaks with their philosophy.
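The second point, retrying without authentication, could look roughly like the sketch below. It is written in Python for brevity and only illustrates the idea; it is not how kaniko or go-containerregistry behave today. `pull_with_fallback`, `fake_pull`, and `Unauthorized` are hypothetical names.

```python
class Unauthorized(Exception):
    """Stand-in for a registry response of UNAUTHORIZED."""


def pull_with_fallback(pull, credentials):
    """Try the pull with credentials first; if rejected, retry anonymously."""
    if credentials is not None:
        try:
            return pull(credentials)
        except Unauthorized:
            pass  # credentials were rejected, fall through to anonymous
    return pull(None)


def fake_pull(credentials):
    # Models a public repository: anonymous access works, while a token
    # lacking registry permissions is rejected.
    if credentials is not None:
        raise Unauthorized("UNAUTHORIZED: authentication failed")
    return "pulled anonymously"


result = pull_with_fallback(fake_pull, credentials="metadata-server-token")
```

A fallback like this would have hidden the problem for public images. The trade-off is that it can also mask real credential misconfiguration for private ones, so it would probably need to log the rejected attempt.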
