This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Support Google Container Registry #529

Closed
jpellizzari opened this issue Apr 14, 2017 · 53 comments
Comments

@jpellizzari
Contributor

GCR does authentication differently from other container image registry providers. GCR wraps the Docker CLI and handles authentication on its own using the gcloud utility. It appears that GCR uses RSA keys rather than a username/password like other container registry providers (quay.io, GitLab, Docker Hub).

More info here: https://cloud.google.com/container-registry/docs/using-with-third-party-solutions

This issue was reported by Weave Cloud users.

@wassemgtk

@jpellizzari Is there any timeline for supporting Google Container Registry? Should we wait, or should we find other solutions?

@jpellizzari
Contributor Author

@wassemgtk Currently not prioritized. I'll update you if that changes.

@squaremo
Member

@wassemgtk We're currently rewriting the daemon fluxd so that registry access is done from there rather than the service.

This will mean that how it accesses the registry is up to how you run the daemon -- it just expects a docker-style config file to be available, which I think you'd be able to supply with a small amount of preparation (judging by https://cloud.google.com/container-registry/docs/advanced-authentication).

This change will land in the next <small number> weeks.

@errordeveloper
Contributor

I've previously used Flux with GCR and the config I've put together looked like this:

registry:
  auths:
    "gcr.io": { "auth": <BASE64_ENCODED_STRING> }

Where the output of echo <BASE64_ENCODED_STRING> | base64 -D is _json_key:<SERVICE_ACCOUNT_JSON>, namely:

_json_key:{
  "type": "service_account",
  "project_id": "flux-demo-159311",
  "private_key_id": "011d0b9a3fada1ccbaac0d603771f9bcebdb1168",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<PRIVATE_KEY>\n-----END PRIVATE KEY-----\n",
  "client_email": "flux-549@flux-demo-159311.iam.gserviceaccount.com",
  "client_id": "113774258020897955380",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/flux-549%40flux-demo-159311.iam.gserviceaccount.com"
}

So if you have service-account.json, you can produce this config like so:

printf 'registry: { auths: { "gcr.io": { "auth": "%s" } } }\n' "$(printf '_json_key:%s' "$(cat service-account.json)" | base64 | tr -d '\n')"
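
Before handing the config to Flux, it can help to confirm that the auth string decodes to the expected form. The following is a minimal sketch, not part of Flux: the function name check_gcr_auth is hypothetical, and it assumes a base64 binary that accepts --decode (both the GNU and macOS versions do).

```shell
# Hypothetical helper: succeed only if the given auth string decodes to
# something that starts with "_json_key:", the form described above.
check_gcr_auth() {
  # --decode is the long option accepted by both GNU and macOS base64.
  decoded="$(printf '%s' "$1" | base64 --decode)" || return 1
  case "$decoded" in
    _json_key:*) return 0 ;;
    *) echo "auth string does not decode to _json_key:<json>" >&2
       return 1 ;;
  esac
}
```

For example, check_gcr_auth "<BASE64_ENCODED_STRING>" && echo "looks right" prints the message only when the prefix is present.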

@wassemgtk hope the above makes sense, I'd be happy to help if you jump on Slack and we can figure out how to update the docs and talk more about how we could make it easier for you.

@wassemgtk

Amazing!! Yayyy. I will check it out and let you know on Slack.

@wassemgtk

@errordeveloper I tried it but am still facing issues:

kubectl apply -f "https://cloud.weave.works/k8s/flux.yaml?service-token=******************************-version=$(kubectl version | base64 0)"
Unable to open '0': No such file or directory
error: unable to read URL "https://cloud.weave.works/k8s/flux.yaml?service-token=******************************-version=", server reported 400 Bad Request, status code=400

@rade
Contributor

rade commented Apr 20, 2017

Instead of base64 0 you want base64 -w 0.

@wassemgtk

Same error:

Waseems-MBP:flux waseem$ kubectl apply -f "https://cloud.weave.works/k8s/flux.yaml?service-token=***********************************-version=$(kubectl version | base64 -w 0)"
base64: invalid option -- w
Usage: base64 [-hvD] [-b num] [-i in_file] [-o out_file]
-h, --help display this message
-D, --decode decodes input
-b, --break break encoded string into num character lines
-i, --input input file (default: "-" for stdin)
-o, --output output file (default: "-" for stdout)

error: unable to read URL "https://cloud.weave.works/k8s/flux.yaml?service-token=***********************************version=", server reported 400 Bad Request, status code=400

@rade
Contributor

rade commented Apr 20, 2017

Different error.

Looks like you have a base64 binary that does not support the -w option :(

So instead of

$(kubectl version | base64 -w 0)

try

$(kubectl version | base64 | tr -d '\n')
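
The same workaround can be wrapped in a small function so the command line does not depend on which base64 binary is installed. This is only a sketch; the name b64_oneline is hypothetical.

```shell
# Hypothetical helper: base64-encode stdin onto a single line.
# GNU base64 wraps its output at 76 columns by default, and the macOS
# binary shown above has no -w option, so the wrapping is undone with tr.
b64_oneline() {
  base64 | tr -d '\n'
}
```

With that defined, the substitution becomes $(kubectl version | b64_oneline).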

@wassemgtk

wassemgtk commented Apr 20, 2017

:(

Waseems-MBP:flux waseem$ kubectl apply -f "https://cloud.weave.works/k8s/flux.yaml?service-token=u*************************-version=$(kubectl version | base64 | tr -d '\n')"
serviceaccount "weave-flux" configured
deployment "weave-flux-agent" configured
clusterrolebinding "weave-flux" created
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "weave-flux" is forbidden: attempt to grant extra privileges: [{[*] [*] [*] [] []} {[*] [] [] [] [*]}] user=&{waseem@********.com [system:authenticated] map[]} ownerrules=[{[create] [authorization.k8s.io] [selfsubjectaccessreviews] [] []} {[get] [] [] [] [/api /api/* /apis /apis/* /healthz /swaggerapi /swaggerapi/* /version]}] ruleResolutionErrors=[]

@wassemgtk

screenshot 2017-04-20 14 40 17

@rade
Contributor

rade commented Apr 20, 2017

We are making progress...

Please post the output of kubectl version.

@wassemgtk

Waseems-MBP:flux waseem$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:33:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

@rade
Contributor

rade commented Apr 20, 2017

So the error is due to the new role-based access control (RBAC) feature in k8s 1.6. Let's see whether it works without that:

kubectl apply -f "https://cloud.weave.works/k8s/v1.5/flux.yaml?service-token=***********************************"

@wassemgtk

wassemgtk commented Apr 20, 2017

No error, but no new daemon:

Waseems-MBP:flux waseem$ kubectl apply -f "https://cloud.weave.works/k8s/v1.5/flux.yaml?service-token=*************************"
serviceaccount "weave-flux" configured
deployment "weave-flux-agent" configured
Waseems-MBP:flux waseem$

screenshot 2017-04-20 14 40 17

screenshot 2017-04-20 15 02 14

@rade
Contributor

rade commented Apr 20, 2017

the weave flux agent is not a daemon in the k8s sense of the word.

@rade
Contributor

rade commented Apr 20, 2017

the agent should show up as a Deployment.

@wassemgtk

Nothing there :( I can't see it

@rade
Contributor

rade commented Apr 20, 2017

I'm afraid this has exhausted my k8s knowledge. One of my colleagues will pick this up tomorrow.

@squaremo
Member

@wassemgtk The output quoted in #529 (comment) certainly indicates that a deployment for Flux was created.

Just to check, did you look at the Deployments page in the k8s dashboard?

@wassemgtk

Yes I did and I can't see the deployment

@squaremo
Member

I see. If kubernetes reports that it created something, but you can't see it in kubernetes' dashboard, I don't know what to suggest to you. Perhaps try verifying with kubectl:

kubectl get deployments --all-namespaces

@wassemgtk

I tried again and now I can see:
weave-cortex-agent
weave-flux-agent

But it is not connected to Weave Cloud.

screenshot 2017-04-21 09 02 19

@jpellizzari
Contributor Author

@wassemgtk That indicator is based on the git repo being configured for your instance. Have you configured the git repo in the configuration section of the 'Deploy' menu?

@wassemgtk

Yayyyy, I can see the images. But I still have one more issue: I can't release.
screenshot 2017-04-21 09 46 41

@squaremo
Member

Cool! Progress.

This could indicate at least a couple of things:

  • there really is no branch "master" in the upstream repo, but it just needs to be told the correct branch
  • there's some other problem (the repo is completely empty? not sure) that git reports this way

@q-michelle

I'm also having trouble releasing. After making a change to the code & generating a new image, the Release button says there is nothing to do. Any idea why? It's a freshly generated image, built with code changes.

screen shot 2017-04-25 at 4 11 56 pm

screen shot 2017-04-25 at 4 13 09 pm

screen shot 2017-04-25 at 4 02 35 pm

@jpellizzari
Contributor Author

jpellizzari commented Apr 25, 2017

@q-michelle Does your new image use the same tag as your old one? Flux determines if images are new by the image tag. See #498

Whoops didn't see that last screenshot. Disregard.

@q-michelle

No, different tags. The container registry image above shows the tag assignments.

@q-michelle

It reports that it finds nothing to do, but where is it looking? Should I have a .yaml in a specific location or is it looking through all the objects in the cluster to find the one that uses this image? I'm unclear how the image is linked to the deployment/pod that runs it.

@squaremo
Member

@q-michelle,

It reports that it finds nothing to do, but where is it looking?

Flux looks at the config git repo to see what services* are defined. In the first screenshot, it's saying "0 services found", which means there's nothing there that satisfies its idea of a service, so it doesn't proceed.

You might ask: why can I click on default/affect-router if Flux doesn't recognise it as something I can deploy to? The answer is that it's an accident of history -- we started by looking at the running system to see what could be deployed, and have not yet arrived at a more consistent model.

What can you do? You can make sure you have a definition of the service in the config repo, in the form that Flux expects it (explained just below). If I've jumped to a conclusion, and you do in fact have a definition, would you mind opening another issue and posting the details? Or you can hop on our Slack: https://weaveworks.github.io/community-slack/ (I am @mbridgen there).

--
*For a slightly funny definition of "service", which is "a Kubernetes Service resource and a Deployment resource that makes pods fitting the Service's selector". This comes from our own initial use of Kubernetes, and won't match everyone's use, so we will revisit it at some point.
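
As a rough illustration of a pair that fits this definition, the sketch below writes a Service and a Deployment whose pod labels match the Service's selector, one resource per file. The namespace, names, and image are borrowed from the release log elsewhere in this thread; the port, the app label, the file names, and the temporary directory are assumptions. It uses apps/v1 for the Deployment, whereas clusters of the 1.6 era discussed here served Deployments under extensions/v1beta1 or apps/v1beta1.

```shell
# Sketch only: write a Service and a matching Deployment as separate files.
# A temporary directory stands in for the config git repo.
dir="$(mktemp -d)"

cat > "$dir/affect-router-svc.yaml" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: affect-router
  namespace: qordoba
spec:
  selector:
    app: affect-router   # must match the pod labels in the Deployment
  ports:
    - port: 80           # assumed port
EOF

cat > "$dir/affect-router-dep.yaml" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: affect-router
  namespace: qordoba
spec:
  replicas: 1
  selector:
    matchLabels:
      app: affect-router
  template:
    metadata:
      labels:
        app: affect-router   # the label the Service selects on
    spec:
      containers:
        - name: affect-router
          image: gcr.io/qordoba-devel/affect-router:0.0.8
EOF

echo "wrote manifests to $dir"
```

The image line in the Deployment is the part a release would rewrite with a new tag.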

@q-michelle

@squaremo that makes sense. We are using the same definition of service, but I had both the Service & Deployment bundled in the same file. Separating them out worked. Weave Cloud now registers a change, but throws an error during deployment.

Status: Failed: qordoba/affect-router: applying definition to qordoba/affect-router: running kubectl: error: group map[authorization.k8s.io:0xc8203d70a0 componentconfig:0xc8203d71f0 extensions:0xc8203d7260 storage.k8s.io:0xc82033f3b0 :0xc8203d6fc0 apps:0xc8203d7030 autoscaling:0xc8203d7110 batch:0xc8203d7180 policy:0xc8203d72d0 rbac.authorization.k8s.io:0xc8203d7340 authentication.k8s.io:0xc8203d7420 federation:0xc8203d6f50] is already registered

1) Queued.
2) Calculating updates for release.
3) Cloning git repository.
4) Finding defined services.
5) Found service qordoba/affect-router
6) Found 1 services.
7) Looking up images.
8) Will update qordoba/affect-router container affect-router: gcr.io/qordoba-devel/affect-router:0.0.8 -> 0.0.8-flux-test
9) Pushing changes.
10) Applying changes.
11) Sending notifications.
12) Failed: qordoba/affect-router: applying definition to qordoba/affect-router: running kubectl: error: group map[authorization.k8s.io:0xc8203d70a0 componentconfig:0xc8203d71f0 extensions:0xc8203d7260 storage.k8s.io:0xc82033f3b0 :0xc8203d6fc0 apps:0xc8203d7030 autoscaling:0xc8203d7110 batch:0xc8203d7180 policy:0xc8203d72d0 rbac.authorization.k8s.io:0xc8203d7340 authentication.k8s.io:0xc8203d7420 federation:0xc8203d6f50] is already registered

Failure

I suspect it is a version incompatibility. I found this change that has been reverted. Do you think updating the version of kubectl would resolve this error?

@q-michelle

Container Engine is running 1.6.1
screen shot 2017-04-27 at 2 41 56 pm

@squaremo
Member

Do you think updating the version of kubectl would resolve this error?

I think you are likely correct that it's a version incompatibility. In master we include kubectl 1.6.1 in the fluxd image, but we've not released an official image with it. I guess I'll look into doing that!

@q-michelle

Is there a timing estimate for when Weave Cloud will support a newer version of kubectl? Or are there updated agents we could deploy on our cluster?

@squaremo
Member

squaremo commented May 8, 2017

OK, there's a new release of fluxd in quay.io, with a newer kubectl bundled into the image, which is worth trying. It's quay.io/weaveworks/fluxd:0.3.0, if you want to change the manifest directly.
There's also a new fluxctl binary on the GitHub release page: https://github.com/weaveworks/flux/releases/tag/0.3.0

@q-michelle

Thanks for pushing the new image! I updated the image version in the weave-flux-agent deployment object, but still can't get a deployment to succeed. I can see a green connection in the config screen "Deploy agent configured", but am seeing "Failed: connection is shut down" when I try to initiate a deploy. I tried deleting & recreating the weave-flux-agent deployment object, but am seeing these errors on recreation:
deployment "weave-flux-agent" created
Error from server (AlreadyExists): serviceaccounts "weave-flux" already exists
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "weave-flux" is forbidden: attempt to grant extra privileges: [{[*] [*] [*] [] []} {[*] [] [] [] [*]}] user=&{[email protected] [system:authenticated] map[]} ownerrules=[{[create] [authorization.k8s.io] [selfsubjectaccessreviews] [] []} {[get] [] [] [] [/api /api/* /apis /apis/* /healthz /swaggerapi /swaggerapi/* /version]}] ruleResolutionErrors=[]
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "weave-flux" already exists
I've tried everything I can think of, including modifications of the flux.yaml file found here: https://cloud.weave.works/k8s/v1.6/flux.yaml. Any ideas what might be going wrong?

@q-michelle

screen shot 2017-05-08 at 5 28 55 pm

@squaremo
Member

squaremo commented May 9, 2017

Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "weave-flux" is forbidden: attempt to grant extra privileges: [{[*] [*] [*] [] []} {[*] [] [] [] [*]}] user=&{[email protected] [system:authenticated] map[]} ownerrules=[{[create] [authorization.k8s.io] [selfsubjectaccessreviews] [] []} {[get] [] [] [] [/api /api/* /apis /apis/* /healthz /swaggerapi /swaggerapi/* /version]}] ruleResolutionErrors=[]
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "weave-flux" already exists
I've tried everything I can think of, including modifications of the flux.yaml file found here: https://cloud.weave.works/k8s/v1.6/flux.yaml. Any ideas what might be going wrong?

This error has also been observed over at #556; it seems like GKE has an idiosyncratic implementation of RBAC.

https://cloud.google.com/container-engine/docs/role-based-access-control gives a few gotchas, but according to @mongrelion, the workarounds do not work around them. (Although his comment there seems ambiguous, so it is worth verifying for yourself)

What modifications did you make to the downloaded manifest? (So we can at least rule things out.)

@mongrelion

@squaremo how can I assist to clear up the ambiguity in my previous comment on #556? :D

@q-michelle

I think I understand @mongrelion's comments. Like Carlos, I'm also an owner of the project, which means I should have all required privileges. I tried granting myself explicit container engine privileges just in case, but saw the same failures.

I was able to narrow it down to a problem with the rules in the ClusterRole object. I tried many different derivatives of:

  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
    nonResourceURLs: ["*"]

All resulted in the same error:

Error from server (Forbidden): error when creating "flux.yaml": clusterroles.rbac.authorization.k8s.io "weave-flux" is forbidden: attempt to grant extra privileges: [{[*] [*] [*] [] []} {[*] [] [] [] [*]}] user=&{[email protected]  [system:authenticated] map[]} ownerrules=[{[create] [authorization.k8s.io] [selfsubjectaccessreviews] [] []} {[get] [] [] [] [/api /api/* /apis /apis/* /healthz /swaggerapi /swaggerapi/* /version]}] ruleResolutionErrors=[]

I found #400 & the corresponding PR for weave-kube, but I can't see how to translate that into the permissions that flux needs.

At this point, I'm looking for one of two things:

  1. A set of rules that will not throw a Forbidden exception
  2. A set of project permissions for my user that will allow granting of the above rules. If the Owner role is insufficient, what would work?

@squaremo
Member

I tried granting myself explicit container engine privileges just in case, but saw the same failures.

The Google Cloud documentation says "An example workaround is to create a RoleBinding that gives your Google identity a cluster-admin role before attempting to create additional Role or ClusterRole permissions." Is that what you did, @q-michelle?

@q-michelle

Yes, if it's the Container Engine Cluster Admin role. This is everything I granted explicitly:
screen shot 2017-05-15 at 12 43 10 pm

@squaremo
Member

OK, thanks for confirming that. It sounds like the workaround given in the GKE docs does not in fact work around.

Do either of you @q-michelle @mongrelion have evidence that one can grant permissions on GKE, e.g., if you just do it via the command line, for some ad-hoc job? (I'm sorry I have been unable to dig into this myself, I'm fighting battles on several fronts atm.)

@q-michelle

@squaremo No, I'm unable to grant permissions for other objects, which appears to be because I haven't set up authentication on my cluster. I'm a bit out of my element trying to get it to work. Is there an auth type that you recommend, or could you point me at some docs that would help me get it set up properly?

@samb1729
Contributor

samb1729 commented Jun 5, 2017

@q-michelle the instructions for setting up the cluster-admin role in our new docs may solve your issue.

@q-michelle

Thanks @Sambooo - that makes sense! The command in your link ran successfully to grant my user cluster-admin privs. I no longer get errors when running kubectl apply -n kube-system -f "https://cloud.weave.works/k8s.yaml?service-token=REDACTED&k8s-version=$(kubectl version | base64 | tr -d '\n')", but I still get an error when trying to deploy, so it does not look like it resolved the original problem.
screen shot 2017-06-05 at 1 29 24 pm
screen shot 2017-06-05 at 1 30 31 pm

@q-michelle

The cluster & repo were somehow not in sync from using Flux 0.2.0, but now that everything is in sync on 0.3.0, everything is working as expected. Thank you everyone for the help getting things up and running!

@errordeveloper
Contributor

Just looking at this thread now, it appears that the headline issue, GCR auth configuration, is resolved and basically needs to be documented properly.

@q-michelle sounds like your issue has been resolved too.

I'm tempted to close this and open at least one issue with a title "Document GCR auth config steps", but I'd like to hear what others think, as I might have missed something while reading this thread.

@q-michelle

Yes, that sounds like a logical resolution for this thread. I've automated a few more services and haven't had any additional problems - everything is working well now.

@errordeveloper
Contributor

Looks like this doesn't need to be open any more. If anyone opposes, please do feel free to re-open :)

@RochesterinNYC

Where can I find the GCR-related documentation that this issue resulted in?

@squaremo
Member

squaremo commented Jan 4, 2018

Where can I find the GCR-related documentation that this issue resulted in?

The main outcome of this issue was that we put additional instructions in Weave Cloud bootstrapping, for GKE users -- this amounts to running

kubectl create clusterrolebinding "cluster-admin-$(whoami)" --clusterrole=cluster-admin --user="$(gcloud config get-value core/account)"

to give your account the necessary permissions to run the weave agents.

We also added gcr.io-specific authentication (https://github.com/weaveworks/flux/blob/master/registry/credentials.go#L89), which now -- in the next release -- works for eu.gcr.io and so on too. It assumes you are accessing GCR from GKE. If you are using GCR from elsewhere, I think it will work like any other registry and pick up credentials from the imagePullSecrets.
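
To illustrate what covering "eu.gcr.io and so on" could look like at the host-matching level, here is a minimal sketch. It is not Flux's code (the linked credentials.go is the reference for that); the function name is_gcr_host is hypothetical, and it assumes the regional registries are subdomains of gcr.io.

```shell
# Hypothetical check: does a registry host belong to GCR?
# Matches gcr.io itself and any subdomain such as eu.gcr.io.
is_gcr_host() {
  case "$1" in
    gcr.io|*.gcr.io) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_gcr_host eu.gcr.io succeeds, while is_gcr_host quay.io does not.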
