This repository has been archived by the owner on May 6, 2020. It is now read-only.

RBAC support #1292

Merged (1 commit, May 31, 2017)

Conversation

@Bregor (Contributor) commented May 7, 2017

With this change, deis-controller becomes able to work in RBAC-only clusters.

Works with both Kubernetes 1.5 and 1.6 (see templates/_helpers.tmpl for details); actually tested with 1.5.7 and 1.6.4.

The ClusterRole grants deis-controller the following permissions:

  • namespaces: get, list, create and delete
  • services: get, list, create, update and delete
  • nodes: get and list
  • events: list and create
  • secrets: list, get, create, update and delete
  • replicationcontrollers: get, list, create, update and delete
  • replicationcontrollers/scale: get and update
  • pods/log: get
  • pods: get, list and delete
  • resourcequotas: get and create
  • apps/deployments: get, list, create, update and delete
  • autoscaling/horizontalpodautoscalers: get, list, create, update and delete
  • extensions/deployments: get, list, create, update and delete
  • extensions/deployments/scale: get and update
  • extensions/replicasets: get, list, delete and update
  • extensions/replicasets/scale: get and update
  • extensions/horizontalpodautoscalers: get, list, create, update and delete
  • extensions/ingresses: get, list, watch, create, update and delete
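
For reference, rules like the ones above map onto a ClusterRole manifest roughly as follows. This is a sketch, not the chart's exact template: the manifest name is illustrative, and the rbac.authorization.k8s.io apiVersion differs between Kubernetes 1.5 (v1alpha1) and 1.6 (v1beta1), which is what templates/_helpers.tmpl selects.

```yaml
# Sketch of a ClusterRole covering the first few rules in the list above.
# The name is illustrative; the apiVersion shown is the 1.6 value and
# would be templated per cluster version in the real chart.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: deis:deis-controller
rules:
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list", "create", "delete"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "create", "update", "delete"]
- apiGroups: ["extensions", "apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "create", "update", "delete"]
```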

@deis-admin

Thanks for the contribution! Please ensure your commits follow our style guide. This code will be tested once a Deis maintainer reviews it.

@vdice (Member) commented May 8, 2017

Jenkins, OK to test

@vdice vdice added this to the v2.15 milestone May 8, 2017
@vdice (Member) commented May 8, 2017

@Bregor thanks for this feature addition!

Before I dive into testing on an RBAC-only cluster, I wanted to make sure installing charts on non-RBAC clusters (as CI does) still works. As it stands, installation fails either because the necessary API version is missing (from the CI cluster, which looks like v1.5.3):

Error: apiVersion "rbac.authorization.k8s.io/v1alpha1" in workflow/charts/controller/templates/controller-clusterrolebinding.yaml is not available

or the cluster otherwise isn't RBAC-ready (the following from my v1.6.0 mk cluster):

 $ helm install --wait workflow-pr/workflow --namespace=deis --version v2.14.1-20170508161641-sha.3d6bbf9 --set fluentd.docker_tag=git-740bd61 --name deis-workflow
Error: release deis-workflow failed: clusterroles.rbac.authorization.k8s.io "deis:deis-logger-fluentd" is forbidden: attempt to grant extra privileges: [{[list] [] [pods] [] []} {[get] [] [pods] [] []} {[watch] [] [pods] [] []}] user=&{system:serviceaccount:kube-system:default e9349b4a-311c-11e7-bc10-7665e4a656ee [system:serviceaccounts system:serviceaccounts:kube-system system:authenticated] map[]} ownerrules=[{[create] [authorization.k8s.io] [selfsubjectaccessreviews] [] []} {[get] [] [] [] [/api /api/* /apis /apis/* /healthz /swaggerapi /swaggerapi/* /version]}] ruleResolutionErrors=[]

Therefore, I think the first priority is to wrap the template generation in a feature flag: perhaps global.use_rbac or similar, defaulting to false (or commented out). When false or not set, the templates added here should not be generated; when true, they would be.

How does that sound?
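
A sketch of what that gate could look like, assuming the flag name global.use_rbac from the comment above (the values layout, helper name, and resource names here are hypothetical, not the final implementation):

```yaml
# templates/controller-clusterrolebinding.yaml (sketch): render the RBAC
# objects only when the hypothetical global.use_rbac flag is enabled, so
# non-RBAC clusters keep installing unchanged. The "rbacAPIVersion" helper
# name is an assumption standing in for the version logic in _helpers.tmpl.
{{- if .Values.global.use_rbac }}
apiVersion: {{ template "rbacAPIVersion" . }}
kind: ClusterRoleBinding
metadata:
  name: deis:deis-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deis:deis-controller
subjects:
- kind: ServiceAccount
  name: deis-controller
  namespace: {{ .Release.Namespace }}
{{- end }}
```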

@vdice (Member) commented May 9, 2017

Jenkins, add to whitelist

@Bregor Bregor mentioned this pull request May 12, 2017
@vdice vdice requested review from vdice and mboersma May 16, 2017 17:01
@vdice vdice added the LGTM1 label May 16, 2017
@mboersma (Member) commented May 16, 2017

Looks like we need a specific delete permission on namespaces as well. At least, when I run deis destroy, I get an error when the controller tries to delete the namespace in an RBAC environment.

172.17.0.17 "POST /v2/apps/ HTTP/1.1" 201 170 "Deis Client dev-209c985"
INFO [aerial-yodeling]: deleting environment
ERROR:root:Uncaught Exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 480, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 93, in destroy
    self.perform_destroy(instance)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 97, in perform_destroy
    instance.delete()
  File "/app/api/models/app.py", line 260, in delete
    self._scheduler.ns.delete(self.id)
  File "/app/scheduler/resources/namespace.py", line 52, in delete
    raise KubeHTTPException(response, 'delete Namespace "{}"', namespace)
  File "/app/scheduler/exceptions.py", line 10, in __init__
    data = response.json()
  File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 866, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
172.17.0.17 "DELETE /v2/apps/aerial-yodeling/ HTTP/1.1" 500 25 "Deis Client dev-209c985"
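
The JSONDecodeError at the bottom is a secondary symptom: per the traceback, KubeHTTPException's constructor calls response.json(), and the API server's Forbidden response body here isn't valid JSON, so the error path itself blows up. A minimal reproduction of that failure mode, assuming an empty response body:

```python
import json

# When the API server denies the namespace delete, the controller still
# tries to parse the response body as JSON. A non-JSON (here: empty)
# body raises the same error seen in the traceback above.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)
```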

@Bregor (Contributor, Author) commented May 17, 2017

@mboersma added and rebased

@codecov-io commented May 17, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@4b015cd).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master    #1292   +/-   ##
=========================================
  Coverage          ?   87.04%           
=========================================
  Files             ?       45           
  Lines             ?     3929           
  Branches          ?      681           
=========================================
  Hits              ?     3420           
  Misses            ?      338           
  Partials          ?      171

@Bregor (Contributor, Author) commented May 17, 2017

@mboersma, a quick note about namespace deletion:
RBAC has no notion of ownership: you can grant a specific user the ability to create and delete namespaces, but you cannot restrict deletion to only the particular namespaces that user created.
So this user would have permission to delete any namespace in the cluster.
The only way to "hack" around this is to create a Role (a plain Role, not a ClusterRole) in every namespace belonging to the user, granting namespace deletion, and then bind those Roles to the user with a RoleBinding in each namespace.
For this reason I intentionally left namespace deletion out from the beginning.
I hope we eventually find a proper workaround for this "issue" :)
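
The per-namespace workaround described above would look roughly like this for a single app namespace (a sketch; the object names, namespace, and apiVersion are illustrative, and every app namespace would need its own pair):

```yaml
# Sketch: a Role scoped to one app namespace, allowing deletion of that
# namespace only, plus a RoleBinding for the controller's service account.
# One Role/RoleBinding pair would be needed per app namespace.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: namespace-deleter
  namespace: some-app
rules:
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["delete"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: namespace-deleter
  namespace: some-app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: namespace-deleter
subjects:
- kind: ServiceAccount
  name: deis-controller
  namespace: deis
```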

@mboersma (Member) commented May 17, 2017

For this reason I intentionally disabled namespace deletion from the beginning.

Ah, ok--that makes a lot more sense after your explanation. It's too bad the namespace-delete permission is global. It was a shortcut to have the controller just delete the entire app namespace in the first place.

It would probably be a smarter implementation to delete just the objects that controller actually created, and to print a warning if that all succeeds but the namespace deletion fails. But for now this makes RBAC workable for Workflow.

@Bregor I'll test more tomorrow. I heard you're vacationing for a week or so--enjoy! We will keep testing this and let you know if we find other issues. There should be plenty of time to get this merged for the next release. Thanks again!

@kartojal commented May 18, 2017

EDIT: Fixed in my case after adding the verb "get" to the "pods" resource in the deis-controller RBAC ClusterRole.

The error was a 403 at URL /api/v1/namespaces/some-namespace/pods/some-pod-13gfabcfg2

I applied the RBAC rules and all seems to be ok, except for one thing. I was trying to update a release with a second "git push deis some-app", and no matter whether it fails or succeeds, it always prints the following message in the controller logs:

INFO [some-app]: waited 30s and 1 pods are in service
INFO:scheduler:[some-app]: waited 30s and 1 pods are in service
ERROR:root:Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/app/api/models/build.py", line 65, in create
    self.app.deploy(new_release)
  File "/app/api/models/app.py", line 572, in deploy
    release.cleanup_old()
  File "/app/api/models/release.py", line 307, in cleanup_old
    self._scheduler.pod.delete(self.app.id, pod['metadata']['name'])
  File "/app/scheduler/resources/pod.py", line 350, in delete
    pod = self.pod.get(namespace, name).json()
  File "/app/scheduler/resources/pod.py", line 32, in get
    raise KubeHTTPException(response, message, *args)
  File "/app/scheduler/exceptions.py", line 10, in __init__
    data = response.json()
  File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 866, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 480, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 527, in create
    super(BuildHookViewSet, self).create(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 533, in post_save
    build.create(self.user)
  File "/app/api/models/build.py", line 79, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: Expecting value: line 1 column 1 (char 0)

Sometimes it does not delete the original pod and creates an extra one, leaving 2 pods running when only 1 is required by the deployment. The output of git push deis some-app after a good deployment without pod failures:

remote: 2017/05/18 10:44:33 Error running git receive hook [The controller returned an error when publishing the release: Unknown Error (400): {"detail":"Expecting value: line 1 column 1 (char 0)"}]
To ssh://[email protected]:2222/some-app.git
 ! [remote rejected] some-app -> some-app (pre-receive hook declined)
error: failed to push some refs to 'ssh://[email protected]:2222/some-app.git'

So it tags a good release as a "failed" one: "release h12f1ff1df23 which failed"

My RBAC controller rules look like this, maybe something is missing:

rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - list
  - create
  - delete
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - list
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - list
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - replicationcontrollers
  verbs:
  - list
  - get
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - delete
- apiGroups:
  - extensions
  resources:
  - replicasets
  verbs:
  - list
  - delete
  - update
- apiGroups:
  - extensions
  - apps
  resources:
  - deployments
  verbs:
  - get
  - list
  - create
  - update
  - delete
- apiGroups:
  - extensions
  resources:
  - deployments/scale
  - replicasets/scale
  verbs:
  - get
  - update
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - delete

@Bregor (Contributor, Author) commented May 19, 2017

@kartojal do you mean deis-controller needs the ability to get pods in all namespaces (via the ClusterRole), not only in workflow's parent namespace?
And a second question: do you have access to the apiserver's audit.log? If so, could you please run grep system:serviceaccount:deis:deis-controller /path/to/audit.log while trying to update a release? This way we can catch the missing permissions in the role/clusterrole. Unfortunately I can't do it myself until May 23, because I don't have access to my clusters while on vacation.
Thanks in advance!

@kartojal commented May 19, 2017 via email

@vdice (Member) commented May 25, 2017

@Bregor I believe I've reproduced the same issue mentioned by @kartojal above. Grepping /audit.log shows:

$ grep system:serviceaccount:deis:deis-controller /audit.log | grep pods
2017-05-25T00:09:34.153574773Z AUDIT: id="a5a3be49-c646-4652-b498-59c897c5afc4" ip="172.17.0.13" method="GET" user="system:serviceaccount:deis:deis-controller" groups="\"system:serviceaccounts\",\"system:serviceaccounts:deis\",\"system:authenticated\"" as="<self>" asgroups="<lookup>" namespace="mk-edp" uri="/api/v1/namespaces/mk-edp/pods?labelSelector=heritage%3Ddeis"

which correlates to the fix mentioned in #1292 (comment), namely adding the get verb for the pods resource. Shall we add this in?

edit: here's the full audit.log output particular to the deis-controller serviceaccount, in case it helps: https://gist.github.com/vdice/5648dc621a7594273db7b5b167a534f2

@Bregor (Contributor, Author) commented May 25, 2017

Sure, thanks for testing, guys! I'll rebase and push ASAP.

@Bregor (Contributor, Author) commented May 26, 2017

@vdice, @mboersma I carefully read all the code in https://github.com/deis/controller/tree/master/rootfs/scheduler/resources and updated the ClusterRole accordingly.

@vdice (Member) commented May 26, 2017

Things look good after an additional round of testing, with specific focus on items relating to the updated permissions in the latest commits:

  • app updating/re-deploying (see comments above)
  • app autoscaling (deis autoscale)
  • app resource limiting (deis limits)

Planning on making sure others see the same and hopefully merging this along w/ the rest of the RBAC PRs on Tuesday (US holiday here on Monday).

@vdice vdice merged commit 0397cd1 into deis:master May 31, 2017
vdice pushed a commit to deis/workflow that referenced this pull request May 31, 2017