RBAC: access denied on central dashboard #2832

Closed · pritamdodeja opened this issue Aug 3, 2024 · 12 comments

@pritamdodeja

pritamdodeja commented Aug 3, 2024

Validation Checklist

Version

1.9

Describe your issue

After installation, after running

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

and opening localhost:8080 in the browser, I do not get any response.

When I try to go to the central dashboard, I get "RBAC: access denied".

Steps to reproduce the issue

Create a default storage class with rook-ceph.
Follow the instructions from https://github.com/kubeflow/manifests?tab=readme-ov-file#upgrading-and-extending after checking out the 1.9 release from the manifests repo.

Possibly related, I am seeing:

kubectl get pods --all-namespaces | grep -vi Running
NAMESPACE      NAME                                            READY   STATUS      RESTARTS      AGE
istio-system   kubeflow-m2m-oidc-configurator-28711075-5ktvt   0/1     Error       1 (13s ago)   18s
rook-ceph      rook-ceph-osd-prepare-distml-6f6kh              0/1     Completed   0             92m


@juliusvonkohout
Member

You need to check the pod logs. Our tutorial is for Kind, and things may differ on other Kubernetes cluster types.
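
For reference, inspecting the failing pod could look something like this (substitute the actual pod name from your kubectl get pods output):

kubectl -n istio-system logs kubeflow-m2m-oidc-configurator-28711075-5ktvt
kubectl -n istio-system describe pod kubeflow-m2m-oidc-configurator-28711075-5ktvt
# the pod is created by a CronJob, so the recent Jobs are worth a look as well
kubectl -n istio-system get cronjobs,jobs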

@pritamdodeja
Author

I have set this up in kind as well as on plain k8s, both with version 1.9.0. Kind is working as expected; I will see if I can figure out what the delta is. I would appreciate any direction you can provide. Thank you!

@juliusvonkohout
Member

The failing kubeflow-m2m-oidc-configurator-28711075-5ktvt pod in istio-system must be checked and fixed. There is also commercial consulting, and there are commercial distributions available if you are interested.

@juliusvonkohout
Member

Please check #2840 as well

@thesuperzapper
Member

@pritamdodeja by any chance are you using EKS?

@pritamdodeja
Author

> @pritamdodeja by any chance are you using EKS?

I'm using k8s on Fedora 40 locally, on a machine with two GPUs, hoping to get the Flink operator to do some distributed processing with TFX pipelines. A pipe dream maybe, but that's the goal :)

@thesuperzapper
Member

thesuperzapper commented Aug 20, 2024

@pritamdodeja @juliusvonkohout my bet is that, because kubectl apply does not clean up removed resources, people are leaving old AuthorizationPolicy resources around, which break the new oauth2-proxy-based auth.

We need to give people a command to remove the ones from <1.8.0 so they don't all run into this issue.

This is part of why I made deployKF, because there is really no safe upgrade path without using something like ArgoCD to manage the cleanup of resources.


To help people clean up extra AuthorizationPolicies, here is a list of all the ones from a stock 1.9.0 install on my test cluster:

> kubectl get authorizationpolicy --all-namespaces
NAMESPACE                   NAME                                ACTION   AGE
istio-system                cluster-local-gateway               ALLOW    24h
istio-system                global-deny-all                              24h
istio-system                istio-ingressgateway                ALLOW    24h
istio-system                istio-ingressgateway-oauth2-proxy   CUSTOM   24h
knative-serving             activator-service                   ALLOW    24h
knative-serving             autoscaler                          ALLOW    24h
knative-serving             controller                          ALLOW    24h
knative-serving             istio-webhook                       ALLOW    24h
knative-serving             webhook                             ALLOW    24h
kubeflow-user-example-com   ml-pipeline-visualizationserver              24h
kubeflow-user-example-com   ns-owner-access-istio                        24h
kubeflow                    central-dashboard                   ALLOW    24h
kubeflow                    jupyter-web-app                     ALLOW    24h
kubeflow                    katib-ui                            ALLOW    24h
kubeflow                    kserve-models-web-app               ALLOW    24h
kubeflow                    metadata-grpc-service               ALLOW    24h
kubeflow                    minio-service                       ALLOW    24h
kubeflow                    ml-pipeline                                  24h
kubeflow                    ml-pipeline-ui                               24h
kubeflow                    ml-pipeline-visualizationserver              24h
kubeflow                    mysql                                        24h
kubeflow                    profiles-kfam                       ALLOW    24h
kubeflow                    service-cache-server                         24h
kubeflow                    tensorboards-web-app                ALLOW    24h
kubeflow                    volumes-web-app                     ALLOW    24h
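
A rough sketch of a cleanup pass (not an official command; review every policy before deleting anything): list what your cluster actually has, compare it against the stock list above, and delete only the extras.

kubectl get authorizationpolicy --all-namespaces
# delete a policy only after confirming it is NOT part of the stock 1.9.0 list above
kubectl delete authorizationpolicy <name> -n <namespace>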

@juliusvonkohout
Member

juliusvonkohout commented Aug 20, 2024

Well, you can use labels and pruning, as mentioned in the readme, to get it done. But those are only rough guidelines so far, not detailed enough for new users.

Given some volunteers to work on it, we could provide detailed upgrade instructions.

We can include a few upgrade commands in the readme.
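
As a rough sketch only (the label selector below is an assumption, check the readme for the exact labels, and remember that --prune deletes anything matching the selector that is not in the applied manifests):

# dry-run first to see what would be pruned
kustomize build example | kubectl apply --prune -l app.kubernetes.io/part-of=kubeflow --dry-run=client -f -
# then apply for real
kustomize build example | kubectl apply --prune -l app.kubernetes.io/part-of=kubeflow -f -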

@pritamdodeja
Author

My situation is actually a new install, and I set up a default storage class (rook-ceph) as listed in the documentation. I'd love to help out in whatever way possible. I have pretty good Linux knowledge, have used KubeflowDagRunner to port local TFX pipelines to Vertex, etc., and have also just finished the CNCF class on Kubeflow Pipelines. Thank you both!

@thesuperzapper
Member

> Well, you can use labels and pruning, as mentioned in the readme, to get it done. But those are only rough guidelines so far, not detailed enough for new users.
>
> Given some volunteers to work on it, we could provide detailed upgrade instructions.
>
> We can include a few upgrade commands in the readme.

@juliusvonkohout I still believe the manifests should be aimed at distribution vendors and highly advanced users who want to effectively roll their own distribution.

As soon as you start talking about opinionated ways to do updates, you are probably better off making your own distribution based on the manifests and advertising it to users, letting the market decide which approach is best.

I'm not saying we can't list some basic suggestions, but it's hard to imagine a proper update solution that wouldn't become so opinionated as to make the manifests less useful to downstream vendors.

For the vast majority of users, an opinionated Kubeflow distribution from a vendor they know will keep maintaining it is going to save them a lot of pain, and may be the difference between using Kubeflow or not.

@thesuperzapper
Member

> My situation is actually a new install, and I set up a default storage class (rook-ceph) as listed in the documentation. I'd love to help out in whatever way possible. I have pretty good Linux knowledge, have used KubeflowDagRunner to port local TFX pipelines to Vertex, etc., and have also just finished the CNCF class on Kubeflow Pipelines. Thank you both!

@pritamdodeja then you might just have some other issue with your cluster, especially because you're also seeing the CronJob fail.

We are working on a fix that removes the need for the CronJob in #2850, if you want to try it out.

Although, that job failing should not prevent you from accessing the central dashboard, so perhaps your cluster is just running out of resources / open file descriptors?
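
If it is a file-descriptor / inotify limit problem (a common culprit on local Linux clusters), something like this can confirm or rule it out; the values below are only examples, not official recommendations:

# check the current inotify limits
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches
# raise them if pods are crashing with "too many open files"
sudo sysctl fs.inotify.max_user_instances=1280
sudo sysctl fs.inotify.max_user_watches=655360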

@juliusvonkohout
Member

Let's merge this into #2850.
