Default workload identity bindings for deployments that need GCP permissions #61
/assign @Bobgy
@Bobgy Isn't the profiles-manager already bound to a GSA?
I looked at one of the auto deployments. It looks like the profiles manager is using the KSA `profiles-controller-service-account`, which doesn't have a GSA bound.

I think this is a bug. If the profiles manager isn't bound to any GSA, it won't be able to attach GSAs to KSAs in newly provisioned namespaces. Looking at a v1 deployment, it looks like it is bound to the admin account.
That's true. It was originally bound by kfctl; we need to bind it with KCC now.
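For context, the Workload Identity binding under discussion ties a Kubernetes service account (KSA) to a Google service account (GSA) via an IAM member string with a fixed format. A minimal sketch of that format (the project name below is illustrative, not the actual project from this issue):

```python
def workload_identity_member(project: str, namespace: str, ksa: str) -> str:
    """Build the IAM member string used when granting
    roles/iam.workloadIdentityUser to bind a KSA to a GSA."""
    return f"serviceAccount:{project}.svc.id.goog[{namespace}/{ksa}]"

# Illustrative values, not the real deployment's project.
member = workload_identity_member(
    "my-project", "kubeflow", "profiles-controller-service-account")
print(member)
# serviceAccount:my-project.svc.id.goog[kubeflow/profiles-controller-service-account]
```

Whether the grant is made by kfctl, by hand with gcloud, or declaratively by KCC, it is this member string that ends up in the GSA's IAM policy.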
Thanks @Bobgy, that makes sense; thanks for reporting and fixing this.
Notebook tests should build a docker image to run the notebook in.

* kubeflow#613: currently the way we run notebook tests is by firing off a K8s job on the KF cluster which runs the notebook.
* The K8s job uses init containers to pull in source code and install dependencies like papermill.
* This is a bit brittle.
* To fix this we will instead use Tekton to build a docker image that takes the notebook image and adds the notebook code to it.
* Dockerfile.notebook_runner: Dockerfile to build the test image.

The pipeline to run the notebook consists of two tasks:

1. A Tekton Task to build a docker image to run the notebook in.
1. A Tekton Task that fires off a K8s job to run the notebook on the Kubeflow cluster.

Here's a list of changes to make this work:

* tekton_client should provide methods to upload artifacts but not parse junits.
* Add a tekton_client method to construct the full image URL based on the digest returned from kaniko.
* Copy over the code for running the notebook tests from kubeflow/examples and start modifying it.
* Create a simple CLI to wait for nomos to sync resources to the cluster.
  * This is used in some syntactic-sugar make rules to aid the dev-test loop.

The mnist test isn't completing successfully yet because GoogleCloudPlatform/kubeflow-distribution#61 means the KF deployments don't have proper GSAs to write to GCS.

Related to: kubeflow#613
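One of the changes listed, constructing the full image URL from the digest kaniko returns, amounts to swapping any tag for an immutable `@sha256:` reference. A sketch of such a helper (hypothetical, not the actual tekton_client code):

```python
def full_image_url(image: str, digest: str) -> str:
    """Combine a registry image name with a sha256 digest into a
    fully-qualified, immutable image reference."""
    if not digest.startswith("sha256:"):
        digest = "sha256:" + digest
    # Drop any existing digest, then any tag, so we reference by digest only.
    name = image.split("@")[0]
    if ":" in name.split("/")[-1]:
        name = name.rsplit(":", 1)[0]
    return f"{name}@{digest}"

print(full_image_url("gcr.io/my-project/notebook-runner:latest", "sha256:abc123"))
# gcr.io/my-project/notebook-runner@sha256:abc123
```

Referencing the freshly built image by digest rather than tag ensures the test job runs exactly the image kaniko just pushed.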
@Bobgy Let's not close this issue until
The most recent deploy "kf-vbp-0629-571" runs pipelines successfully, but workload identity binding is still not properly set up.
I'm guessing the change kubeflow/testing#705 didn't make it into the auto deploy. Is there a way we can trigger a manual deploy?
Looked at the Tekton task https://kf-ci-v1.endpoints.kubeflow-ci.cloud.goog/tekton/#/namespaces/auto-deploy/pipelineruns/deploy-kf-master-p5sb2; it's using
@jlewi Can you fix this?
* Revamp how the Tekton pipelines to run notebooks work.

  Notebook tests should build a docker image to run the notebook in.

  * #613: currently the way we run notebook tests is by firing off a K8s job on the KF cluster which runs the notebook.
  * The K8s job uses init containers to pull in source code and install dependencies like papermill.
  * This is a bit brittle.
  * To fix this we will instead use Tekton to build a docker image that takes the notebook image and adds the notebook code to it.
  * Dockerfile.notebook_runner: Dockerfile to build the test image.

  The pipeline to run the notebook consists of two tasks:

  1. A Tekton Task to build a docker image to run the notebook in.
  1. A Tekton Task that fires off a K8s job to run the notebook on the Kubeflow cluster.

  Here's a list of changes to make this work:

  * tekton_client should provide methods to upload artifacts but not parse junits.
  * Add a tekton_client method to construct the full image URL based on the digest returned from kaniko.
  * Copy over the code for running the notebook tests from kubeflow/examples and start modifying it.
  * Create a simple CLI to wait for nomos to sync resources to the cluster.
    * This is used in some syntactic-sugar make rules to aid the dev-test loop.

  The mnist test isn't completing successfully yet because GoogleCloudPlatform/kubeflow-distribution#61 means the KF deployments don't have proper GSAs to write to GCS.

  Related to: #613

* tekton_client.py can't use f-strings yet because we are still running under Python 2; remove f-style strings.
* Fix typo.
* Address PR comments.
* copy-buckets should not abort on error, as this prevents artifacts from being copied and thus the results from showing up in testgrid; see #703.
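The "wait for nomos to sync resources" CLI mentioned above is at its core a polling loop. A minimal sketch with an injected status check (the real tool would shell out to nomos or kubectl; the injection here is just to keep the loop testable):

```python
import time

def wait_for_sync(check_synced, timeout_s=300, poll_s=5, sleep=time.sleep):
    """Poll check_synced() until it returns True or timeout_s elapses.

    check_synced is a zero-argument callable; in practice it would query
    the cluster's sync status. Returns True on sync, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check_synced():
            return True
        sleep(poll_s)
    return False

# Example: a fake status that reports synced on the third poll.
calls = {"n": 0}
def fake_status():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for_sync(fake_status, timeout_s=60, poll_s=0, sleep=lambda s: None))
# True
```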
* Use ACM to deploy the auto-deploy infrastructure; previously ACM was only being used to deploy the Tekton tasks; we weren't including the other manifests in the hydration. * Related to: GoogleCloudPlatform/kubeflow-distribution#61
The current auto-deployment was deployed before kubeflow/testing#707 was merged.
The next auto-deployment will hopefully have the fix.
Thanks for the heads up! I'll wait for the next one then.
This is working properly now. I verified in https://kf-vbp-0630-8f5.endpoints.kubeflow-ci-deployment.cloud.goog/_/pipeline/#/runs/details/c7af3e03-87ab-4a6b-b30c-20d8b9a5a8ea that the tfx taxi sample ran successfully.
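One quick way to check a binding like this, besides running a sample pipeline, is to inspect the KSA's `iam.gke.io/gcp-service-account` annotation, which Workload Identity uses to map the KSA to a GSA. A sketch over a ServiceAccount manifest represented as a dict (the account names below are hypothetical):

```python
from typing import Optional

def bound_gsa(service_account: dict) -> Optional[str]:
    """Return the GSA a KSA is annotated with for Workload Identity,
    or None if the annotation is missing."""
    annotations = service_account.get("metadata", {}).get("annotations", {})
    return annotations.get("iam.gke.io/gcp-service-account")

# Illustrative manifest, e.g. from `kubectl get sa default-editor -o json`.
ksa = {
    "metadata": {
        "name": "default-editor",
        "annotations": {
            "iam.gke.io/gcp-service-account":
                "kf-user@my-project.iam.gserviceaccount.com",
        },
    },
}
print(bound_gsa(ksa))
# kf-user@my-project.iam.gserviceaccount.com
```

Note the annotation is only half the binding; the GSA's IAM policy must also grant `roles/iam.workloadIdentityUser` to the corresponding KSA member.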
I found a problem. Root cause: I used […]; we should use […].
@Bobgy Thanks; we will need to cherry-pick this into the 1.0 branch of manifests.
We should verify that notebook permissions work correctly in auto deploys before closing this.
I've found quite a few deployments don't work properly due to lack of GCP permissions, including:
/cc @jlewi
I think we should bind workload identity with these KSAs by default.
Proposal
Add workload identity binding for
default-editor will be automatically bound to user GSA when profiles manager works properly.
@jlewi I can implement this, thoughts?