Find an automated way to set JWKS keys in RequestAuthentication for oauth2-proxy M2M (EKS, AKS, GKE, Rancher, Openshift) #2850
@juliusvonkohout @kimwnasptd we need to release a Kubeflow 1.9.1 patch with the CronJob removed as described above.
The only remaining question is whether we should do the issuer patching in a loop, so that even if someone accidentally reverts it after syncing, it will go back to being correct after x seconds.
Ok, it seems like EKS does not actually serve all of its JWKS keys under the in-cluster endpoint. At least you can automatically get the issuer URL from the cluster's /.well-known/openid-configuration endpoint.

But honestly, if EKS needs a hard-coded workaround, I think many other distributions are going to need one also. Perhaps we should just go back to having KFP do the JWT validation itself, and let through all requests with a JWT and no kubeflow-userid header.

Or at the very least, disable the M2M feature by default and have people configure it for their cluster when they enable it, because each cluster will be different. Plus, this is an advanced feature that most users will never need.
I have followed the steps from the comment above, but it does not work for me.
@jaffe-fly what distribution of Kubernetes are you using? Because as I was saying in #2850 (comment), this fix does not work yet on EKS.
@thesuperzapper first of all, thank you for the PR. I addressed most other issues and PRs in this repository, but my time is currently limited. I am fine with changing the technical implementation for 1.9.1, but we need to have it enabled by default, and we rely on it in many tests already.
v1.29.5
@juliusvonkohout you can just have a "kind overlay" which you additionally apply in your tests to enable M2M on kind clusters.

But as pretty much every K8s distribution will need a custom workaround (because each of them implements cluster OIDC in a different way), I think it's best to leave the specific implementation up to the vendors, and not apply the "kind overlay" in the default install (to avoid users installing it on a cluster like EKS).

The fact that the current M2M implementation will fail on most managed Kubernetes services is a significant, unacceptable regression in the default manifests for 1.9.0: "in-cluster KFP SDK access" no longer works in most cases.

Because the M2M feature literally only affects the KFP API (and the API already validates JWTs that are passed to it), we can just revert to an AuthorizationPolicy that allows any request to the KFP API from inside the cluster, as long as it does not set the user-id header. We should probably also add a basic non-M2M test for the default, to ensure this does not regress in the future.
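For illustration, here is a minimal sketch of such an AuthorizationPolicy. This is not the actual Kubeflow manifest; the policy name and workload labels are assumptions, and only the header condition is taken from the comment above.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: ml-pipeline-allow-in-cluster   # hypothetical name
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: ml-pipeline                 # assumed label of the KFP API server
  action: ALLOW
  rules:
    # Allow requests that do NOT set the kubeflow-userid header;
    # such requests must carry a JWT, which the KFP API validates itself.
    - when:
        - key: request.headers[kubeflow-userid]
          notValues: ["*"]
```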
First of all, we should document it in the main README and provide a workaround until we find a long-term solution.
Hi @thesuperzapper, would it be possible to use Kubeflow Dex as the JWT issuer? Or is that a bad idea, and we need to use the Kubernetes cluster's native JWT issuer?
My setup uses the example manifests with an RKE + Rancher Kubernetes cluster. Thanks.
@kimwnasptd can you help with the documentation or implementation to enable M2M issuers in clusters other than kind (AKS, EKS, etc.)? For the time being we could also apply a workaround, or just document better how to disable it. Some users get it configured on EKS etc., while others are overwhelmed. Nevertheless, no one complained during the long distribution testing. @kromanow94 pledged to help as well in the last platform meeting. I might have more time in October after my two GSoC students have finished.
@juliusvonkohout @kimwnasptd as there is not going to be a general solution for all clusters (each k8s distro uses a different auth system), I strongly believe our next steps need to be:

1. Update the "kind M2M" overlay to not need a CronJob.
2. Remove the "kind M2M" overlay from the default "while ! kustomize build example ..." install.
3. Add some docs to the manifests readme about how to use the "kind M2M" overlay. Instead, we should revert to the Kubeflow Pipelines API only being accessible with a JWT if you DON'T pass a kubeflow-userid header (enforced with an AuthorizationPolicy).
Also, I can't think of a situation where you would be running on the cluster and not have the cluster-issued JWT available to talk to the KFP API.

If we want to enable JWT access from outside the cluster (e.g. through the Istio gateway), then we should NOT be allowing cluster JWTs anyway (it's very insecure to exfiltrate cluster-issued JWTs), so we only need to trust our own Dex/other JWTs, which is trivial.
Given my limited availability due to GSoC, my current knowledge is the following: the ServiceAccount tokens for machine-to-machine communication are core functionality since Kubeflow 1.7; even with the oidc-authservice they were available. There is also Dex for user sessions. We rely on authentication via Kubernetes ServiceAccount tokens in many of our tests, not just for KFP, but for the whole Kubeflow API via the Istio ingressgateway. So "Because the M2M feature literally only affects the KFP API" is not true. Again, it's core functionality and not something we will change for 1.9.1. These workflows, which test KFP, JupyterLabs etc. programmatically, are used as reference.

"Update the 'kind M2M' overlay to not need a CronJob" is a realistic target for 1.9.1. "Remove the 'kind M2M' overlay from the default while ! kustomize build example ... install" is not realistic and would violate the premises above. As for "Add some docs to the manifests readme about how to use the 'kind M2M' overlay. Instead, we should revert to the Kubeflow Pipelines API only being accessible with a JWT if you DON'T pass a kubeflow-userid header (enforced with an AuthorizationPolicy)": we can add documentation and a warning, but keep it the default and not add the overlay. If it is possible to support both ways in parallel, that would also be interesting; https://github.com/kubeflow/manifests/blob/master/proposals/20240606-jwt-handling.md#requiring-a-jwt might help here. Another technical solution to support Kubernetes ServiceAccount tokens as bearer tokens for the whole API would also be interesting, but please keep https://github.com/kubeflow/manifests/blob/master/proposals/20240606-jwt-handling.md in mind.

For 1.10 we can do larger changes, but for the 1.9.1 patch release we need to be realistic. So, is someone willing to update the documentation with a warning for now? We can also investigate supporting both authentication mechanisms at the same time with https://github.com/kubeflow/manifests/blob/master/proposals/20240606-jwt-handling.md#requiring-a-jwt. Afterwards we need to test it with EKS, AKS, GKE, and Rancher. @tarekabouzeid do you want to start with the documentation and point to manifests/common/oauth2-proxy/overlays/m2m/component-overwrite-m2m-token-issuer/kustomization.yaml (line 8 in afc358d)?
@juliusvonkohout and others watching, I have raised a PR which will stop Istio from verifying Kubernetes JWTs in the first place. It was never necessary for enabling M2M authentication, so we can actually remove all this complexity.
@juliusvonkohout Ok, I have made a significant update to PR #2864, which reworks the auth flow and provides three options for users to pick from, depending on their needs.
Hey, thank you for moving forward with this issue. At the same time, I'm sorry I didn't have the time to chime in earlier. I totally agree with the direction here.

For EKS, GKE and other clusters with an OIDC Issuer served behind publicly trusted certificates, it should be enough to set the OIDC Issuer URL here (manifests/common/oauth2-proxy/overlays/m2m/component-overwrite-m2m-token-issuer/kustomization.yaml, line 8 in da0255f) instead of using the m2m-self-signed overlay. This does not deploy the CronJob and makes the m2m setup functional.
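As an aside, on EKS the cluster's OIDC issuer URL (the value to plug in there) can be looked up with the AWS CLI; `my-cluster` is a placeholder name:

```shell
# Print the EKS cluster's OIDC issuer URL.
aws eks describe-cluster \
  --name my-cluster \
  --query "cluster.identity.oidc.issuer" \
  --output text
```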
@jaffe-fly, the above paragraph should make the EKS setup functional. Basically, don't use the m2m-self-signed overlay on EKS.

To add a crucial detail: I hoped to explain this in https://github.com/kubeflow/manifests/blob/da0255f10d875040c2d845cd61b7938236c0dfaa/common/oauth2-proxy/components/configure-self-signed-kubernetes-oidc-issuer/README.md, but it seems I didn't do a good job in there.

So, my opinion in summary circles around a few points.
@kromanow94 I think some of your remarks are already answered in the PR instead of the issue. |
@thesuperzapper I followed steps 0 to 3 and it works on AKS. AKS: 1.29.7
We can of course refine it in other issues/PRs so feel free to create them as needed. |
Related Issues
What's the problem?
We have a RequestAuthentication/m2m-token-issuer in the istio-system namespace which is there to allow Istio to validate JWTs which are actually Kubernetes ServiceAccount tokens. This is only relevant to requests to http://ml-pipeline-ui.kubeflow.svc.cluster.local, as this is the only place we enforce a source.requestPrincipals (see: AuthorizationPolicy/ml-pipeline from the kubeflow namespace).

Retrieving the JWKS keys
Currently, we are using a CronJob called kubeflow-m2m-oidc-configurator to populate the jwks information in the RequestAuthentication/m2m-token-issuer.

This is a bad idea for many reasons, as it can result in failures if the JWKS keys are rotated, or if the strange script fails for any reason (which seems to be what is happening on all EKS clusters).
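For context, the resource being patched looks roughly like this (a sketch, not the exact manifest; the `jwks` value is whatever the CronJob last fetched from the cluster):

```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: m2m-token-issuer
  namespace: istio-system
spec:
  jwtRules:
    - issuer: https://kubernetes.default.svc.cluster.local
      # The CronJob writes the cluster's public signing keys here as an
      # inline JWKS document, e.g. {"keys":[{"kty":"RSA",...}]}.
      jwks: '{"keys": [ ... ]}'
```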
Setting the correct Issuer
Some clusters don't use https://kubernetes.default.svc.cluster.local as their cluster's JWT issuer (notably EKS and GKE), so any RequestAuthentication which assumes this will fail to validate JWTs issued by these clusters. The only way to know what a cluster's issuer should be is to retrieve it from the /.well-known/openid-configuration cluster endpoint.
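For example, assuming admin `kubectl` access (and `jq` for readability), the issuer can be read with:

```shell
# The OIDC discovery document reports the issuer the cluster actually uses.
kubectl get --raw /.well-known/openid-configuration | jq -r '.issuer'
```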
What's the solution?
We can instead directly set the spec.jwtRules[0].jwksUri of the RequestAuthentication to https://KUBERNETES_API/openid/v1/jwks, as this always serves the cluster's JWKS. However, the /openid/v1/jwks endpoint is not available without authentication (read: it must be accessed by a Pod in the cluster, presenting its ServiceAccount token as authorization). So as a workaround, we can create a kubectl proxy ... pod which only exposes this endpoint of the API (note: this is not a security risk, as these are just public signing keys).

Also, in terms of restricting the proxy service to specific Pods, I don't think we can easily do that, because each Istio sidecar will need to access it to retrieve the JWKS for its requests.
0 - Create the required RBAC
First, create this ServiceAccount, Role, and RoleBinding:
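The original manifests are not reproduced here; the following is a minimal sketch with placeholder names (`jwks-proxy`, `istio-system`). Note that non-resource URLs can only be granted via ClusterRoles, so this sketch binds the ServiceAccount to the built-in issuer-discovery ClusterRole rather than defining a namespaced Role:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jwks-proxy
  namespace: istio-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: jwks-proxy-issuer-discovery
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  # Built-in role that allows reading /.well-known/openid-configuration
  # and /openid/v1/jwks from the API server.
  name: system:service-account-issuer-discovery
subjects:
  - kind: ServiceAccount
    name: jwks-proxy
    namespace: istio-system
```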
1 - Create the Proxy Service
First, create this Deployment:
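Again a sketch rather than the original manifest; the image and names are assumptions (any image containing `kubectl` works):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jwks-proxy
  namespace: istio-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jwks-proxy
  template:
    metadata:
      labels:
        app: jwks-proxy
    spec:
      serviceAccountName: jwks-proxy
      containers:
        - name: kubectl-proxy
          image: bitnami/kubectl:latest   # placeholder image with kubectl
          command:
            - kubectl
            - proxy
            - --address=0.0.0.0
            - --port=8080
            # Only expose the JWKS and OIDC discovery endpoints of the API.
            - --accept-paths=^/openid/v1/jwks$|^/\.well-known/openid-configuration$
            - --accept-hosts=.*
          ports:
            - containerPort: 8080
```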
Next, create this Service:
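A matching Service sketch (same placeholder names). With this in place, the jwksUri used later would point at something like http://jwks-proxy.istio-system.svc.cluster.local/openid/v1/jwks:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: jwks-proxy
  namespace: istio-system
spec:
  selector:
    app: jwks-proxy
  ports:
    - name: http
      port: 80
      targetPort: 8080
```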
2 - Delete existing CronJob
Run this command to delete the CronJob:
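Assuming the CronJob lives in the istio-system namespace alongside the RequestAuthentication (adjust `-n` if yours differs):

```shell
# Remove the CronJob that patched the JWKS into the RequestAuthentication.
kubectl delete cronjob kubeflow-m2m-oidc-configurator -n istio-system
```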
3 - Update the RequestAuthentication
Patch the RequestAuthentication as follows:
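A sketch of the patched resource, assuming the proxy Service above; the issuer must match whatever your cluster reports at /.well-known/openid-configuration:

```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: m2m-token-issuer
  namespace: istio-system
spec:
  jwtRules:
    - # Must match the "issuer" field of the cluster's
      # /.well-known/openid-configuration document.
      issuer: https://kubernetes.default.svc.cluster.local
      # Fetch the signing keys through the in-cluster proxy instead of
      # embedding them with a CronJob.
      jwksUri: http://jwks-proxy.istio-system.svc.cluster.local/openid/v1/jwks
```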