[Bug] RayService not deploying when enableInTreeAutoscaling is true #643
Comments
cc @DmitriGekhtman
Thanks for all of the details! I will take a look.
Reproduced the issue with the provided config. Looking into the cause.
I've identified the issue: the bug stems from inconsistent naming of the autoscaler's Role in the RayCluster controller. It only occurs when the RayCluster's name is long enough, which is likely to happen with the RayCluster name generated by the RayService controller. I will open a PR to fix the bug. The short-term workaround is to use a shorter name for your RayService.
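To make the failure mode concrete, here is a minimal Go sketch of how inconsistent truncation can break RBAC. It assumes, for illustration only, that one code path truncates the RayCluster name to 50 characters when creating the autoscaler's Role while another path refers to the Role by the full, untruncated name. The helper names and the 50-character limit are hypothetical and are not KubeRay's actual code. For a short name the two paths agree; once the generated name exceeds the limit they diverge, so the RoleBinding ends up pointing at a Role that does not exist.

```go
package main

import "fmt"

// maxNameLen is an assumed limit for illustration; KubeRay's real limit may differ.
const maxNameLen = 50

// truncate keeps at most n bytes of s (the names used here are ASCII).
func truncate(s string, n int) string {
	if len(s) <= n {
		return s
	}
	return s[:n]
}

// roleNameAtCreation is a hypothetical code path that names the Role it creates.
func roleNameAtCreation(cluster string) string {
	return truncate(cluster, maxNameLen)
}

// roleNameInBinding is a hypothetical code path that refers to the same Role
// but forgets to apply the same truncation.
func roleNameInBinding(cluster string) string {
	return cluster
}

func main() {
	names := []string{
		"rxam-raycluster-abcde", // short: both paths agree
		"my-very-long-rayservice-name-for-testing-raycluster-abcde", // long: paths diverge
	}
	for _, name := range names {
		created := roleNameAtCreation(name)
		referenced := roleNameInBinding(name)
		fmt.Printf("%-60s match=%v\n", name, created == referenced)
	}
}
```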
I was able to deploy the RayService successfully by shortening its name to "rxam".
Not to conflate issues, but we're also seeing the head-svc endpoint names being truncated for names longer than 50 characters. Could this be related?
Truncation is necessary due to K8s length limits.
Fix: #689
…689) Due to inconsistent truncation of RBAC names, it's not possible to deploy an autoscaling RayService with a long name. This PR fixes that issue. Closes #643. Signed-off-by: Dmitri Gekhtman <[email protected]>
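The fix described in the commit message amounts to making every RBAC name come from the same truncation. The minimal Go sketch below assumes a hypothetical autoscalerRBACName helper and an illustrative 50-character limit (neither is taken from the PR). It derives both the Role's name and the RoleBinding's roleRef from that one function, so the two cannot drift apart no matter how long the RayCluster name is.

```go
package main

import "fmt"

// maxRBACNameLen is an illustrative limit, not necessarily the one KubeRay uses.
const maxRBACNameLen = 50

// autoscalerRBACName is a hypothetical single source of truth for the name
// shared by the autoscaler's Role and RoleBinding.
func autoscalerRBACName(clusterName string) string {
	if len(clusterName) > maxRBACNameLen {
		return clusterName[:maxRBACNameLen]
	}
	return clusterName
}

// roleRef and roleBinding are simplified stand-ins for the Kubernetes RBAC types.
type roleRef struct{ Name string }

type roleBinding struct {
	Name    string
	RoleRef roleRef
}

// buildRoleName names the Role using the shared helper.
func buildRoleName(cluster string) string { return autoscalerRBACName(cluster) }

// buildRoleBinding names the binding and its roleRef using the same helper.
func buildRoleBinding(cluster string) roleBinding {
	name := autoscalerRBACName(cluster)
	return roleBinding{Name: name, RoleRef: roleRef{Name: name}}
}

func main() {
	cluster := "my-very-long-rayservice-name-for-testing-raycluster-abcde"
	role := buildRoleName(cluster)
	rb := buildRoleBinding(cluster)
	fmt.Println(role == rb.RoleRef.Name) // true: both come from autoscalerRBACName
}
```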
Search before asking
KubeRay Component
ray-operator
What happened + What you expected to happen
When deploying a RayService, the pods of the Ray cluster are not started if enableInTreeAutoscaling is true. I can see that the RayCluster and RayService resources exist in the Kubernetes cluster. Here are the logs of the operator:
Reproduction script
Note that the following RayService deploys successfully without enableInTreeAutoscaling: true.
Anything else
I'm using a namespace-scoped operator and the nightly image of KubeRay.
Are you willing to submit a PR?