
Support scheduling constraints in Helm-deployed clusters #181

Merged
merged 1 commit into from
Mar 10, 2022

Conversation

ebr
Contributor

@ebr ebr commented Mar 8, 2022

Why are these changes needed?

Currently, tolerations and nodeSelector specified in the values.yaml are not being passed to the Helm deployment. This PR fixes that.
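A sketch of what such a pass-through typically looks like in a Helm chart; the file paths, value keys, and indentation here are illustrative assumptions, not copied from this repository's actual chart:

```yaml
# values.yaml (user-supplied scheduling constraints; example values only)
nodeSelector:
  node.kubernetes.io/instance-type: m5.xlarge
tolerations:
  - key: "ray-only"
    operator: "Exists"
    effect: "NoSchedule"

# templates/raycluster.yaml (chart template forwarding the values into the pod spec)
#     spec:
#       {{- with .Values.nodeSelector }}
#       nodeSelector:
#         {{- toYaml . | nindent 8 }}
#       {{- end }}
#       {{- with .Values.tolerations }}
#       tolerations:
#         {{- toYaml . | nindent 8 }}
#       {{- end }}
```

The `with` guard ensures the keys are omitted entirely when the user leaves them unset, rather than rendering empty fields.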

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@Jeffwan
Collaborator

Jeffwan commented Mar 10, 2022

Thanks for the contribution. We should add Helm chart validation to CI later to help us check the YAML.

@Jeffwan
Collaborator

Jeffwan commented Mar 10, 2022

@ebr I am trying to get some feedback from your side. Do you want to use Helm to deploy a cluster, or do you prefer to use YAML directly? We do have Helm charts for both the operator and a sample cluster. I am trying to evaluate how easy it is to use Helm for cluster provisioning compared to YAML (kubectl).

@Jeffwan Jeffwan merged commit 2db6bfa into ray-project:master Mar 10, 2022
@ebr
Contributor Author

ebr commented Mar 10, 2022

@Jeffwan thank you for replying to me in Slack. To expand on what I said there: we had built some custom tooling to create Ray clusters from YAML+Jinja2 templates and apply them with the Python K8s client. But this was based on the old version of the CRD (from the kopf-based ray-operator). Once we switched to kuberay, it looked like we would have to do a major refactor of our code to use the new CRD. So we decided that using Helm charts from this repo would help us stay more in sync with the developments on this fast-moving project. Now I am beginning to realize this is still a challenge, because the Helm charts still need to separately implement features supported by the Ray cluster spec (hence my PRs from the last few days - many thanks for the reviews and merges, btw).

To answer your question: ultimately we would prefer to not use helm. But this seems to be the best middle ground right now for our users and devs to deploy Ray clusters in a parametrized fashion (helm install . --set [...] is easy enough). We don't want to force excessive YAML editing on the users. I.e., if we went with kustomize manifests, we'd still need to 1) build some CLI wrapper around it, and 2) rely on a local copy of those YAML files.
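For illustration, a parametrized install along the lines described above might look like the following; the release name, chart path, and value keys are assumptions for the sketch, not taken from this repository's chart:

```shell
# Hypothetical invocation: overrides scheduling constraints at install time.
# Dots inside a --set key must be escaped; list entries use [index] syntax.
helm install my-ray-cluster ./ray-cluster \
  --set nodeSelector."node\.kubernetes\.io/instance-type"=m5.xlarge \
  --set 'tolerations[0].key=ray-only' \
  --set 'tolerations[0].operator=Exists' \
  --set 'tolerations[0].effect=NoSchedule'
```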

The kuberay CLI seems very promising (it essentially fulfills the same function as our homegrown tool and implements the same abstractions), but our users already have access to k8s, so exposing and securing the apiserver adds an untenable complication for us at the moment.

So it seems like the kuberay CLI without the apiserver in the middle (i.e. using the k8s client, and/or rendering complete YAML manifests) would be the ideal solution for us, though looking through the CLI code it seems like maintaining feature parity between the CLI and the RayCluster spec would still be manual (correct me if I'm wrong).

Hope this makes sense - I'd be happy to continue the conversation on Slack and/or contribute elsewhere. Thanks!

@ebr ebr deleted the helm-scheduling branch March 10, 2022 15:56
chenk008 pushed a commit that referenced this pull request Mar 22, 2022
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023