
Support scheduling constraints in Helm-deployed clusters #181

Merged
merged 1 commit into from
Mar 10, 2022

Conversation

ebr
Contributor

@ebr ebr commented Mar 8, 2022

Why are these changes needed?

Currently, tolerations and nodeSelector specified in the values.yaml are not being passed to the Helm deployment. This PR fixes that.
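A sketch of what such a pass-through typically looks like in a Helm chart; the file paths, value keys, and indentation here are illustrative assumptions, not copied from this repository's actual chart:

```yaml
# values.yaml (user-supplied scheduling constraints; example values only)
nodeSelector:
  node.kubernetes.io/instance-type: m5.xlarge
tolerations:
  - key: "ray-only"
    operator: "Exists"
    effect: "NoSchedule"

# templates/raycluster.yaml (chart template forwarding the values into the pod spec)
#     spec:
#       {{- with .Values.nodeSelector }}
#       nodeSelector:
#         {{- toYaml . | nindent 8 }}
#       {{- end }}
#       {{- with .Values.tolerations }}
#       tolerations:
#         {{- toYaml . | nindent 8 }}
#       {{- end }}
```

The `with` guard ensures the keys are omitted entirely when the user leaves them unset, rather than rendering empty fields.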

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@Jeffwan
Collaborator

Jeffwan commented Mar 10, 2022

Thanks for the contribution. We should add Helm chart validation to CI later to help us check the YAML.

@Jeffwan
Collaborator

Jeffwan commented Mar 10, 2022

@ebr I am trying to get some feedback from your side. Do you want to use Helm to deploy a cluster, or do you prefer to use YAML directly? We do have Helm charts for both the operator and a sample cluster. I am trying to evaluate how easy it is to use Helm for cluster provisioning compared to YAML (kubectl).

@Jeffwan Jeffwan merged commit 2db6bfa into ray-project:master Mar 10, 2022
@ebr
Contributor Author

ebr commented Mar 10, 2022

@Jeffwan thank you for replying to me in Slack. To expand on what I said there: we had built some custom tooling to create Ray clusters from YAML+Jinja2 templates and apply them with the Python K8s client. But this was based on the old version of the CRD (from the kopf-based ray-operator). Once we switched to kuberay, it looked like we would have to do a major refactor of our code to use the new CRD. So we decided that using Helm charts from this repo would help us stay more in sync with the developments on this fast-moving project. Now I am beginning to realize this is still a challenge, because the Helm charts still need to separately implement features supported by the Ray cluster spec (hence my PRs from the last few days - many thanks for the reviews and merges, btw).

To answer your question: ultimately we would prefer to not use helm. But this seems to be the best middle ground right now for our users and devs to deploy Ray clusters in a parametrized fashion (helm install . --set [...] is easy enough). We don't want to force excessive YAML editing on the users. I.e., if we went with kustomize manifests, we'd still need to 1) build some CLI wrapper around it, and 2) rely on a local copy of those YAML files.
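For illustration, a parametrized install along the lines described above might look like the following; the release name, chart path, and value keys are assumptions for the sketch, not taken from this repository's chart:

```shell
# Hypothetical invocation: overrides scheduling constraints at install time.
# Dots inside a --set key must be escaped; list entries use [index] syntax.
helm install my-ray-cluster ./ray-cluster \
  --set nodeSelector."node\.kubernetes\.io/instance-type"=m5.xlarge \
  --set 'tolerations[0].key=ray-only' \
  --set 'tolerations[0].operator=Exists' \
  --set 'tolerations[0].effect=NoSchedule'
```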

The kuberay CLI seems very promising (it essentially fulfills the same function as our homegrown tool and implements the same abstractions), but our users already have access to k8s, so exposing and securing the apiserver adds an untenable complication for us at the moment.

So it seems like the kuberay CLI without the apiserver in the middle (i.e. using the k8s client, and/or rendering complete YAML manifests) would be the ideal solution for us, though looking through the CLI code it seems like maintaining feature parity between the CLI and the RayCluster spec would still be manual (correct me if I'm wrong).

Hope this makes sense - I'd be happy to continue the conversation on Slack and/or contribute elsewhere. Thanks!

@ebr ebr deleted the helm-scheduling branch March 10, 2022 15:56
chenk008 pushed a commit that referenced this pull request Mar 22, 2022
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023