Proposal: KubeRay a toolkit to run Ray applications on Kubernetes #33

Closed
Jeffwan opened this issue Sep 15, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@Jeffwan
Collaborator

Jeffwan commented Sep 15, 2021

Background

Infrastructure management and compute orchestration are critical to production Ray users, who would like to scale their applications in an effectively unlimited compute environment with zero code changes. Since Kubernetes has become the de facto container orchestrator for enterprises, users leverage Kubernetes as a substrate for executing distributed Ray programs.

The community provides a Python Ray operator implementation. However, due to some special needs, Ant Group, Microsoft and Bytedance have put effort into building a Golang-based operator and decoupling the autoscaler from the operator itself (see design for details). All of us are using this solution in our production environments.

For historical reasons, we had three folders named after the companies in this project. After our collaboration in #17, there is only one ray-operator under ray-contrib, which is a big step toward further evolution.

Proposal

To keep reducing maintenance effort and simplifying the user experience, more tools around Kubernetes and the Ray operator are being developed, and we plan to contribute more of them to this repo. However, we feel the current repo name does not fit this purpose: technically speaking, any Ray project could be added here. We think it might be better to reorganize the Kubernetes-related work into a separate repo that concentrates on the Ray user's experience on Kubernetes.

Besides ray-operator, some tools we plan to work on, or have already developed downstream, are:

  • Kubectl plugin/CLI to operate CRD objects
  • Kubernetes event dumper for Ray clusters/pods/services
  • Operator integration with the Kubernetes node problem detector
  • Kubernetes-based workspace to easily submit Ray jobs
  • Prometheus stack integration for monitoring
  • ...

(credits: @chenk008 and @caitengwei from Ant Group, @Jeffwan from Bytedance, and @akanso from Microsoft)

Maybe we can call it KubeRay, a toolkit consisting of different Kubernetes components, from which users can choose a combination based on their Kubernetes environments. I think creating a new repo like ray-project/kuberay is better, and ray-contrib can be used for incubating ideas. I think KubeRay will help attract more people to participate in the community, and it will also help grow Ray's influence in the CNCF/Kubernetes community. Lots of users are moving ML/DL workloads to Kubernetes, and they should try Ray using this solution.

WDYT? Any feedback is welcome!

/cc @chenk008 @caitengwei @akanso @chaomengyuan
/cc @zhe-thoughts @ericl @richardliaw @DmitriGekhtman @yiranwang52

@Jeffwan added the enhancement label Sep 16, 2021
@chenk008
Contributor

A toolkit consisting of different Kubernetes components, not only ray-operator, can help users deploy and manage Ray clusters on Kubernetes. I think KubeRay is a good idea.

@akanso
Collaborator

akanso commented Sep 23, 2021

KubeRay is a good idea; it makes the repo much more focused.

One downside is that other contributions (e.g. a community-contributed Grafana dashboard) might not fit under the Kube umbrella.

I think, for now, making this repo more focused and K8s-oriented has more value than keeping it generic.

@Jeffwan
Collaborator Author

Jeffwan commented Oct 27, 2021

The project has been founded, and this was a great joint effort from everyone. We can close this issue.

@Jeffwan closed this as completed Oct 27, 2021
oksanabaza pushed a commit to oksanabaza/kuberay that referenced this issue Nov 19, 2024
Cherrypick the ODH dev branch changes to RHDS
3 participants