[Feature] Generic data abstraction on top of CRD #53
Comments
API Definition

In order to better manage resources at the API level, we will define a few proto files to describe them. Technically, we could reuse Kubernetes resources directly; however, the RayCluster CRD is probably not the best data structure for describing a cluster. We also want to keep the flexibility to use a database to store historical data in the near future (for example, to support pagination).
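To make the pagination point concrete, here is a minimal Python sketch of a paged list call over stored cluster history. Everything in it is hypothetical: `ClusterRecord`, `list_clusters`, and the offset-based `page_token` are illustrative names, not part of any defined proto. A real database-backed implementation would more likely hand out an opaque cursor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ClusterRecord:
    # Hypothetical history record; the issue does not define these fields.
    name: str
    namespace: str


def list_clusters(
    records: List[ClusterRecord], page_size: int, page_token: Optional[str] = None
) -> Tuple[List[ClusterRecord], Optional[str]]:
    """Return one page of records plus the token for the next page (None when done).

    The token is simply the start offset encoded as a string in this sketch.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = int(page_token) if page_token else 0
    page = records[start : start + page_size]
    end = start + len(page)
    next_token = str(end) if end < len(records) else None
    return page, next_token
```

A caller keeps passing the returned token back until it comes back as `None`.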
ComputeRuntime is equivalent to our head and worker pod template specs. Currently we only define some basic information; richer features such as node affinity and tolerations are not included yet.
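As a rough illustration of how ComputeRuntime maps onto head and worker pod templates, the sketch below models only basic resource fields and expands them into a Kubernetes pod template. The names `GroupSpec`, `ComputeRuntime`, and `to_pod_template`, and the choice of fields, are assumptions for illustration; the actual proto definition may differ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GroupSpec:
    # Only "basic information" is modelled; node affinity, tolerations etc. are not.
    cpu: str      # Kubernetes quantity, e.g. "2"
    memory: str   # Kubernetes quantity, e.g. "4Gi"
    gpu: int = 0


@dataclass
class ComputeRuntime:
    # Hypothetical shape: one head spec plus named worker group specs.
    name: str
    head: GroupSpec
    workers: Dict[str, GroupSpec] = field(default_factory=dict)  # group name -> spec


def to_pod_template(spec: GroupSpec, container_name: str, image: str) -> Dict:
    """Expand a GroupSpec into a pod template with matching requests and limits."""
    resources = {"cpu": spec.cpu, "memory": spec.memory}
    if spec.gpu:
        resources["nvidia.com/gpu"] = str(spec.gpu)
    return {
        "spec": {
            "containers": [
                {
                    "name": container_name,
                    "image": image,
                    "resources": {"requests": dict(resources), "limits": dict(resources)},
                }
            ]
        }
    }
```

Note that the image is a parameter rather than part of ComputeRuntime, since in this design the node image comes from elsewhere.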
ClusterRuntime is used to build the node image; the idea is inspired by Anyscale. It is optional for some clusters: users can use a base image plus a job-level runtime instead.
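One way ClusterRuntime could drive image builds is sketched below, assuming a runtime is just a base image plus extra pip packages (a hypothetical shape; the issue does not list its fields). It renders a minimal Dockerfile and derives a content-addressed tag so that two runtimes with the same contents resolve to the same image. The tag scheme is an assumption of this sketch, not something Anyscale or KubeRay prescribes.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterRuntime:
    # Hypothetical fields: a base image plus extra Python packages baked into the node image.
    name: str
    base_image: str
    pip_packages: List[str] = field(default_factory=list)


def render_dockerfile(rt: ClusterRuntime) -> str:
    """Render a minimal Dockerfile that layers the pip packages on top of the base image."""
    lines = [f"FROM {rt.base_image}"]
    if rt.pip_packages:
        # Sort so package order does not change the Dockerfile (and therefore the tag).
        lines.append("RUN pip install --no-cache-dir " + " ".join(sorted(rt.pip_packages)))
    return "\n".join(lines) + "\n"


def image_tag(rt: ClusterRuntime) -> str:
    """Derive a content-addressed tag so identical runtimes reuse one built image."""
    digest = hashlib.sha256(render_dockerfile(rt).encode("utf-8")).hexdigest()
    return digest[:12]
```

Clusters that skip ClusterRuntime would simply point at the base image and rely on a job-level runtime instead.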
Tech stack
Just a small comment: the names "ClusterRuntime" and "ComputeRuntime" are a little bit confusing. For me, the actual definition of "ComputeRuntime" is more like a "ClusterRuntime".
@chaomengyuan I think we can come up with other ideas for the image resources and try not to confuse users.
I think so, too.
I made some changes to the API definition. /cc @chenk008 @chaomengyuan
Please take a look. I am also wondering whether we should reference objects (like a foreign key) or embed them here. Since we don't use a database, we would need to translate each object into a ConfigMap and then link everything together at the cluster level.
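The reference option can be sketched as follows, assuming each runtime is stored as JSON inside a ConfigMap and a cluster refers to it by ConfigMap name, i.e. the foreign-key approach. The naming scheme, the `runtime.json` data key, and the label are hypothetical; only the ConfigMap manifest shape (`apiVersion`, `kind`, `metadata`, string-valued `data`) is standard Kubernetes.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class ComputeRuntime:
    # Simplified stand-in for the runtime message discussed above.
    name: str
    head_cpu: str
    head_memory: str


def to_configmap(rt: ComputeRuntime, namespace: str) -> Dict:
    """Serialize a runtime into a ConfigMap manifest so it can live in Kubernetes without a DB."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"compute-runtime-{rt.name}",  # hypothetical naming scheme
            "namespace": namespace,
            # Hypothetical label used to list all runtime ConfigMaps.
            "labels": {"example.io/resource-type": "compute-runtime"},
        },
        # ConfigMap data values must be strings, so the object is stored as JSON text.
        "data": {"runtime.json": json.dumps(asdict(rt))},
    }


def resolve_reference(ref_name: str, configmaps: Dict[str, Dict]) -> ComputeRuntime:
    """Follow a foreign-key-style reference (a ConfigMap name) back to the runtime object."""
    cm = configmaps[ref_name]
    return ComputeRuntime(**json.loads(cm["data"]["runtime.json"]))
```

The trade-off: a reference keeps one shared copy, but editing that ConfigMap changes what every referencing cluster resolves to later. Embedding instead copies the runtime into each cluster at creation time, which is simpler to reason about but duplicates data.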
Let's split this story into separate sub-issues.
@Jeffwan
Description
In our system, not everyone uses `kubectl` to operate clusters directly. There are a few major reasons:

1. The current Ray operator is very friendly to users who are familiar with the Kubernetes operator pattern. For most data scientists, however, this approach actually increases the learning curve.
2. Using `kubectl` requires a sophisticated permission system. I think some Kubernetes clusters don't enable user-level authentication. In my company, we use loose RBAC management, and our corporate SSO system is not integrated with Kubernetes OIDC at all.
For the reasons above, I think it's worth building a generic abstraction on top of the `RayCluster` CRD. With core API support, we can easily build backend services, a CLI, etc. to bridge users to their clusters. Underneath, Kubernetes is still used to manage the actual data.
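A minimal sketch of the core translation such a backend service would perform: a simplified, user-facing `Cluster` is turned into a RayCluster manifest that is then stored in Kubernetes. The `Cluster` and `WorkerGroup` types are hypothetical. The manifest field names (`headGroupSpec`, `workerGroupSpecs`, `groupName`, `rayStartParams`, `template`) follow the RayCluster CRD as I understand it, and `ray.io/v1alpha1` was the API version around the time of this issue; verify both against the CRD version you install.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkerGroup:
    name: str
    replicas: int
    cpu: str
    memory: str


@dataclass
class Cluster:
    # Hypothetical user-facing message: what a data scientist fills in via a CLI or web UI.
    name: str
    namespace: str
    image: str
    head_cpu: str
    head_memory: str
    worker_groups: List[WorkerGroup] = field(default_factory=list)


def _pod_template(container: str, image: str, cpu: str, memory: str) -> Dict:
    """Build a single-container pod template with equal requests and limits."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "spec": {
            "containers": [
                {
                    "name": container,
                    "image": image,
                    "resources": {"requests": dict(resources), "limits": dict(resources)},
                }
            ]
        }
    }


def to_raycluster(c: Cluster) -> Dict:
    """Translate the simplified Cluster into a RayCluster custom resource manifest."""
    return {
        "apiVersion": "ray.io/v1alpha1",  # check against the installed CRD version
        "kind": "RayCluster",
        "metadata": {"name": c.name, "namespace": c.namespace},
        "spec": {
            "headGroupSpec": {
                "rayStartParams": {},  # left empty in this sketch
                "template": _pod_template("ray-head", c.image, c.head_cpu, c.head_memory),
            },
            "workerGroupSpecs": [
                {
                    "groupName": g.name,
                    # Fixed-size groups in this sketch: min == max == replicas.
                    "replicas": g.replicas,
                    "minReplicas": g.replicas,
                    "maxReplicas": g.replicas,
                    "rayStartParams": {},
                    "template": _pod_template("ray-worker", c.image, g.cpu, g.memory),
                }
                for g in c.worker_groups
            ],
        },
    }
```

Submitting the manifest to the API server is left out, since that part depends on which Kubernetes client the backend uses.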
/cc @chenk008 @akanso @chaomengyuan