[Feature] rolling upgrade design and implementation for Kuberay #527
Comments
This would be great to have. Let's figure out a design... |
Note: we need to review #231 and make a new design. |
@DmitriGekhtman I think we can separate the update into two roles:
Wilson and I will come up with a detailed design, and we may have several rounds of discussion. |
I like the strategy of splitting the discussion (and potentially even implementation) into updates for head and updates for worker. cc @brucez-anyscale for the head node HA aspect.
That's great! I'm looking forward to discussing the design of this functionality -- I think it's very important. |
Right now, RayService does whole-cluster-level upgrades, so RayService works on its own for now. |
@wilsonwang371 I think we need to pin down the exact use cases in which users would benefit from this feature. First, regarding user behavior: following the previous discussion, we can make the following assumptions in this story:
In all of those cases, we need a mechanism to ensure that the Ray packages in the images are compatible. Here are some scenarios that I can think of:
In this case, we would not need the feature, since the recreate strategy would be enough; the only modification needed is to enable worker upgrades in the reconcile loop.
Here the situation is a little trickier, since we need to support mechanisms in Ray that migrate actors from old
This case is the most likely to need the rolling upgrade feature. Since for now we may recreate a brand new Indeed we need to support standard update semantics for |
Let's first consider the most basic use-case that we were going for with the --forced-cluster-upgrade flag. When a user updates a RayCluster CR and applies it, they expect changes to pod configs to be reflected in the actual pod configuration, even if the change is potentially disruptive to the Ray workload. If you update a workerGroupSpec, The ability to do (destructive) updates is available with the Ray Autoscaler's VM node providers and with the legacy python-based Ray operator. The implementation for this uses hashes of last-applied node configuration. We could potentially do the same thing here. If Ray versions mismatch, things won't work out, no matter what, because Ray does not have cross-version compatibility. If workloads are running, they may be interrupted. These are complex, higher-order concerns, but we can start by just registering pod config updates. |
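As a rough illustration of the "hashes of last-applied node configuration" idea mentioned above, here is a minimal Python sketch. It is not KubeRay's or the Ray Autoscaler's actual implementation; the `applied_hash` field, the pod dictionaries, and the image names are hypothetical and exist only for this example.

```python
from __future__ import annotations

import hashlib
import json


def config_hash(pod_config: dict) -> str:
    """Return a stable SHA-256 digest of a pod configuration."""
    # sort_keys makes the digest independent of dictionary key order.
    canonical = json.dumps(pod_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pods_to_recreate(desired_config: dict, pods: list[dict]) -> list[str]:
    """Names of pods whose recorded hash differs from the desired config's hash.

    Each pod dict carries a hypothetical "applied_hash" field, standing in for
    whatever annotation or label an operator would store at pod creation time.
    """
    want = config_hash(desired_config)
    return [p["name"] for p in pods if p.get("applied_hash") != want]


# Hypothetical example data: worker-a still runs the old config.
old = {"image": "example-image:v1", "cpu": "2"}
new = {"cpu": "2", "image": "example-image:v2"}
pods = [
    {"name": "worker-a", "applied_hash": config_hash(old)},
    {"name": "worker-b", "applied_hash": config_hash(new)},
]
print(pods_to_recreate(new, pods))
```

A reconcile loop could recreate only the pods returned here, which registers pod config updates without addressing the higher-order concerns (version compatibility, running workloads) noted above.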
Question: why create pods directly and not |
I am curious whether there has been any update on this feature, or whether we have any plans. If the concern is that we lack a strong use case to focus on, I can help. Not having rolling upgrades is a real pain for us. I am speaking from the perspective of an ML platform that supports all ML teams within a company.
I am happy to discuss more on this, or help any way I can. |
@jhasm I don't want to speak for others, but I believe Serve will be critical to ensuring 100% uptime during upgrades of Ray cluster versions. The way a model is served (i.e., the Serve CLI, SDK, etc.) shouldn't hinder the upgrade. I had some thoughts I wanted to share. There may be opportunities to enable cluster version rolling upgrades using Ray's GCS external Redis. A potential starting point may be to detect when the Ray cluster version changes. If the version changes and a cluster with that name is currently deployed, then launch a new Ray cluster. Once jobs are transferred, have KubeRay rewrite the service to point to the new cluster. I believe the more complex portion is transferring the jobs and actors to the new cluster. |
Keep the head service and serve service with the same name. |
Any update on this? The lack of rolling updates is a no-go for many production serving workloads. |
The RayService custom resource is intended to support the upgrade semantics of the sort people in this thread are looking for. An individual Ray cluster should be thought of as a massive pod: there is no coherent way to conduct a rolling upgrade of a single Ray cluster (though some large enterprises have actually managed to achieve this). tl;dr: solutions for upgrades require multiple Ray clusters. In my experience, doing anything "production-grade" with Ray requires multiple Ray clusters and external orchestration. |
@qizzzh, I just saw your message. As @DmitriGekhtman mentioned, upgrading Ray involves more than one RayCluster. For RayService, we plan to support incremental upgrades, meaning that we won't need a new, large RayCluster for a zero-downtime upgrade. Instead, we will gradually increase the size of the new RayCluster and decrease the size of the old one. If you want to chat more, feel free to reach out to me on Slack. Ray doesn't natively support rolling upgrades, so it is impossible for KubeRay to achieve that within a single RayCluster. This issue should move to Ray instead of KubeRay. Closing this issue. I will open new issues to track incremental upgrades when I start to work on them. |
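To make the incremental upgrade idea above concrete, here is a minimal Python sketch of a scaling schedule that grows the new RayCluster and shrinks the old one in small steps. This is only an illustration under stated assumptions, not the planned RayService design: the function name, the `step` parameter, and the "scale up first, then scale down" ordering are all hypothetical.

```python
from __future__ import annotations


def incremental_upgrade_plan(total: int, step: int) -> list[tuple[int, int]]:
    """Return successive (old_size, new_size) pairs for an incremental upgrade.

    The new cluster grows by at most `step` before the old cluster shrinks by
    the same amount, so combined capacity never drops below `total` and never
    exceeds `total + step`.
    """
    if total < 0 or step <= 0:
        raise ValueError("total must be >= 0 and step must be > 0")
    plan = []
    old, new = total, 0
    while old > 0:
        grow = min(step, total - new)
        new += grow
        plan.append((old, new))  # scale the new cluster up first
        old -= grow
        plan.append((old, new))  # then scale the old cluster down
    return plan


for old_size, new_size in incremental_upgrade_plan(total=4, step=2):
    print(f"old={old_size} new={new_size}")
```

With `total=4` and `step=2`, the extra capacity needed at any moment is two nodes rather than a full second cluster of four.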
Hi @kevin85421, is there any progress on this, or has a tracking issue been created, so we can check whether the incremental upgrade effort has started? Thanks a lot! |
@zzb54321 there have been some discussions but no work started. I am willing to start a one-pager proposal on this effort. @kevin85421 any objections? |
sounds good! |
@kevin85421 what's an N+1 upgrade? |
RayService manages multiple (N) small RayCluster CRs simultaneously. When we need to upgrade the RayService CR, it creates a new small RayCluster CR and then tears down an old RayCluster CR. You can think of it like a K8s Deployment, where each Pod in the Deployment is a 1-node RayCluster. Then, use the K8s rolling upgrade mechanism to upgrade the K8s Deployment. |
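The N+1 flow described above can be simulated in a few lines of Python. This is a toy model under assumptions, not RayService code: each small RayCluster is represented only by a version string, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Iterator


def n_plus_one_upgrade(fleet: list[str], new_version: str) -> Iterator[list[str]]:
    """Yield the fleet state after each step of an N+1 style upgrade.

    Each element of `fleet` stands for one small cluster, identified only by
    its version. Every round creates one new cluster (N+1) and then tears
    down one old cluster (back to N).
    """
    current = list(fleet)
    while any(v != new_version for v in current):
        current.append(new_version)  # surge to N+1
        yield list(current)
        old_index = next(i for i, v in enumerate(current) if v != new_version)
        del current[old_index]  # tear down one old cluster, back to N
        yield list(current)


for state in n_plus_one_upgrade(["v1", "v1"], "v2"):
    print(state)
```

The fleet never has fewer than N clusters and never more than N+1, which mirrors a K8s Deployment rolling update with a surge of one and no unavailable Pods.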
Gotcha! That makes a lot of sense. I'll follow #2274. Do you think it would be possible to automatically set, e.g., an environment variable to a different value in each small cluster? We've been thinking about sharding our current single large cluster into multiple smaller clusters to handle increasing scale (probably roughly what is being referred to here). It would be nice if we could do that via this mechanism so that we didn't have to manage it ourselves! |
@JoshKarpel Would you mind explaining why you need to have different environment variables for different small RayCluster CRs? For the short term, I plan to make RayService more similar to a K8s Deployment (where each Pod has the same spec) instead of a K8s StatefulSet. That is, I prefer to make all RayCluster CRs that belong to the same RayService CR have the same spec. If we make it stateful, I think the complexity will increase a lot. |
Oh, sorry, yes, I should have said why! Our goal here would be to shard a set of dynamically-created Serve applications (reconciled with our ML model store) across multiple clusters. Right now, we deploy the Serve applications from inside the cluster itself, so each cluster would need to know which shard it should be (e.g., to then do consistent hashing on the metadata that defines the Serve apps, so it knows which apps to create in itself). We don't deploy the apps through the RayService CR because we don't want KubeRay to consider them when determining the health of the cluster (see ray-project/ray#44226). That said - short term, your plan totally makes sense, and I agree that it will be much simpler! Once we have that maybe we can work on extending it to add some stateful-ness. By then maybe we'll have played with it in our setup and have something we could upstream. |
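For readers unfamiliar with the sharding approach described above, here is a small Python sketch of how a cluster could decide which Serve apps it owns from its shard name alone. It uses rendezvous (highest-random-weight) hashing, which is a swapped-in technique chosen for brevity rather than the commenter's actual scheme; the shard names and app keys are hypothetical.

```python
from __future__ import annotations

import hashlib


def _score(shard: str, app_key: str) -> int:
    """Deterministic pseudo-random weight for a (shard, app) pair."""
    digest = hashlib.sha256(f"{shard}:{app_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(app_key: str, shards: list[str]) -> str:
    """Return the shard with the highest weight for this app."""
    return max(shards, key=lambda s: _score(s, app_key))


def apps_for_shard(my_shard: str, shards: list[str], app_keys: list[str]) -> list[str]:
    """Apps that the cluster identified by `my_shard` should create in itself."""
    return [k for k in app_keys if owner(k, shards) == my_shard]


# Hypothetical shard names; in practice each cluster would need to learn its
# own name somehow, for example from an environment variable.
shards = ["shard-0", "shard-1", "shard-2"]
apps = [f"model-{i}" for i in range(10)]
for shard in shards:
    print(shard, apps_for_shard(shard, shards, apps))
```

With this scheme, removing a shard only reassigns the apps that shard owned; all other apps stay where they are, which is the stability property that motivates consistent hashing in the first place.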
A few questions about the N+1 upgrade. Assume the RayService CR defines multiple applications. In the context of the N+1 upgrade, there will be multiple small RayCluster CRs.
Thanks. |
@zzb54321 I think this plan would make one application fit into one cluster. It is not a blue/green upgrade, is it? By analogy with a K8s Deployment, it is more like a rolling upgrade that gradually upgrades all instances. |
Search before asking
Description
Right now we don't support rolling upgrades of Ray clusters. This is a valid requirement for customers that have a large number of nodes in their Ray cluster deployment.
Use case
Support rolling upgrades of Ray clusters, which can be beneficial to users with large Ray clusters.
Related issues
No response
Are you willing to submit a PR?