[RayService] Add support for multi-app config in yaml-string format #1156
Conversation
Signed-off-by: cindyz <[email protected]>
@@ -443,7 +443,7 @@ func (r *RayServiceReconciler) cleanUpServeConfigCache(rayServiceInstance *rayv1
 	configPrefix := r.generateConfigKeyPrefix(rayServiceInstance)

 	// Clean up RayCluster serve deployment configs.
-	for key := range r.ServeDeploymentConfigs.Items() {
+	for key := range r.ServeConfigs.Items() {
The logic of this function is quite confusing and seems to be causing a memory leak. To elaborate: when a RayService is deleted, does the cache for that RayService instance actually get removed? I need to run some experiments to verify this. This is technical debt and is not caused by this pull request, so we do not need to address it within the scope of this PR.
ImportPath:  configV1Spec.ImportPath,
RuntimeEnv:  applicationRuntimeEnv,
Deployments: convertedDeploymentSpecs,
Port:        configV1Spec.Port,
Why do we add `Port` here? The behavior seems to be different from the following code snippet.
kuberay/ray-operator/controllers/ray/rayservice_controller.go
Lines 662 to 666 in 0ae9fc1

servingClusterDeployments := utils.ServingClusterDeployments{
	ImportPath:  rayServiceInstance.Spec.ServeDeploymentGraphSpec.ImportPath,
	RuntimeEnv:  runtimeEnv,
	Deployments: rayDashboardClient.ConvertServeConfig(rayServiceInstance.Spec.ServeDeploymentGraphSpec.ServeConfigSpecs),
}
It's part of the spec. The lines you linked were only used for printing out the JSON in the logs; I think they just missed the port. In `UpdateDeployments`, the port was included before as well.
Based on my understanding, if we intend to make the port dynamically configurable, we will also need to update the configuration of Pods and Services. However, it is worth noting that RayCluster currently does not support dynamic updates to the configuration of Pods and Services.
I see. I believe this is existing functionality so I don't want to change it in this PR - can you create a separate issue for this?
I am fine with this change, but the corresponding function for this code snippet appears to be this one instead of `UpdateDeployments` in dashboard_httpclient.go.
Yeah, if you look closely, that code snippet is only used to print a JSON to the logs. It's not actually sent to the Serve API.
I will open an issue later and resolve this comment. However, it seems that the ports for containers in a Pod cannot be updated dynamically. See this link for more details. In this case, the port can only be updated if users expose multiple ports, of which only a few will be used. I am not familiar with serving, so I'm unsure if there are any use cases for this situation.
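Since container ports on a running Pod cannot be updated in place, the workaround mentioned above is to expose several ports up front and let a later Serve config pick one of them. An illustrative container snippet (the container name, image tag, and spare port number are assumptions for illustration, not taken from this PR):

```yaml
# Illustrative Pod container fragment: declare extra ports in advance so a
# later Serve config can switch ports without a Pod-spec change.
containers:
  - name: ray-head
    image: rayproject/ray:2.4.0
    ports:
      - containerPort: 8000   # default Serve port
      - containerPort: 9000   # spare port, declared up front
```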
Signed-off-by: cindyz <[email protected]>
cachedServeConfig, isServeConfig := cachedConfigObj.(rayv1alpha1.ServeDeploymentGraphSpec)
if !isServeConfig {
	shouldUpdate = true
	reason = fmt.Sprintf("No Serve config has been cached for cluster %s with key %s", rayClusterInstance.Name, cacheKey)
The message seems to be not precise enough. You can refer to https://go.dev/ref/spec#Type_assertions for more details about type assertions and the section "Implementing an interface" in https://go.dev/ref/spec#Method_sets about "implements".
More precisely, if T is not an interface type, x.(T) asserts that the dynamic type of x is identical to the type T. In this case, T must implement the (interface) type of x; otherwise the type assertion is invalid since it is not possible for x to store a value of type T. If T is an interface type, x.(T) asserts that the dynamic type of x implements the interface T.
In our case, `cachedConfigObj` (i.e. x) has an interface type and `ServeDeploymentGraphSpec` is a non-interface type, so `isServeConfig=false` means the dynamic type of `cachedConfigObj` is not `ServeDeploymentGraphSpec`. In other words, a serve configuration is cached, but the cached value is not a `ServeDeploymentGraphSpec`.
I see, I have updated the messages to include information about v1/v2 and the corresponding field types.
Signed-off-by: cindyz <[email protected]>
LGTM. I will approve this PR after I manually test it.
reason = fmt.Sprintf("No V1 Serve Config of type ServeDeploymentGraphSpec has been cached for cluster %s with key %s", rayClusterInstance.Name, cacheKey)
} else if !utils.CompareJsonStruct(cachedServeConfig, rayServiceInstance.Spec.ServeDeploymentGraphSpec) {
	shouldUpdate = true
	reason = fmt.Sprintf("Current V2 Serve config doesn't match cached Serve config for cluster %s with key %s", rayClusterInstance.Name, cacheKey)
- reason = fmt.Sprintf("Current V2 Serve config doesn't match cached Serve config for cluster %s with key %s", rayClusterInstance.Name, cacheKey)
+ reason = fmt.Sprintf("Current V1 Serve config doesn't match cached Serve config for cluster %s with key %s", rayClusterInstance.Name, cacheKey)

?
# Step 0: Create a Kubernetes cluster
kind create cluster --image=kindest/node:v1.23.0
# Step 1: Install a KubeRay operator (path: helm-chart/kuberay-operator/)
helm install kuberay-operator . --set image.repository=controller,image.tag=latest
# Step 2: Create a multi-app RayService (path: ray-operator/config/samples)
kubectl apply -f ray_v1alpha1_rayservice-multi-app.yaml
# Step 3: Forward the serve port after the RayService is ready.
kubectl port-forward svc/rayservice-sample-serve-svc 8000
# Step 4: Test two applications
curl -X POST -H 'Content-Type: application/json' localhost:8000/fruit/ -d '["PEAR", 12]'
# expected output: 12
curl -X POST -H 'Content-Type: application/json' localhost:8000/calc/ -d '["ADD", 1]'
# expected output: "4 pizzas please!"
# Step 5: Edit the `rayVersion` to 2.100.0
kubectl edit rayservices.ray.io rayservice-sample
# Step 6: After the new RayCluster is ready, repeat Step 4.
[RayService] Add support for multi-app config in yaml-string format (ray-project#1156)
Why are these changes needed?

The new field `serveConfigV2` under a RayService spec takes a yaml-formatted string (https://yaml-multiline.info/) which matches a Ray Serve multi-app config. Example from `kubectl describe rayservice rayservice-sample`:
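The example output itself is not recoverable from this page. As a hedged illustration only, a multi-app `serveConfigV2` string generally looks like the following; the application names, route prefixes, and import paths here are assumptions modeled on the Ray Serve multi-app config schema, not the PR's actual sample:

```yaml
# Illustrative multi-app serveConfigV2: a yaml-formatted string holding
# one entry per Serve application.
serveConfigV2: |
  applications:
    - name: fruit_app
      route_prefix: /fruit
      import_path: fruit.deployment_graph
    - name: calc_app
      route_prefix: /calc
      import_path: conditional_dag.serve_dag
```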
Notes:

`oneOf` is what we would need to ensure only one of `serveConfig` or `serveConfigV2` is specified. However, kubebuilder doesn't support generating `oneOf` (see "Support more complex validations (oneOf/anyOf/allOf/etc)", kubernetes-sigs/controller-tools#461), so we perform validation in the service controller and consistently throw errors if the user specifies both.

Related issue number
Checks