Add a warning to discourage users from launching a KubeRay-incompatible autoscaler. #1102

kevin85421 · 2023-05-22T07:04:07Z

Why are these changes needed?

The no-monitor option in rayStartParams is used to determine whether the monitor.py process should be launched or not. The monitor process is not only for autoscaling but observability (e.g. Prometheus).

KubeRay autoscaling is enabled (i.e. `EnableInTreeAutoscaling: true`)

Ray autoscaler supports various node providers such as AWS, GCP, Azure, and Kubernetes. However, the default autoscaler is not compatible with Kubernetes. Therefore, we disable the monitor process by default via setting no-monitor to true and inject a sidecar container for KubeRay autoscaler.

kuberay/ray-operator/controllers/ray/common/pod.go

Lines 114 to 115 in 08792ca

    
           if instance.Spec.EnableInTreeAutoscaling != nil && *instance.Spec.EnableInTreeAutoscaling { 
        
           	headSpec.RayStartParams["no-monitor"] = "true"

KubeRay autoscaling is disabled

Since the default value of no-monitor in Ray is false when not explicitly specified, the monitor process, along with the Kubernetes-incompatible autoscaler, will be launched by default. Therefore, the purpose of this pull request is to initiate the monitor process without including the autoscaler.

@wuisawesome said that we can disable the autoscaling feature by making the autoscaler become READONLY.

https://sourcegraph.com/github.com/ray-project/ray@ac2230fd28f642e4523422a1e1eb15ab92f63943/-/blob/python/ray/autoscaler/_private/monitor.py?L220-L230

If autoscaling-config is not set in the head Pod's rayStartParams, BASE_READONLY_CONFIG will be used, which creates a READONLY autoscaler by default. Hence, in this PR, our focus is to prevent users from launching a KubeRay-incompatible autoscaler when they specify the autoscaling-config themselves.

Related issue number

Checks

I've made sure the tests are passing.
Testing Strategy
- Unit tests
- Manual tests
- This PR is not tested :(

helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0
helm install raycluster kuberay/ray-cluster --version 0.5.0

# Log in to the head Pod
kubectl exec -it $YOUR_HEAD_POD -- bash

# Check /tmp/ray/session_latest/logs/monitor.log in the head Pod
2023-05-22 13:23:39,259 INFO autoscaler.py:364 -- StandardAutoscaler: {'cluster_name': 'default', 'max_workers': 0, 'upscaling_speed': 1.0, 'docker': {}, 'idle_timeout_minutes': 0, 'provider': {'type': 'readonly', 'use_node_id_as_ip': True}, 'auth': {}, 'available_node_types': {'ray.head.default': {'resources': {}, 'node_config': {}, 'max_workers': 0}}, 'head_node_type': 'ray.head.default', 'file_mounts': {}, 'cluster_synced_files': [], 'file_mounts_sync_continuously': False, 'rsync_exclude': [], 'rsync_filter': [], 'initialization_commands': [], 'setup_commands': [], 'head_setup_commands': [], 'worker_setup_commands': [], 'head_start_ray_commands': [], 'worker_start_ray_commands': [], 'head_node': {}, 'worker_nodes': {}}

Note that StandardAutoscaler above is the same as BASE_READONLY_CONFIG if autoscaling-config is not set.

Set `autoscaling-config`

# Step 0: Build Docker image for this PR (controller:latest), and load into the Kind cluster.
# Step 1: Install the KubeRay operator with the new Docker image
 helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0 --set image.repository=controller,image.tag=latest

# Step 2: Create a RayCluster with `autoscaling-config`
# Check this gist: https://gist.github.com/kevin85421/88315abaf9735722224e7787c9e54036

# Step 3: Check KubeRay operator log. You should be able to see the following log
# 2023-05-22T22:31:24.598Z        INFO    RayCluster-Controller   Detect autoscaling-config in head Pod's rayStartParams. The monitor process will initialize the monitor with the provided config. Please ensure the autoscaler is set to READONLY mode.

wuisawesome · 2023-05-22T22:54:48Z

+1 to leaving the monitor process in existence and disabling features like autoscaling if needed.

@kevin85421 I'm a little fuzzy on the overall integration with Ray. Could you summarize what the ray start command has (does it have a --no-monitor flag?) and where the monitor.py process is expected to be running (in the head node container or in the sidecar?) under different scenarios?

The default with autoscaling disabled
the default with autoscaling enabled
If someone changes the rayStartParams on the head node with autoscaling enabled
If someone changes the rayStartParams on the head node with autoscaling disabled

kevin85421 · 2023-05-22T23:17:20Z

@wuisawesome Thank you for the comment!

Case 1: The default with autoscaling disabled (EnableInTreeAutoscaling: false)
- The value of no-monitor option is based on user's rayStartParams. The default value of no-monitor in Ray is false.
Case 2: The default with autoscaling enabled (EnableInTreeAutoscaling: true)
- no-monitor will be set to "true" automatically. See here for more details.
Case 3: If someone changes the rayStartParams on the head node with autoscaling enabled.
- no-monitor will be set to "true" automatically. See here for more details.
Case 4: If someone changes the rayStartParams on the head node with autoscaling disabled
- The value of no-monitor option is based on user's rayStartParams. The default value of no-monitor in Ray is false.

kevin85421 · 2023-05-22T23:18:18Z

cc @Yicheng-Lu-llll

Yicheng-Lu-llll · 2023-05-24T00:03:04Z

LGTM!

I'm a bit curious, though. Under what circumstances would a user need to configure autoscaling-config for KubeRay? It seems there is no need due to incompatible.

If a user do provide autoscaling-config , I am wondering how to set incompatible autoscaler to READONLY.

kevin85421 · 2023-05-24T03:43:37Z

@Yicheng-Lu-llll thank you for your comments!

I'm a bit curious, though. Under what circumstances would a user need to configure autoscaling-config for KubeRay? It seems there is no need due to incompatible.

Honestly, I do not know. I would prefer not to hardcode autoscaling-config here, considering the lesson we learned from the init container injection.

Based on the comments in ReadOnlyNodeProvider, the READONLY autoscaler also serves the purpose of collecting cluster state. Therefore, users may configure autoscaling-config for observability reasons. cc @wuisawesome @DmitriGekhtman for further inputs

If a user do provide autoscaling-config , I am wondering how to set incompatible autoscaler to READONLY.

https://sourcegraph.com/github.com/ray-project/ray@ac2230fd28f642e4523422a1e1eb15ab92f63943/-/blob/python/ray/autoscaler/_private/monitor.py?L102

wuisawesome

sgtm. I almost wonder if we should take the more opinionated stance and require the --no-monitor flag at all times, but will defer to you on this.

kevin85421 · 2023-05-24T20:25:05Z

Discuss with @wuisawesome offline:

Consider running the monitor process as a sidecar container in any case.
The monitor process is responsible for Prometheus metrics, logs, "events," health check information, information to power Ray status, and autoscaling behavior.

kevin85421 · 2023-05-24T20:51:16Z

#1109

…le autoscaler. (ray-project#1102) Add a warning to discourage users from launching a KubeRay-incompatible autoscaler.

kevin85421 · 2023-11-15T19:46:09Z

More context about no-monitor: ray-project/ray#40911 (comment)

kevin85421 added 2 commits May 22, 2023 18:40

add no-monitor by default

fa0a845

update

7ea19e6

kevin85421 force-pushed the no-monitor branch from 718de4d to 7ea19e6 Compare May 22, 2023 21:34

kevin85421 changed the title ~~[WIP] Add no-monitor by default~~ Add a warning to discourage users from launching a KubeRay-incompatible autoscaler. May 22, 2023

kevin85421 added 3 commits May 22, 2023 22:58

update

1c15240

update

f160436

update

0eed516

kevin85421 marked this pull request as ready for review May 22, 2023 23:17

kevin85421 requested review from wuisawesome, DmitriGekhtman, architkulkarni and gvspraveen May 22, 2023 23:17

architkulkarni approved these changes May 23, 2023

View reviewed changes

wuisawesome approved these changes May 24, 2023

View reviewed changes

kevin85421 merged commit 7fe4050 into ray-project:master May 24, 2023

kevin85421 mentioned this pull request May 24, 2023

[Discuss] Always run the monitor.py process in a sidecar container #1109

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a warning to discourage users from launching a KubeRay-incompatible autoscaler. #1102

Add a warning to discourage users from launching a KubeRay-incompatible autoscaler. #1102

kevin85421 commented May 22, 2023 •

edited

Loading

wuisawesome commented May 22, 2023

kevin85421 commented May 22, 2023

kevin85421 commented May 22, 2023

Yicheng-Lu-llll commented May 24, 2023 •

edited

Loading

kevin85421 commented May 24, 2023

wuisawesome left a comment

kevin85421 commented May 24, 2023

kevin85421 commented May 24, 2023

kevin85421 commented Nov 15, 2023

	if instance.Spec.EnableInTreeAutoscaling != nil && *instance.Spec.EnableInTreeAutoscaling {
	headSpec.RayStartParams["no-monitor"] = "true"

Add a warning to discourage users from launching a KubeRay-incompatible autoscaler. #1102

Add a warning to discourage users from launching a KubeRay-incompatible autoscaler. #1102

Conversation

kevin85421 commented May 22, 2023 • edited Loading

Why are these changes needed?

KubeRay autoscaling is enabled (i.e. EnableInTreeAutoscaling: true)

KubeRay autoscaling is disabled

Related issue number

Checks

Set autoscaling-config

wuisawesome commented May 22, 2023

kevin85421 commented May 22, 2023

kevin85421 commented May 22, 2023

Yicheng-Lu-llll commented May 24, 2023 • edited Loading

kevin85421 commented May 24, 2023

wuisawesome left a comment

Choose a reason for hiding this comment

kevin85421 commented May 24, 2023

kevin85421 commented May 24, 2023

kevin85421 commented Nov 15, 2023

kevin85421 commented May 22, 2023 •

edited

Loading

KubeRay autoscaling is enabled (i.e. `EnableInTreeAutoscaling: true`)

Set `autoscaling-config`

Yicheng-Lu-llll commented May 24, 2023 •

edited

Loading