[Ray] Support supervisor on oscar for ray task mode #3164

chaokunyang · 2022-06-23T10:33:25Z

Is your feature request related to a problem? Please describe.
Currently, Ray DAG mode runs the supervisor in the client, which causes several problems:

The client and the Ray cluster are usually not in the same cluster or data center. Communication between them can be unstable and slow, which adds serious latency to Ray task scheduling and results in insufficient pipelining, low task throughput, and low resource utilization.
The Mars Dashboard address cannot be reached from a browser. Because the supervisor is created in the notebook, the Mars Dashboard cannot be accessed through a proxy in the Ray cluster.
The notebook container may be too small to meet the supervisor's resource requirements. A notebook container is typically sized at 2C4G or 4C8G, so the supervisor may run out of memory (OOM) when the data scale is large.
Large-scale failover, which needs a distributed supervisor, cannot be supported. Large-scale failover has to save the lineage of a large number of subtasks, so the supervisor may need to run in multiple Ray actors to store that many fine-grained lineages. A supervisor running in the client is not an independent instance, so it is difficult to extend and make distributed.
Ray driver resource usage is not managed by the Ray cluster, which also increases the likelihood of OOM.

Describe the solution you'd like
We should support scheduling Ray tasks from within Ray actors, so that the supervisor runs inside the Ray cluster rather than in the client.

chaokunyang · 2022-08-09T11:54:12Z

Another issue is the Ray client server bottleneck:

In Ray task mode, the supervisor holds a large number of ObjectRefs. If the supervisor is created in the client, and the client is connected to the Ray cluster through Ray Client, the quadratic ("square-level") number of ObjectRefs produced in these intermediate steps each has to be processed once by the Ray client server. The client server then becomes the bottleneck of the cluster.
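To get a feel for the "square-level" growth, here is a back-of-the-envelope sketch. It assumes an all-to-all stage in which each upstream subtask yields one ObjectRef per downstream subtask; the comment above does not name the stage explicitly, and `intermediate_object_refs` is a hypothetical helper.

```python
def intermediate_object_refs(n_upstream: int, n_downstream: int) -> int:
    # Assumption: every upstream subtask produces one ObjectRef
    # for every downstream subtask (all-to-all exchange).
    return n_upstream * n_downstream


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Same parallelism on both sides, so the count grows as n squared.
        print(n, intermediate_object_refs(n, n))
```

Under this assumption, a tenfold increase in parallelism means a hundredfold increase in the ObjectRefs that would pass through the client server.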

chaokunyang mentioned this issue Jun 23, 2022

[ray] Support scheduling ray tasks in Ray oscar deploy backend #3165

Merged

2 tasks

chaokunyang closed this as completed in #3165 Sep 7, 2022
