Alkanso/python client #901
Conversation
```python
return worker_group, True
```
Wondering if there is a particular reason to return only the `worker_group` instead of `cluster`. It seems to me all other similar functions (`populate_ray_head`, `populate_meta`, and L213 in `populate_worker_group`) return the whole cluster.
Good point. This is mainly because multiple actors can be creating worker-groups for the same cluster, and eventually we would merge them into one cluster. At a high level, something like this:

```
actor1 --> populate(worker_group1)
actor2 --> populate(worker_group2)
cluster.workgroupspecs.append(worker_group1, worker_group2)
```
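A minimal sketch of that merge pattern, using plain dicts in place of the real client objects; `populate_worker_group` here is a simplified stand-in for the client's function, not its actual signature:

```python
def populate_worker_group(name: str, replicas: int) -> dict:
    # Simplified stand-in: each actor builds its own worker-group spec
    return {"groupName": name, "replicas": replicas}

cluster = {"metadata": {"name": "demo"},
           "spec": {"workerGroupSpecs": []}}

# actor1 and actor2 each populate a worker group independently
worker_group1 = populate_worker_group("group1", replicas=2)
worker_group2 = populate_worker_group("group2", replicas=4)

# eventually the groups are merged into one cluster definition
cluster["spec"]["workerGroupSpecs"].extend([worker_group1, worker_group2])
```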
I see your point. Thank you!
I think it would be great to generate as much skeleton code as we can. The current way would not scale well if the core API is evolving frequently, which would easily make the SDK outdated.
```python
print(json_formatted_str)
print(
    "try: kubectl -n default get raycluster {} -oyaml".format(
```
Typo. Suggested change:

```diff
- "try: kubectl -n default get raycluster {} -oyaml".format(
+ "try: kubectl -n default get raycluster {} -o yaml".format(
```
`-oyaml` and `-o yaml` are equivalent; I fixed it to avoid any confusion :-)
fixed
```python
ray_image: str = "rayproject/ray:2.2.0",
service_type: str = "ClusterIP",
cpu_requests: str = "1",
memory_requests: str = "1G",
```
We don't recommend less than 2Gi memory for the Ray head these days.
Maybe we can match the requests and limits here?
I don't think there's any universal sense of good default resource values for a Ray node, so I feel a bit iffy about providing resource defaults.
fixed now with 3G
Thank you for this great contribution!
As we discussed in today's meeting, it would be good to see some examples of:
(1) Creating not only the custom resource (RayCluster) but also other Kubernetes resources. Take ray-cluster.external-redis.yaml as an example: the YAML file includes a RayCluster, a ConfigMap, and a Deployment for Redis.
(2) A `wait` function to wait until the cluster is ready.
This will be very helpful. With `wait` functions, we can implement RayJob with this Python client using the following pattern:
- Step 1: Create a RayCluster.
- Step 2: Wait until the cluster is ready.
- Step 3: Execute some commands in the Ray head.
- Step 4: Delete the RayCluster when the job is done.
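The four-step pattern could be sketched roughly like this, with a toy in-memory API class standing in for the real KubeRay client (the class, method names, and status field here are illustrative assumptions, not the client's actual API surface):

```python
import time

class FakeClusterApi:
    """Toy stand-in for the KubeRay cluster API client."""
    def __init__(self):
        self.clusters = {}

    def create_ray_cluster(self, body):
        # a real cluster becomes ready asynchronously; we fake it here
        self.clusters[body["name"]] = {"state": "ready"}

    def get_ray_cluster(self, name):
        return self.clusters.get(name)

    def delete_ray_cluster(self, name):
        self.clusters.pop(name, None)

def wait_until_ready(api, name, timeout_s=5.0, poll_s=0.1):
    """Poll the cluster state until it is ready or the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        cluster = api.get_ray_cluster(name)
        if cluster and cluster["state"] == "ready":
            return True
        time.sleep(poll_s)
    return False

api = FakeClusterApi()
api.create_ray_cluster({"name": "job-cluster"})   # Step 1: create
ready = wait_until_ready(api, "job-cluster")      # Step 2: wait
# Step 3: execute some commands in the Ray head (omitted here)
api.delete_ray_cluster("job-cluster")             # Step 4: delete
```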
I see that the project says: "This project is still in development and does not support all OpenAPI features." Is there an example of generating the OpenAPI.json file from a K8s CRD? I also can't find an example of generating the Python client from a CRD. Trying to generate the Python client directly from the OpenAPIV3Schema of the CRD is giving validation errors...
force-pushed from dcd57d6 to 75d34c5
@akanso
```python
my_kuberay_api.create_ray_cluster(body=cluster0)
```
the director create the custer definition, and the custer_api acts as the http client sending the create (post) request to the k8s api-server
Suggested change:

```diff
- the director create the custer definition, and the custer_api acts as the http client sending the create (post) request to the k8s api-server
+ The director creates the cluster definition, and the cluster_api acts as the http client sending the create (post) request to the k8s api-server
```
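To illustrate the director/api split described here, a toy sketch; the class names and the builder method are hypothetical stand-ins, and only the `create_ray_cluster(body=...)` call mirrors the snippet shown above:

```python
class ClusterDirector:
    """Assembles a cluster definition (here, a plain dict)."""
    def build_small_cluster(self, name: str) -> dict:
        return {"metadata": {"name": name},
                "spec": {"headGroupSpec": {}, "workerGroupSpecs": []}}

class ClusterApi:
    """Stands in for the HTTP client that POSTs to the k8s api-server."""
    def __init__(self):
        self.submitted = []

    def create_ray_cluster(self, body: dict) -> None:
        # a real client would send a POST request; we just record it
        self.submitted.append(body)

director = ClusterDirector()
cluster0 = director.build_small_cluster("cluster0")
my_kuberay_api = ClusterApi()
my_kuberay_api.create_ray_cluster(body=cluster0)
```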
### cluster_builder

The builder allows you to build the cluster piece by piece, you are can customize more the values of the cluster definition
Suggested change:

```diff
- The builder allows you to build the cluster piece by piece, you are can customize more the values of the cluster definition
+ The builder allows you to build the cluster piece by piece, giving you more control to customize the values of the cluster definition.
```
### cluster_utils

the cluster_utils gives you even more options to modify your cluster definition, add/remove worker groups, change replicas in a worker group, duplicate a worker group, etc.
Suggested change:

```diff
- the cluster_utils gives you even more options to modify your cluster definition, add/remove worker groups, change replicas in a worker group, duplicate a worker group, etc.
+ The cluster_utils gives you even more options to modify your cluster definition, add/remove worker groups, change replicas in a worker group, duplicate a worker group, etc.
```
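A hedged sketch of the kinds of operations listed here (add a worker group, change replicas, duplicate a group), applied to a plain dict; the real module's function names and signatures may differ:

```python
import copy

cluster = {"spec": {"workerGroupSpecs": [
    {"groupName": "small", "replicas": 1},
]}}

def add_worker_group(cluster: dict, group: dict) -> None:
    # append a new worker-group spec to the cluster definition
    cluster["spec"]["workerGroupSpecs"].append(group)

def update_replicas(cluster: dict, group_name: str, replicas: int) -> None:
    # change the replica count of the named worker group
    for group in cluster["spec"]["workerGroupSpecs"]:
        if group["groupName"] == group_name:
            group["replicas"] = replicas

def duplicate_worker_group(cluster: dict, group_name: str, new_name: str) -> None:
    # deep-copy an existing group under a new name
    for group in cluster["spec"]["workerGroupSpecs"]:
        if group["groupName"] == group_name:
            clone = copy.deepcopy(group)
            clone["groupName"] = new_name
            add_worker_group(cluster, clone)
            return

update_replicas(cluster, "small", 3)
duplicate_worker_group(cluster, "small", "small-copy")
```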
│ ├── __init__.py
│ ├── kuberay_cluster_builder.py
│ └── kuberay_cluster_utils.py
└── python_client_test
test_api.py
clients/
└── python-client
    ├── README.md
    ├── examples
use-raw-config_map_with-api.py
`pip install -e .`

#### to uninstall the module run
Suggested change:

```diff
- #### to uninstall the module run
+ #### Uninstall the module
```
`pip install -U pip setuptools`

#### run the pip command
Suggested change:

```diff
- #### run the pip command
+ #### Install the module
```
├── test_director.py
└── test_utils.py

## For developers
Suggested change:

```diff
- ## For developers
+ ## Development
```
### For testing run

`python -m unittest discover 'path/to/kuberay/clients/python_client_test/'`
Maybe we can add a section to describe how to run examples locally.
```python
"""
in case you are working directly with the source, and don't wish to
install the module with pip install, you can directly import the packages by uncommenting the following code.
"""
```
```python
import kuberay_cluster_api
from utils import kuberay_cluster_utils, kuberay_cluster_builder
```
I did not have a chance to review every line of code (the PR is too large) or run all the examples myself. However, it is OK to merge this PR because:
(1) As a standalone module, any necessary bug fixes can be implemented via additional pull requests.
(2) The unit tests appear to be comprehensive.
@Jeffwan since this is a large change, we will do this in the next PR, once we also collect more feedback from the folks using this feature...
ok. sounds good to me
Python API for KubeRay
Why are these changes needed?
Today we can create/update/delete RayClusters using kubectl, helm, etc., with YAML/JSON manifests describing the RayCluster.
A Python library that can programmatically manage RayClusters (using KubeRay) will enable Python applications running on top of Ray and using KubeRay to add/remove/update worker-groups on the fly.
This library enables the application itself to scale its worker-groups horizontally and vertically.
Use case
Two main use cases:
1. Enable a Python application to programmatically manage RayClusters (using KubeRay) without the need to deal with YAML/JSON files.
2. Use this KubeRay Python client library in our tests (E2E/integration) to manage RayClusters without having to run kubectl commands from our tests, which are written in Python.
Related issue number
#899
Checks
This PR includes its own test files with 88% test coverage.