Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alkanso/python client #901

Merged
merged 4 commits into from
Mar 3, 2023
Merged

Alkanso/python client #901

merged 4 commits into from
Mar 3, 2023

Conversation

akanso
Copy link
Collaborator

@akanso akanso commented Feb 8, 2023

Why are these changes needed?

Today we can create/update/delete rayclusters using kubectl, helm, etc. with yaml/JSON manifests describing the raycluster.

A Python library that can programmatically manage rayclusters (using kuberay) will enable Python applications running on top of Ray and using KubeRay to add/remove/update worker-groups on the fly.

This library enables the application itself to scale the worker-groups horizontally and vertically.

Use case
Two main use cases:

1- Enable a Python application to programmatically manage rayclusters (using kuberay) without the need to deal with YAML/JSON files.

2- We can use this KubeRay Python Client Library in our tests (E2E/Integration) to manage rayclusters without having to run kubectl command from our tests that are written in Python

Related issue number

#899

Checks

This PR include its own test files with 88% test coverage.

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(
python -m coverage report
Name                                                                                                                  Stmts   Miss  Cover
-----------------------------------------------------------------------------------------------------------------------------------------
/home/azureuser/go-code/src/github.com/kuberay/clients/python-client/python_client/constants.py                           6      0   100%
/home/azureuser/go-code/src/github.com/kuberay/clients/python-client/python_client/utils/__init__.py                      0      0   100%
/home/azureuser/go-code/src/github.com/kuberay/clients/python-client/python_client/utils/kuberay_cluster_builder.py      72      9    88%
/home/azureuser/go-code/src/github.com/kuberay/clients/python-client/python_client/utils/kuberay_cluster_utils.py       146     28    81%
test_director.py                                                                                                         74      0   100%
test_utils.py                                                                                                            89      9    90%
-----------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                   387     46    88%

@akanso akanso requested review from Jeffwan and kevin85421 February 8, 2023 21:39
@akanso akanso self-assigned this Feb 8, 2023
},
}

return worker_group, True
Copy link
Contributor

@Yicheng-Lu-llll Yicheng-Lu-llll Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if there is a particular reason to return only the worker_group instead of cluster. Because it seems to me all other similar functions(populate_ray_head, populate_meta, and L213 in populate_worker_group) return the whole cluster.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. this is mainly because multiple actors can be creating worker-groups for the same cluster, and eventually we would merge them into one cluster.

at a high level, something lie this:

actor1 --> populate(worker_group1)
actor2 --> populate(worker_group2)
cluster.workgroupspecs.append(worker_group1, worker_group2)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. Thank you!

@Jeffwan
Copy link
Collaborator

Jeffwan commented Feb 15, 2023

I think it would be great to generate as more skeleton code as we can. Current way would scale well if the core api is evolving frequently which easily make the sdk outdated.

  1. Let's generate openapi file from crd. https://pkg.go.dev/k8s.io/kube-openapi
  2. Leverage GetOpenAPIDefinitions to generate openapi file
  3. then generate python code from openapi

print(json_formatted_str)

print(
"try: kubectl -n default get raycluster {} -oyaml".format(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Suggested change
"try: kubectl -n default get raycluster {} -oyaml".format(
"try: kubectl -n default get raycluster {} -o yaml".format(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-oyaml and -o yaml are equivalent, I fixed it to avoid any confusion :-)
fixed

ray_image: str = "rayproject/ray:2.2.0",
service_type: str = "ClusterIP",
cpu_requests: str = "1",
memory_requests: str = "1G",
Copy link
Collaborator

@DmitriGekhtman DmitriGekhtman Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't recommend less than 2Gi memory for the Ray head these days.
Maybe we can match the requests and limits here?

I don't think there's any universal sense of good default resource values for a Ray node, so I feel a bit iffy about providing resource defaults.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed now with 3G

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this great contribution!

As we discuss in today's meeting, it is good to see some examples about:

(1) Not only create custom resource (RayCluster), but also other Kubernetes resources. Take ray-cluster.external-redis.yaml as an example, the YAML files include RayCluster, ConfigMap and Deployment for Redis.

(2) wait function to wait until the cluster is ready

This will be very helpful. With wait functions, we can implement RayJob with this Python client with the following pattern:

  • Step 1: Create a RayCluster
  • Step 2: Wait until the cluster is ready
  • Step 3: Execute some commands in the Ray head.
  • Step 4: Delete the RayCluster when the job is ready.

@akanso
Copy link
Collaborator Author

akanso commented Feb 16, 2023

  1. https://pkg.go.dev/k8s.io/kube-openapi

I see that the project is: This project is still in development and does not support all OpenAPI features

Is there an example for generating OpenAPI.json file from K8s CRD? I also cant find an example of generating the Python Client from a CRD.

Trying to generate the Python Client directly from the OpenAPIV3Schema of the CRD is giving validation errors...

@akanso akanso force-pushed the alkanso/python-client branch from dcd57d6 to 75d34c5 Compare February 16, 2023 23:20
@akanso
Copy link
Collaborator Author

akanso commented Feb 16, 2023

Thank you for this great contribution!

As we discuss in today's meeting, it is good to see some examples about:

(1) Not only create custom resource (RayCluster), but also other Kubernetes resources. Take ray-cluster.external-redis.yaml as an example, the YAML files include RayCluster, ConfigMap and Deployment for Redis.

(2) wait function to wait until the cluster is ready

This will be very helpful. With wait functions, we can implement RayJob with this Python client with the following pattern:

  • Step 1: Create a RayCluster
  • Step 2: Wait until the cluster is ready
  • Step 3: Execute some commands in the Ray head.
  • Step 4: Delete the RayCluster when the job is ready.

added the use-raw-config_map_with-api.py example that includes all the above

@Jeffwan
Copy link
Collaborator

Jeffwan commented Feb 17, 2023

  1. https://pkg.go.dev/k8s.io/kube-openapi

I see that the project is: This project is still in development and does not support all OpenAPI features

Is there an example for generating OpenAPI.json file from K8s CRD? I also cant find an example of generating the Python Client from a CRD.

Trying to generate the Python Client directly from the OpenAPIV3Schema of the CRD is giving validation errors...

@akanso
Yeah, I find the example in kubeflow project that I worked on and please have a check

  1. generate openapi schema from CRD
    https://github.com/kubeflow/training-operator/blob/ea9785592390b40dce48c70dd9daa6f2bae62e22/hack/update-codegen.sh#L101-L106

https://github.com/kubeflow/training-operator/blob/master/pkg/apis/kubeflow.org/v1/openapi_generated.go

  1. generate openapi file
    https://github.com/kubeflow/training-operator/blob/master/hack/swagger/main.go

  2. generate python sdk
    https://github.com/kubeflow/training-operator/blob/master/hack/python-sdk/gen-sdk.sh

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Leave some nit comments. If they do not make sense to you, feel free to ignore them.

  2. I cannot run examples with pip install -e ..
    Screen Shot 2023-02-28 at 2 48 17 PM

I will do my best to review one pass every one or two days this week.

my_kuberay_api.create_ray_cluster(body=cluster0)
```

the director create the custer definition, and the custer_api acts as the http client sending the create (post) request to the k8s api-server
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the director create the custer definition, and the custer_api acts as the http client sending the create (post) request to the k8s api-server
The director creates the cluster definition, and the custer_api acts as the http client sending the create (post) request to the k8s api-server


### cluster_builder

The builder allows you to build the cluster piece by piece, you are can customize more the values of the cluster definition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The builder allows you to build the cluster piece by piece, you are can customize more the values of the cluster definition
The builder allows you to build the cluster piece by piece, giving you more control to customize the values of the cluster definition.


### cluster_utils

the cluster_utils gives you even more options to modify your cluster definition, add/remove worker groups, change replicas in a worker group, duplicate a worker group, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the cluster_utils gives you even more options to modify your cluster definition, add/remove worker groups, change replicas in a worker group, duplicate a worker group, etc.
The cluster_utils gives you even more options to modify your cluster definition, add/remove worker groups, change replicas in a worker group, duplicate a worker group, etc.

│ ├── __init__.py
│ ├── kuberay_cluster_builder.py
│ └── kuberay_cluster_utils.py
└── python_client_test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_api.py

clients/
└── python-client
├── README.md
├── examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use-raw-config_map_with-api.py


`pip install -e .`

#### to uninstall the module run
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### to uninstall the module run
#### Uninstall the module


`pip install -U pip setuptools`

#### run the pip command
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### run the pip command
#### Install the module

├── test_director.py
└── test_utils.py

## For developers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## For developers
## Development


### For testing run

`python -m unittest discover 'path/to/kuberay/clients/python_client_test/'`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add a section to describe how to run examples locally.

""" 
in case you are working directly with the source, and don't wish to 
install the module with pip install, you can directly import the packages by uncommenting the following code.
"""



"""
in case you are working directly with the source, and don't wish to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I install python_client by pip install -e .. However, I cannot import python_client. Do I miss anything? Thanks!

Screen Shot 2023-02-28 at 2 48 17 PM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import kuberay_cluster_api

from utils import kuberay_cluster_utils, kuberay_cluster_builder

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a chance to review every line of code (the PR is too large), and run all examples by myself. However, it is OK to merge this PR because:

(1) As a standalone module, any necessary bug fixes can be implemented via additional pull requests.
(2) The unit tests appear to be comprehensive

@kevin85421 kevin85421 merged commit 584da5a into master Mar 3, 2023
@akanso
Copy link
Collaborator Author

akanso commented Mar 3, 2023

@Jeffwan since this is a large change, We will do this change in the next PR, once we also collect more feedback from the folks using this feature....

@Jeffwan
Copy link
Collaborator

Jeffwan commented Mar 4, 2023

ok. sounds good to me

lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
@kevin85421 kevin85421 deleted the alkanso/python-client branch December 27, 2023 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants