Alkanso/python client #901
Conversation
```python
return worker_group, True
```
Wondering if there is a particular reason to return only the `worker_group` instead of `cluster`. It seems to me all other similar functions (`populate_ray_head`, `populate_meta`, and L213 in `populate_worker_group`) return the whole cluster.
Good point. This is mainly because multiple actors can be creating worker-groups for the same cluster, and eventually we would merge them into one cluster. At a high level, something like this:

```
actor1 --> populate(worker_group1)
actor2 --> populate(worker_group2)
cluster.workgroupspecs.append(worker_group1, worker_group2)
```
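A minimal sketch of that merge pattern, using plain dicts in place of the real client objects; `populate_worker_group` here is a simplified stand-in for the client's function, not its actual signature:

```python
def populate_worker_group(name: str, replicas: int) -> dict:
    # Simplified stand-in: each actor builds its own worker-group spec
    return {"groupName": name, "replicas": replicas}

cluster = {"metadata": {"name": "demo"},
           "spec": {"workerGroupSpecs": []}}

# actor1 and actor2 each populate a worker group independently
worker_group1 = populate_worker_group("group1", replicas=2)
worker_group2 = populate_worker_group("group2", replicas=4)

# eventually the groups are merged into one cluster definition
cluster["spec"]["workerGroupSpecs"].extend([worker_group1, worker_group2])
```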
I see your point. Thank you!
I think it would be great to generate as much skeleton code as we can. The current way would not scale well if the core API is evolving frequently, which would easily make the SDK outdated.
```python
print(json_formatted_str)
print(
    "try: kubectl -n default get raycluster {} -oyaml".format(
```
Typo. Suggested change:

```diff
- "try: kubectl -n default get raycluster {} -oyaml".format(
+ "try: kubectl -n default get raycluster {} -o yaml".format(
```
`-oyaml` and `-o yaml` are equivalent; I fixed it to avoid any confusion :-)
fixed
```python
ray_image: str = "rayproject/ray:2.2.0",
service_type: str = "ClusterIP",
cpu_requests: str = "1",
memory_requests: str = "1G",
```
We don't recommend less than 2Gi memory for the Ray head these days.
Maybe we can match the requests and limits here?
I don't think there's any universal sense of good default resource values for a Ray node, so I feel a bit iffy about providing resource defaults.
fixed now with 3G
Thank you for this great contribution!
As we discussed in today's meeting, it would be good to see some examples of:
(1) Creating not only the custom resource (RayCluster) but also other Kubernetes resources. Take ray-cluster.external-redis.yaml as an example: the YAML file includes a RayCluster, a ConfigMap, and a Deployment for Redis.
(2) A `wait` function to wait until the cluster is ready.
This will be very helpful. With `wait` functions, we can implement RayJob with this Python client using the following pattern:
- Step 1: Create a RayCluster.
- Step 2: Wait until the cluster is ready.
- Step 3: Execute some commands in the Ray head.
- Step 4: Delete the RayCluster when the job is done.
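The four-step pattern could be sketched roughly like this, with a toy in-memory API class standing in for the real KubeRay client (the class, method names, and status field here are illustrative assumptions, not the client's actual API surface):

```python
import time

class FakeClusterApi:
    """Toy stand-in for the KubeRay cluster API client."""
    def __init__(self):
        self.clusters = {}

    def create_ray_cluster(self, body):
        # a real cluster becomes ready asynchronously; we fake it here
        self.clusters[body["name"]] = {"state": "ready"}

    def get_ray_cluster(self, name):
        return self.clusters.get(name)

    def delete_ray_cluster(self, name):
        self.clusters.pop(name, None)

def wait_until_ready(api, name, timeout_s=5.0, poll_s=0.1):
    """Poll the cluster state until it is ready or the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        cluster = api.get_ray_cluster(name)
        if cluster and cluster["state"] == "ready":
            return True
        time.sleep(poll_s)
    return False

api = FakeClusterApi()
api.create_ray_cluster({"name": "job-cluster"})   # Step 1: create
ready = wait_until_ready(api, "job-cluster")      # Step 2: wait
# Step 3: execute some commands in the Ray head (omitted here)
api.delete_ray_cluster("job-cluster")             # Step 4: delete
```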
I see that the project says: "This project is still in development and does not support all OpenAPI features." Is there an example of generating the OpenAPI.json file from a K8s CRD? I also can't find an example of generating the Python client from a CRD. Trying to generate the Python client directly from the OpenAPIV3Schema of the CRD is giving validation errors...
force-pushed from dcd57d6 to 75d34c5
@akanso
```python
my_kuberay_api.create_ray_cluster(body=cluster0)
```
the director create the custer definition, and the custer_api acts as the http client sending the create (post) request to the k8s api-server
Suggested change:

```diff
- the director create the custer definition, and the custer_api acts as the http client sending the create (post) request to the k8s api-server
+ The director creates the cluster definition, and the cluster_api acts as the http client sending the create (post) request to the k8s api-server
```
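To illustrate the director/api split described here, a toy sketch; the class names and the builder method are hypothetical stand-ins, and only the `create_ray_cluster(body=...)` call mirrors the snippet shown above:

```python
class ClusterDirector:
    """Assembles a cluster definition (here, a plain dict)."""
    def build_small_cluster(self, name: str) -> dict:
        return {"metadata": {"name": name},
                "spec": {"headGroupSpec": {}, "workerGroupSpecs": []}}

class ClusterApi:
    """Stands in for the HTTP client that POSTs to the k8s api-server."""
    def __init__(self):
        self.submitted = []

    def create_ray_cluster(self, body: dict) -> None:
        # a real client would send a POST request; we just record it
        self.submitted.append(body)

director = ClusterDirector()
cluster0 = director.build_small_cluster("cluster0")
my_kuberay_api = ClusterApi()
my_kuberay_api.create_ray_cluster(body=cluster0)
```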
### cluster_builder

The builder allows you to build the cluster piece by piece, you are can customize more the values of the cluster definition
Suggested change:

```diff
- The builder allows you to build the cluster piece by piece, you are can customize more the values of the cluster definition
+ The builder allows you to build the cluster piece by piece, giving you more control to customize the values of the cluster definition.
```
### cluster_utils

the cluster_utils gives you even more options to modify your cluster definition, add/remove worker groups, change replicas in a worker group, duplicate a worker group, etc.
Suggested change:

```diff
- the cluster_utils gives you even more options to modify your cluster definition, add/remove worker groups, change replicas in a worker group, duplicate a worker group, etc.
+ The cluster_utils gives you even more options to modify your cluster definition, add/remove worker groups, change replicas in a worker group, duplicate a worker group, etc.
```
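A hedged sketch of the kinds of operations listed here (add a worker group, change replicas, duplicate a group), applied to a plain dict; the real module's function names and signatures may differ:

```python
import copy

cluster = {"spec": {"workerGroupSpecs": [
    {"groupName": "small", "replicas": 1},
]}}

def add_worker_group(cluster: dict, group: dict) -> None:
    # append a new worker-group spec to the cluster definition
    cluster["spec"]["workerGroupSpecs"].append(group)

def update_replicas(cluster: dict, group_name: str, replicas: int) -> None:
    # change the replica count of the named worker group
    for group in cluster["spec"]["workerGroupSpecs"]:
        if group["groupName"] == group_name:
            group["replicas"] = replicas

def duplicate_worker_group(cluster: dict, group_name: str, new_name: str) -> None:
    # deep-copy an existing group under a new name
    for group in cluster["spec"]["workerGroupSpecs"]:
        if group["groupName"] == group_name:
            clone = copy.deepcopy(group)
            clone["groupName"] = new_name
            add_worker_group(cluster, clone)
            return

update_replicas(cluster, "small", 3)
duplicate_worker_group(cluster, "small", "small-copy")
```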
│ ├── __init__.py
│ ├── kuberay_cluster_builder.py
│ └── kuberay_cluster_utils.py
└── python_client_test
test_api.py
clients/
└── python-client
    ├── README.md
    ├── examples
use-raw-config_map_with-api.py
`pip install -e .`

#### to uninstall the module run
Suggested change:

```diff
- #### to uninstall the module run
+ #### Uninstall the module
```
`pip install -U pip setuptools`

#### run the pip command
Suggested change:

```diff
- #### run the pip command
+ #### Install the module
```
├── test_director.py
└── test_utils.py

## For developers
Suggested change:

```diff
- ## For developers
+ ## Development
```
### For testing run

`python -m unittest discover 'path/to/kuberay/clients/python_client_test/'`
Maybe we can add a section to describe how to run examples locally.
```python
"""
in case you are working directly with the source, and don't wish to
install the module with pip install, you can directly import the packages by uncommenting the following code.
"""
```
```python
import kuberay_cluster_api
from utils import kuberay_cluster_utils, kuberay_cluster_builder
```
I did not have a chance to review every line of code (the PR is too large) or run all the examples myself. However, it is OK to merge this PR because:
(1) As a standalone module, any necessary bug fixes can be implemented via additional pull requests.
(2) The unit tests appear to be comprehensive.
@Jeffwan since this is a large change, we will do this in the next PR, once we also collect more feedback from the folks using this feature...
ok. sounds good to me
Python API for KubeRay
Why are these changes needed?
Today we can create/update/delete RayClusters using kubectl, helm, etc., with YAML/JSON manifests describing the RayCluster.
A Python library that can programmatically manage RayClusters (using KubeRay) will enable Python applications running on top of Ray and using KubeRay to add/remove/update worker-groups on the fly.
This library enables the application itself to scale its worker-groups horizontally and vertically.
Use case
Two main use cases:
1. Enable a Python application to programmatically manage RayClusters (using KubeRay) without the need to deal with YAML/JSON files.
2. Use this KubeRay Python client library in our tests (E2E/integration) to manage RayClusters without having to run kubectl commands from our tests, which are written in Python.
Related issue number
#899
Checks
This PR includes its own test files with 88% test coverage.