-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Support Volcano for batch scheduling #755
Conversation
Thank @tgaddair for this contribution! In my opinion, KubeRay shouldn't be too opinionated about external tools. Maybe the better solution is to make sure the exposed interfaces are enough for users to configure which tools they want to use. cc @DmitriGekhtman @Jeffwan |
I generally agree. Counterarguments in this case:
@kevin85421 do you have an idea for a less invasive interface change that could work here? |
It does appear that the same functionality could be achieved with some boilerplate -- namely, adding the relevant annotations, configuring the podgroup yourself, and kubectl-applying a yaml file structured as
|
I know it is a well-established tool, but it also increases the complexity of the KubeRay operator. We need to write down the comparison between different solutions for Volcano integration and choose the best one. In addition, the Spark operator is weird for Spark folks (at least for the part of Spark). For example, it will create a driver for each Spark job (In most production cases, a cluster will only have 1 driver). Hence, the spark operator is an unusual operator for me, so I am not sure if it is a good example or not.
Currently, it is not heavy-weight. However, if we integrate a lot of external tools into the KubeRay operator, it will be heavy.
No, I have no context about Volcano. That's why I want to see the comparison between different solutions. I did not have a strong opinion about how to integrate Volcano, but I hope to see the comparison so that we can choose the best one. |
These are great points!
The operator does have enough issues as it is :) Yeah, if it's possible for us to document how to do it oneself that would be great. I imagine a developer building infrastructure on top of KubeRay could write their own operator to manage a |
I will help review this as well. Please wait for my comments. |
As @tgaddair explained yesterday, the reason "native" integration is required is that Volcano needs creation of a PodGroup with an owner reference to the RayCluster CR before the RayCluster's pods are reconciled. I think all of this is reasonable. We should probably take some steps to mitigate the risk from increased code complexity.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the overall looks good to me. I just left some minor comments.
I know some other users uses different solutions than volcano. but sig-scheduling is not able to unify the podgroup definition and normally it's hard to support multiples ones. volcano looks good to me since it's already CNCF project now
if EnableBatchScheduler { | ||
var minMember int32 | ||
var totalResource corev1.ResourceList | ||
if instance.Spec.EnableInTreeAutoscaling == nil || !*instance.Spec.EnableInTreeAutoscaling { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use minMember to indicate min
of gang or use a separate annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Volcano, it considers minMember
to be the min gang size, which we set to be the total replica count (if not using autoscaling) or the total minReplica count (when using autoscaling).
Did you want to make this more configurable so that users can specify their own min gang size?
@Jeffwan thanks for the review! I made some updates to make the implementation more general, so we can add in additional schedulers in the future. This also makes the implementation consistent with the Spark Operator, which follows a similar convention. I'll take some time to address your comments and ensure tests are passing. |
Co-authored-by: Dmitri Gekhtman <[email protected]> Signed-off-by: Travis Addair <[email protected]>
Last request from me: |
Hey @DmitriGekhtman, thanks for the great feedback. I added a pretty complete end-to-end example in the docs, and verified it worked as written. Let me know what you think! I believe this PR should be good to go at this point. |
Looks great, one last suggestion concerning memory allocation for the Ray head pod. |
Thanks @DmitriGekhtman, updated the example to use 2Gi of RAM for the head node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the excellent contribution.
This comes just in time for the KubeRay 0.4.0 branch cut!
metadata: | ||
name: kuberay-test-queue | ||
spec: | ||
weight: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain weight
in the doc?
name: kuberay-test-queue | ||
spec: | ||
weight: 1 | ||
capability: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible we to update the queue capability when there is new pod group waiting for being scheduled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will update docs to clarify this.
"sigs.k8s.io/controller-runtime/pkg/builder" | ||
) | ||
|
||
type BatchScheduler interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some comments to explain the interface and all interface's functions.
AddMetadataToPod(app *rayiov1alpha1.RayCluster, pod *v1.Pod) | ||
} | ||
|
||
type BatchSchedulerFactory interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
oops, looks like I leave comments late. @tgaddair feel free to have follow up pr to address, Thank you for the contribution! |
Sorry, @sihanwang41, I was a bit trigger happy! @tgaddair feel free to follow-up on Sihan's documentation comments in another PR! |
Thanks @sihanwang41, will update in a follow-up that also fixes a typo in the example. |
@sihanwang41 addressed comments in #776. |
The document is very detailed! Thank @tgaddair for this high-quality contribution! I feel risky to merge this big feature into the master at the last minute of the v0.4.0 branch cut. Will we add this PR into release v0.4.0 or move it to the next release? cc @DmitriGekhtman @sihanwang41 Risk:
Any thoughts? Thanks! Updated:
It is okay to include this PR in release v0.4.0 after we test it. |
Thanks @kevin85421, these are reasonable concerns! Also, eventually, we need automated tools to validate a Kubernetes compatibility matrix -- I think that's tracked somewhere. |
Adds Volcano integration for KubeRay. Signed-off-by: Travis Addair <[email protected]> Co-authored-by: dengkai02 <[email protected]> Co-authored-by: Dmitri Gekhtman <[email protected]>
|
||
## Run Ray Cluster with Volcano scheduler | ||
|
||
Add the `ray.io/scheduler-name: volcano` label to your RayCluster CR to submit the cluster pods to Volcano for scheduling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can kuberay support RayJob CRD integrate volcano?
This PR is based on the branch created by @loleek and the Spark Operator integration with Volcano.
Why are these changes needed?
To enable advanced multi-tenancy features for running multiple Ray cluster in a shared environment.
Related issue number
Closes #697
Checks