-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: .status.availableWorkerReplicas
#887
Conversation
func CalculateAvailableReplicas(pods corev1.PodList) int32 { | ||
count := int32(0) | ||
for _, pod := range pods.Items { | ||
if val, ok := pod.Annotations["ray.io/node-type"]; !ok || val != string(rayiov1alpha1.WorkerNode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to use ray.io/node-type
because importing common.RayNodeTypeLabelKey
caused import cycle.
-: import cycle not allowed: import stack: [
github.com/ray-project/kuberay/ray-operator
github.com/ray-project/kuberay/ray-operator/controllers/ray
github.com/ray-project/kuberay/ray-operator/controllers/ray/batchscheduler
github.com/ray-project/kuberay/ray-operator/controllers/ray/batchscheduler/volcano
github.com/ray-project/kuberay/ray-operator/controllers/ray/common
github.com/ray-project/kuberay/ray-operator/controllers/ray/utils
github.com/ray-project/kuberay/ray-operator/controllers/ray/common
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Maybe we need to separate constant.go
into a separate module in the future. cc @DmitriGekhtman @Jeffwan @architkulkarni
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we should track solving that.
It would be better if we could eliminate common and utils by refactoring the package structure -- https://dave.cheney.net/2019/01/08/avoid-package-names-like-base-util-or-common
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution! Would you mind adding a screenshot of kubectl get raycluster
?
@@ -312,3 +312,34 @@ func TestReconcile_CheckNeedRemoveOldPod(t *testing.T) { | |||
|
|||
assert.Equal(t, PodNotMatchingTemplate(pod, workerTemplate), true, "expect template & pod not matching") | |||
} | |||
|
|||
func TestCalculateAvailableReplicas(t *testing.T) { | |||
podList := corev1.PodList{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding comments for the definition of available replicas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to controllers/ray/utils/util.go
. Lmk if this it's good.
A worker is available if its Pod is running or pending.
Is it correct to count pending Pods or only running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, the straightforward definition of "available replicas" is the number of worker Pods that are ready to serve actors and tasks. Also, we can refer to the definition of AvailableReplicas
in ReplicaSet.
[1] Kubernetes source code: How to calculate AvailableReplicas
?
[2] Understanding the Available condition of a Kubernetes deployment: I did not have a chance to read this article, but this may be helpful.
Do you have any thoughts about the definition of "available"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
straightforward definition of "available replicas" is the number of worker Pods that are ready to serve actors and tasks
This makes sense. We shouldn't count pending Pods here then, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. We shouldn't count pending Pods here then, right?
Yes. If referring to the definition of AvailableReplicas in ReplicaSet makes sense to you, we shouldn't count pending Pods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Updated. Ready for another review
tested manually
|
b5eb1cd
to
ecb7901
Compare
to not count head Pods. closes ray-project#885
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI failure is due to #852. |
Define availableWorkerReplicas.
Define availableWorkerReplicas.
to not count head Pods. closes #885
Checks