[release blocker][Feature] Only Autoscaler can make decisions to delete Pods #1253
Conversation
Thank you very much for this new feature. I do think this is a really good idea! One source of truth is always preferable.
```go
if !enableInTreeAutoscaling || enableRandomPodDelete {
	// diff < 0 means that we need to delete some Pods to meet the desired number of replicas.
	randomlyRemovedWorkers := -diff
	r.Log.Info("reconcilePods", "Number workers to delete randomly", randomlyRemovedWorkers, "Worker group", worker.GroupName)
```
nit: rephrase the log statement.
"Randomly pick xx workers to delete from worker group xxx."
```go
	}
	r.Recorder.Eventf(instance, corev1.EventTypeNormal, "Deleted", "Deleted Pod %s", randomPodToDelete.Name)
} else {
	r.Log.Info(fmt.Sprintf("Random Pod deletion is disabled for cluster %s. The only decision-maker for Pod deletions is Autoscaler.", instance.Name))
```
nit: Ray Autoscaler. (if that is what you mean :) )
Why are these changes needed?

Without this PR, there are two mechanisms to delete Pods:

1. Ray Autoscaler: the Autoscaler updates the `replicas` and `workersToDelete` fields in the RayCluster CR. Then, the KubeRay operator will delete all Pods in `workersToDelete`.
2. Random Pod deletion: if all Pods in `workersToDelete` have been deleted, and additional deletions are still required to reach the target state, the KubeRay operator will proceed to delete worker Pods randomly.

This PR provides a new behavior for Pod deletion: when in-tree autoscaling is enabled, only the Autoscaler can make decisions to delete Pods. It also adds the feature flag `ENABLE_RANDOM_POD_DELETE` for users to go back to the old behavior if we ignore some edge cases. Random Pod deletion is undesirable for numerous reasons, and we should try to avoid it as much as possible. In addition, this behavior can avoid Case 4 mentioned in #1238 (comment).
Related issue number
#1238
Checks
Test 1: No Autoscaler
Test 2: Autoscaler