Add pod-deletion-cost as a new option #3680
Comments
Are you referring to having this option on the scale target? Because that is fully up to KEDA end-users; we do not define that. Or am I misinterpreting your suggestion?
Hi @tomkerkhove, thank you for your reply. Let me describe my suggestion in more detail with this example: my 3-node cluster has KEDA and Cluster Autoscaler (or Karpenter) configured, and my scaler is configured to scale my deployment from 5 to 15 pods.
Without pod deletion cost
Scaling up
At high traffic, my deployment is gradually scaled to 15 pods. Three more worker nodes are added, and the upscaled pods end up distributed across all six nodes.
Scaling down
Now that the traffic is down, the scaler scales my deployment to 10 replicas. Without pod deletion cost, the pods are terminated randomly, so the remaining pods can still be spread across all six nodes.
All six nodes are therefore still in use (and still billed). When the deployment is scaled back to 5, the remaining pods could still be spread across five of the nodes,
so only Node 6 can be terminated (automatically) to save cost.
With pod deletion cost
Scaling up
KEDA adds the controller.kubernetes.io/pod-deletion-cost annotation to each pod as it is created, giving later pods a lower deletion cost.
Scaling down
When the deployment scales down to 10 replicas, the newest (lowest-cost) pods are removed first, so Node 6 can be terminated to save cost.
When it scales further down to 5 replicas, Node 4 and Node 5 can be terminated to save cost as well. I hope this example makes the use case clear.
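For concreteness, here is a minimal sketch (not existing KEDA code; the package, function name, and cost values are illustrative) of how the standard controller.kubernetes.io/pod-deletion-cost annotation could be stamped on a pod with client-go:

```go
package podcost

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// annotateDeletionCost sets controller.kubernetes.io/pod-deletion-cost on a
// single pod. The ReplicaSet controller prefers to delete lower-cost pods
// first, so giving newer pods lower values makes scale-down follow creation
// order, emptying the newest (temporary) nodes first.
func annotateDeletionCost(ctx context.Context, cs kubernetes.Interface, ns, pod string, cost int) error {
	patch := []byte(fmt.Sprintf(
		`{"metadata":{"annotations":{"controller.kubernetes.io/pod-deletion-cost":"%d"}}}`, cost))
	_, err := cs.CoreV1().Pods(ns).Patch(
		ctx, pod, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```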
I think this is useful, but I'm not sure it's something we can do from the KEDA side. We could check the pods' annotations and annotate them from max down to 0, reducing the cost for every new instance, but I'm not totally sure this would solve your problem. I don't know if this management is under the KEDA umbrella. WDYT @kedacore/keda-contributors?
I think this scenario is nice, but as far as I can see it is fully managed on the scale target and is more related to workload scheduling. If we were to add this, hypothetically, how would you see it working?
Hi @JorTurFer and @tomkerkhove, thank you for your replies. I think this use case is common and useful for any cluster running in the cloud. The ability to schedule min/max replicas based on a predictable time frame is great; in my opinion, combining it with dynamic scaling is even better. Regarding the implementation, I think we should assume that pod-deletion-cost works as designed, and the KEDA controller should take care of the annotation value assigned to each pod (later pods getting lower values).
With @JorTurFer's example, pod-7 could be scheduled on Node 3 (a stable node). In this scenario, the more pods that are scheduled on stable nodes, the fewer temporary nodes are needed, and they can be freed sooner to save cost.
Just another question
It seems time-based min/max configuration is not supported by KEDA. What do you think about that feature? I think it's useful for scenarios with a predictable workload.
I'm afraid this is entering scheduler territory more than autoscaling. KEDA is not aware of which pods are being spun up, so we actually cannot achieve this at the moment. What do you think @zroubalik @JorTurFer? What could be done is to use an admission controller and re-balance pods based on the new number of pods, but again, that is more of a scheduling feature than an autoscaling one. I do get why you want to have this, though.
See the cron scaler.
Hi @tomkerkhove, I know that KEDA doesn't manage the pods and that it may take more effort to manage them. My request is about the priority of downscaling pods and, of course, it's all about cost saving with minimal manual effort and pod rescheduling. Kubernetes didn't have pod deletion cost until v1.21 (alpha); in my opinion, it's a great feature for better resource management, and it would be great if KEDA supported it. About the time-based configuration: I have cron scalers in my application but have never tried multiple scalers in one ScaledObject yet. I'll try it to combine the conditions.
I have been thinking about this. Let's say we agree to implement it: we could use the operator reconciliation to check whether any pod (from the workload) is missing the annotation and, if so, annotate it with a lower weight than the others. Could this be enough? I mean, KEDA won't guarantee that the "easier to remove" pods are on the most recently created nodes, because several events in the cluster could move pods from their original placement to a different one, breaking the order.
This is correct, but if the pressure on the node disappears, you could eventually end up in that scenario too. What should KEDA do in this case? I mean, KEDA has already annotated each pod with a decreasing weight.
Hi @JorTurFer, what do you think about having the KEDA operator reset the weight of all pods to the maximum value when scaling to the minimum/idle replica count?
Yes, my idea was to define a constant max value and decrement from it during the reconciliation loops, along the lines of the sketch below.
In this scenario, during scale to zero we would reset the counter.
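A minimal sketch of that idea, assuming a controller-runtime client; the constant, the function names, and the reset rule are illustrative, not agreed-upon KEDA behavior:

```go
package podcost

import (
	"context"
	"strconv"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const (
	costAnnotation = "controller.kubernetes.io/pod-deletion-cost"
	maxCost        = 10000 // illustrative constant maximum
)

// reconcileCosts gives every not-yet-annotated pod a cost just below the
// lowest one assigned so far, so later pods are always deleted first.
func reconcileCosts(ctx context.Context, c client.Client, pods []corev1.Pod) error {
	lowest := maxCost + 1
	for _, p := range pods {
		if v, ok := p.Annotations[costAnnotation]; ok {
			if n, err := strconv.Atoi(v); err == nil && n < lowest {
				lowest = n
			}
		}
	}
	for i := range pods {
		p := &pods[i]
		if _, ok := p.Annotations[costAnnotation]; ok {
			continue // keep existing weights stable
		}
		lowest--
		if p.Annotations == nil {
			p.Annotations = map[string]string{}
		}
		p.Annotations[costAnnotation] = strconv.Itoa(lowest)
		if err := c.Update(ctx, p); err != nil {
			return err
		}
	}
	return nil
}

// resetCosts is the "reset the counter" step: once the workload reaches its
// minimum/idle (or zero) replica count, start over from maxCost so the
// running window of values never drifts too low.
func resetCosts(ctx context.Context, c client.Client, pods []corev1.Pod) error {
	for i := range pods {
		p := &pods[i]
		if p.Annotations == nil {
			p.Annotations = map[string]string{}
		}
		p.Annotations[costAnnotation] = strconv.Itoa(maxCost)
		if err := c.Update(ctx, p); err != nil {
			return err
		}
	}
	return nil
}
```

Resetting at the minimum/idle replica count instead of at zero, as suggested above, would just mean calling resetCosts on that event as well.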
Hi @JorTurFer, I think scaling to the minimum or idle replica count occurs frequently, so resetting the counter on that event would help.
I'm not sure this logic is actually correct, and it's better to do nothing than to do it wrongly. Given that this is tightly related to the nodes and their current workload, I'm not sure KEDA should do this (hello scheduler 👋). Just annotating all pods with the same maximum value on scale to zero is a no-op, since it will have no impact. I might not be getting the logic above, though, so this statement could be wrong.
Hi @tomkerkhove, could you please tell me why you suggest resetting the pod deletion cost on scaling to zero rather than on scaling to the minimum/idle replica count?
I just used that because I was reading through the conversation; in my opinion we'd have the same problem when scaling to the min/idle replica count.
Hi @tomkerkhove, IMO, if a deployment is scaled to zero multiple times per day, that should be the scheduler's task, and the operations team should end up using serverless schedulers. However, when the minimum/idle replica count is greater than 0, adding pod-deletion-cost will make the service more stable.
I share the concern that this might not be a job for KEDA (so far KEDA doesn't change the target workload, and I'd like to keep it that way); on the other hand, I can see the benefits. What if we implement this as an add-on: a separate controller deployed next to the KEDA Operator to manage this? It could be an optional component, and after some time we can evaluate whether we'd like to include it directly in the KEDA core Operator.
I like this approach. KEDA will stay simple, and based on how it works and its adoption, we could think about integrating it later.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. |
@JorTurFer |
Yes, we can reopen it, but it was closed due to inactivity.
Hi @JorTurFer |
okey! xD |
Hi @JorTurFer, for now I cannot arrange my time for this work, but I'll spend some of my free time on the http-add-on. Do you have an add-on repo template and a contributor guide for it?
There is a contribution guide for it available at https://github.com/kedacore/http-add-on/tree/main/docs. For add-ons, though, it's good to check our scaler governance in our governance repo. We should first discuss whether this is a 3rd-party or an official add-on; I think community-based might be better for now.
Yeah, so I'd suggest you generate a new Go controller with the Operator SDK, so that it stays in sync with KEDA core, and implement the functionality there.
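As a rough starting point, an Operator SDK scaffold boils down to a controller-runtime reconciler; everything below (the type name, what it watches) is a hypothetical sketch of where the annotation logic from earlier in the thread would live:

```go
package podcost

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// DeletionCostReconciler is an illustrative name for the add-on's controller.
type DeletionCostReconciler struct {
	client.Client
}

// Reconcile would list the scale target's pods and apply the
// reconcileCosts/resetCosts logic sketched earlier in the thread.
func (r *DeletionCostReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... list the pods behind req.NamespacedName's workload, annotate the
	// unannotated ones, and reset on the min/idle replica count ...
	return ctrl.Result{}, nil
}

func (r *DeletionCostReconciler) SetupWithManager(mgr ctrl.Manager) error {
	// Watching Deployments is an assumption; the add-on could instead watch
	// ScaledObjects and resolve the scale target from them.
	return ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.Deployment{}).
		Complete(r)
}
```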
Proposal
Hello KEDA team,
I suggest that KEDA should have an option to use pod deletion cost. With the option set to true, pods that come later would get a lower deletion cost, which would make the deployment scale up and down in order.
Use-Case
Usually, I would like my upscaling pods to be scheduled on temporary worker nodes. Therefore, when scaling down, I would like these pods to be deleted first, so that the temporary worker nodes can be released to save cost.
Anything else?
No response