Add pod-deletion-cost as a new option #3680

Open
khacminh opened this issue Sep 20, 2022 · 27 comments
Labels: feature-request, needs-discussion, stale-bot-ignore

Comments

@khacminh

Proposal

Hello Keda team,

I suggest that KEDA add an option to use the pod deletion cost feature. With the option set to true, pods created later would get a lower deletion cost, so the deployment scales up and down in order.

Use-Case

Typically, I want the pods added during scale-out to be scheduled on temporary worker nodes. When scaling down, I would like those pods to be deleted first, so that the temporary worker nodes can be released to save cost.

Anything else?

No response

@khacminh added the feature-request and needs-discussion labels Sep 20, 2022
@tomkerkhove moved this to Proposed in Roadmap - KEDA Core Sep 20, 2022
@tomkerkhove
Member

Are you referring to having this option on the scale target? Because that is fully up to KEDA end users; we do not define that. Or am I misinterpreting your suggestion?

@khacminh
Author

khacminh commented Sep 20, 2022

Hi @tomkerkhove,

Thank you for your reply. Let me describe my suggestion in more detail with this example:

My 3-node cluster has KEDA and the cluster autoscaler (or Karpenter) configured. My scaler is configured to scale my deployment between 5 and 15 pods.

Without pod deletion cost

Scaling up

At high traffic, my deployment is gradually scaled up to 15. Three more worker nodes are added, and the pods end up distributed as:

  • Node 1: pod-1, pod-4
  • Node 2: pod-3, pod-5
  • Node 3: pod-2
  • Node 4: pod-6, pod-7, pod-8
  • Node 5: pod-9, pod-10, pod-11
  • Node 6: pod-12, pod-13, pod-14, pod-15

Node 1, Node 2, and Node 3 are the stable nodes.
Node 4, Node 5, and Node 6 are the new nodes.

Scaling down

Now traffic drops, and the scaler scales my deployment down to 10 replicas. Without pod deletion cost, the pods are terminated randomly. For example, the remaining pods could be distributed as:

  • Node 1: pod-1, pod-4
  • Node 2: pod-3
  • Node 3: pod-2
  • Node 4: pod-6, pod-8
  • Node 5: pod-10, pod-11
  • Node 6: pod-12, pod-15

And all 6 nodes are still in use.

When the deployment is scaled back to 5, the distribution could be:

  • Node 1: pod-4
  • Node 2: pod-3
  • Node 3: pod-2
  • Node 4: pod-6
  • Node 5: pod-10
  • Node 6: (no pods)

And only Node 6 can be terminated (automatically) to save cost.


With Pod Deletion Cost

Scaling up

KEDA can add the controller.kubernetes.io/pod-deletion-cost annotation for them as follows:

| Pod    | pod-deletion-cost | Node   |
|--------|-------------------|--------|
| pod-1  | 1000000           | Node-1 |
| pod-2  | 1000000           | Node-3 |
| pod-3  | 1000000           | Node-2 |
| pod-4  | 1000000           | Node-1 |
| pod-5  | 1000000           | Node-2 |
| pod-6  | 999999            | Node-4 |
| pod-7  | 999998            | Node-4 |
| pod-8  | 999997            | Node-4 |
| pod-9  | 999996            | Node-5 |
| pod-10 | 999995            | Node-5 |
| pod-11 | 999994            | Node-5 |
| pod-12 | 999993            | Node-6 |
| pod-13 | 999992            | Node-6 |
| pod-14 | 999991            | Node-6 |
| pod-15 | 999990            | Node-6 |

Scaling down

  • When the deployment is scaled down to 10, Kubernetes terminates the pods with the lowest deletion cost. The remaining pods are:

    | Pod    | pod-deletion-cost | Node   |
    |--------|-------------------|--------|
    | pod-1  | 1000000           | Node-1 |
    | pod-2  | 1000000           | Node-3 |
    | pod-3  | 1000000           | Node-2 |
    | pod-4  | 1000000           | Node-1 |
    | pod-5  | 1000000           | Node-2 |
    | pod-6  | 999999            | Node-4 |
    | pod-7  | 999998            | Node-4 |
    | pod-8  | 999997            | Node-4 |
    | pod-9  | 999996            | Node-5 |
    | pod-10 | 999995            | Node-5 |

Now Node 6 can be terminated to save cost.

  • When the deployment is scaled down to 5, the remaining pods are:

    | Pod   | pod-deletion-cost | Node   |
    |-------|-------------------|--------|
    | pod-1 | 1000000           | Node-1 |
    | pod-2 | 1000000           | Node-3 |
    | pod-3 | 1000000           | Node-2 |
    | pod-4 | 1000000           | Node-1 |
    | pod-5 | 1000000           | Node-2 |

Node 4 and Node 5 can be terminated to save cost.

I hope this example makes the use case clear.
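
To make the annotation scheme in the table above concrete, here is a minimal, hypothetical sketch of how a controller could stamp controller.kubernetes.io/pod-deletion-cost onto a single pod with client-go. The function name, the clientset argument, and the cost value are placeholders for illustration; this is not existing KEDA code.

```go
package podcost // illustrative package name, not part of KEDA

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// annotateDeletionCost patches one pod with the given pod-deletion-cost.
// Pods with lower values are removed first when the ReplicaSet scales down
// (available since Kubernetes v1.21, beta since v1.22).
func annotateDeletionCost(ctx context.Context, cs kubernetes.Interface, namespace, pod string, cost int) error {
	// Strategic merge patch that only touches the annotation.
	patch := []byte(fmt.Sprintf(
		`{"metadata":{"annotations":{"controller.kubernetes.io/pod-deletion-cost":"%d"}}}`, cost))
	_, err := cs.CoreV1().Pods(namespace).Patch(
		ctx, pod, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```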

@JorTurFer
Member

I think this is useful, but I'm not sure it's something we can do from the KEDA side. We could check the pods' annotations and annotate them from max down to 0, reducing the cost for every new instance, but I'm not totally sure this would solve your problem.
I mean, the only thing KEDA can do is annotate every new pod with a lower cost, but that doesn't guarantee the result is correct, because in your scenario pod-7 could be scheduled on node 3 due to other pod evictions.

IDK if this kind of management is under the KEDA umbrella. WDYT @kedacore/keda-contributors?

@tomkerkhove
Member

I think this scenario is nice, but as far as I can see this is fully managed on the scale target and more related to workload scheduling.

If we were to add this, hypothetically, how would you see this work?

@khacminh
Author

khacminh commented Sep 21, 2022

Hi @JorTurFer and @tomkerkhove

Thank you for your replies.

I think this use case is common and useful for any cluster running in the cloud. The ability to schedule min/max replicas based on predictable time frames during the day is great, and in my opinion, combining that with dynamic scaling is even better.

About the implementation, I think we should assume that pod-deletion-cost works as designed. The KEDA controller should take care of the annotation value assigned to pods:

  • When KEDA scales the deployment to min or idle replicas, these pods should have the maximum cost.
  • On scaling up, KEDA reduces the cost for every new pod.

With @JorTurFer's example, pod-7 could be scheduled on node 3 (a stable node). In that scenario, the more pods are scheduled on stable nodes, the fewer temporary nodes are needed, and they can be freed sooner to save cost.


Just one more question:

It seems time-based min/max configuration is not supported by KEDA. What do you think about this feature? I think it's useful for scenarios with a predictable workload.

@tomkerkhove
Member

I'm afraid this is entering scheduler territory more than autoscaling. KEDA is not aware of which pods are being spun up, so we actually cannot achieve this ATM. What do you think @zroubalik @JorTurFer?

What could be done is to use an admission controller and rebalance pods based on the new number of pods, but again, that's more of a scheduling feature than autoscaling.

I do get why you want to have this though.

It seems time-based min/max configuration is not supported by KEDA. What do you think about this feature? I think it's useful for scenarios with a predictable workload.

See the cron scaler.

@khacminh
Author

khacminh commented Sep 22, 2022

Hi @tomkerkhove,

I know that KEDA doesn't manage the pods, and it may take more effort to manage them. My request is about the priority of pods when scaling down. And of course, it's all about cost saving with minimal manual effort and pod rescheduling. Kubernetes didn't have pod deletion cost until v1.21 (alpha). In my opinion, it's a great feature for better resource management, and it would be great if KEDA supported it.


About time-based configuration: I have cron scalers in my application but haven't tried multiple scalers in a ScaledObject yet. I'll try that to combine the conditions.

@JorTurFer
Member

JorTurFer commented Sep 24, 2022

I have been thinking about this. Let's say we agree to implement it: we could use the operator reconciliation loop to check whether any pod (from the workload) is missing the annotation and, if so, annotate it with a lower weight than the others.

Could this be enough? I mean, KEDA can't guarantee that the "easier to remove" pods are on the most recently created nodes, because there can be events in the cluster that move pods and break the order.
E.g., if the 3 instances with the highest values are evicted from their nodes due to CPU/memory pressure on the node, you could go from:

| Pod | Node | Weight |
|-----|------|--------|
| A   | 1    | 10000  |
| B   | 2    | 10000  |
| C   | 1    | 10000  |
| D   | 3    | 9999   |

to:

| Pod   | Node | Weight |
|-------|------|--------|
| D     | 3    | 9999   |
| E (A) | 3    | 9998   |
| F (B) | 4    | 9997   |
| G (C) | 4    | 9996   |

This is correct, but if the pressure on the node disappears, you could eventually end up in this scenario too:

| Pod   | Node | Weight |
|-------|------|--------|
| D     | 3    | 9999   |
| E (A) | 3    | 9998   |
| F (B) | 1    | 9997   |
| G (C) | 2    | 9996   |

What should KEDA do in this case? I mean, KEDA has annotated each pod with currentValue - 1, but the situation is the opposite of your scenario.

@khacminh
Author

Hi @JorTurFer

What do you think about having the KEDA operator reset the weight of all pods to the maximum value when scaling to the minimum/idle replica(s)?

@JorTurFer
Member

Yes, my idea was to define a constant max value and do something like this during the reconciliation loop:

  1. List all pods.
  2. Get currentValue from them (if all of them are unannotated, currentValue is the max value; otherwise, currentValue is the minimum among them).
  3. Annotate each pod with currentValue - 1 (recalculated for every pod).

In this scenario, scaling to zero will reset the counter.
My fear is that not all cases can scale to zero, and the edge case I mentioned could also happen.
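
For illustration only, here is a rough sketch of the reconciliation steps described above, assuming a controller-runtime client and a label selector that matches the scale target's pods. The constant, function, and parameter names are invented for this example and are not part of KEDA.

```go
package podcost // illustrative sketch, not KEDA code

import (
	"context"
	"fmt"
	"strconv"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const (
	costAnnotation = "controller.kubernetes.io/pod-deletion-cost"
	maxCost        = 1000000 // reset point, e.g. when the workload is back at min/idle replicas
)

func reconcileDeletionCosts(ctx context.Context, c client.Client, namespace string, selector client.MatchingLabels) error {
	// 1. List all pods that belong to the scale target.
	var pods corev1.PodList
	if err := c.List(ctx, &pods, client.InNamespace(namespace), selector); err != nil {
		return err
	}

	// 2. Find the current minimum cost among already-annotated pods;
	//    if none are annotated yet, start from the maximum.
	current := maxCost
	for _, p := range pods.Items {
		if v, ok := p.Annotations[costAnnotation]; ok {
			if n, err := strconv.Atoi(v); err == nil && n < current {
				current = n
			}
		}
	}

	// 3. Annotate every pod that has no cost yet with currentValue - 1,
	//    decrementing once per newly annotated pod.
	for i := range pods.Items {
		p := &pods.Items[i]
		if _, ok := p.Annotations[costAnnotation]; ok {
			continue
		}
		current--
		patch := client.MergeFrom(p.DeepCopy())
		if p.Annotations == nil {
			p.Annotations = map[string]string{}
		}
		p.Annotations[costAnnotation] = fmt.Sprintf("%d", current)
		if err := c.Patch(ctx, p, patch); err != nil {
			return err
		}
	}
	return nil
}
```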

@khacminh
Author

Hi @JorTurFer,

I think scaling to the minimum or idle replica(s) happens frequently, so resetting the counter on that event will help.

@tomkerkhove
Member

I'm not sure this logic is actually correct, nor whether we should do it, because it's better to do nothing than to do it wrongly.

Given this is tightly related to the nodes and their current workload, I'm not sure KEDA should do this (hello scheduler 👋).

Setting all pods to the same maximum value on scale to zero is just a no-op, since it has no impact. I might be missing the logic above, though, so this statement could be wrong.

@khacminh
Author

Hi @tomkerkhove,

Could you please tell me why you suggest resetting the pod deletion cost on scaling to zero, instead of on scaling to minimum/idle replicas?

@tomkerkhove
Member

I just used that because I was reading through the conversation; we'd have the same problem when scaling to the min/idle replica count, IMO.

@khacminh
Author

Hi @tomkerkhove,

IMO, if a deployment is scaled to zero multiple times per day, that should be the scheduler's job; the operations team should end up with serverless schedulers. However, when the minimum/idle replica count is greater than 0, adding pod-deletion-cost will make the service more stable.

@zroubalik
Member

zroubalik commented Oct 14, 2022

I share the concern that this might not be a job for KEDA (so far KEDA doesn't change the target workload, and I'd like to keep it this way); on the other hand, I can see the benefits.

What if we implement this as an add-on: a separate controller deployed next to the KEDA Operator to manage this? It could be an optional component, and after some time we could evaluate whether to include it directly in the KEDA core Operator.

@JorTurFer
Member

JorTurFer commented Oct 14, 2022

I share the concern that this might not be a job for KEDA (so far KEDA doesn't change the target workload, and I'd like to keep it this way); on the other hand, I can see the benefits.

What if we implement this as an add-on: a separate controller deployed next to the KEDA Operator to manage this? It could be an optional component, and after some time we could evaluate whether to include it directly in the KEDA core Operator.

I like this approach. KEDA will stay simple, and based on how it works and its adoption, we could think about integrating it.

@stale

stale bot commented Dec 13, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Dec 13, 2022
@stale

stale bot commented Dec 20, 2022

This issue has been automatically closed due to inactivity.

stale bot closed this as completed Dec 20, 2022
Repository owner moved this from Proposed to Ready To Ship in Roadmap - KEDA Core Dec 20, 2022
@khacminh
Author

@JorTurFer
Please reopen this issue because it is not completed

@JorTurFer
Member

Yes, we can reopen it, but it was closed due to inactivity.
Our last comments didn't get any answer, so I thought you weren't interested 😅

@JorTurFer reopened this Dec 21, 2022
Repository owner moved this from Ready To Ship to Proposed in Roadmap - KEDA Core Dec 21, 2022
@khacminh
Author

khacminh commented Dec 22, 2022

Hi @JorTurFer
I totally agree with @zroubalik's approach, so I just left an emoji reaction on his comment. I thought that was enough 😅. I'm sorry for not leaving a comment.

stale bot removed the stale label Dec 22, 2022
@JorTurFer
Member

Okay! xD
Are you willing to contribute this improvement?

@JorTurFer added the stale-bot-ignore label Dec 22, 2022
@khacminh
Author

Hi @JorTurFer

For now, I can't make time for this work, but I'll spend some of my free time on the http-add-on. Do you have an add-on repo template and a contributor guide for it?

@JorTurFer
Member

JorTurFer commented Dec 23, 2022

For now, I can't make time for this work, but I'll spend some of my free time on the http-add-on. Do you have an add-on repo template and a contributor guide for it?

Unfortunately, we don't have a contribution guide, because that add-on was maintained by a single person :(
You are not the first to ask about it, so after the holidays we should consider making the effort, if that helps engage new contributors.

@tomkerkhove
Member

There is a contribution guide for it available at https://github.com/kedacore/http-add-on/tree/main/docs.

For add-ons, though, it's good to check our scaler governance in our governance repo. We should first discuss whether this is a third-party or an official add-on.

I think community-based might be better for now.

@zroubalik
Member

Yeah, I'd suggest generating a new Go controller with the Operator SDK, so it stays in sync with KEDA core:
https://sdk.operatorframework.io/build/

And implement the functionality there.
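
Purely as an illustration of that suggestion, a controller scaffolded with the Operator SDK would boil down to a controller-runtime reconciler roughly like the sketch below. The type name, the watched resource (Deployment), and the wiring are assumptions for this example, not an agreed design for the add-on.

```go
package controllers // illustrative skeleton, not an existing project

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// PodDeletionCostReconciler would annotate the pods behind a Deployment so
// that newer pods carry a lower controller.kubernetes.io/pod-deletion-cost.
type PodDeletionCostReconciler struct {
	client.Client
}

func (r *PodDeletionCostReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var deploy appsv1.Deployment
	if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// The annotation bookkeeping sketched earlier in the thread would go here:
	// list the Deployment's pods via its selector and assign descending costs.

	return ctrl.Result{}, nil
}

func (r *PodDeletionCostReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.Deployment{}).
		Complete(r)
}
```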
