Serving-aware partial preemption of workloads #3762

mimowo · 2024-12-06T16:26:30Z

What would you like to be added:

Serving workloads differ from training workloads: they can be trimmed easily, for example a Deployment can keep running with 70% or 50% of its Pods. Most AI training workloads, by contrast, need all of their Pods to run. We want to leverage this fact to optimize preemptions.

In particular, when a new high-priority workload comes in and there are multiple serving workloads, we want to distribute the preemptions across those serving workloads rather than preempting one of them completely.
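To make the idea concrete, here is a minimal sketch of one possible way to spread the cut. It is not Kueue's preemption algorithm and not a proposed API: the `ServingWorkload` type, the `min_pods` floor, and the `distribute_preemptions` function are all hypothetical names used only for illustration. The sketch removes one Pod at a time from whichever workload currently has the most Pods above its floor, and returns `None` when trimming alone cannot free enough Pods.

```python
from dataclasses import dataclass


@dataclass
class ServingWorkload:
    name: str
    pods: int      # Pods currently running
    min_pods: int  # hypothetical floor; the workload is never trimmed below it


def distribute_preemptions(workloads, pods_needed):
    """Return {workload name: Pods to remove}, or None if trimming is not enough.

    Illustrative policy only: take one Pod at a time from the workload with
    the most remaining headroom (Pods above its floor), breaking ties by name.
    """
    removed = {w.name: 0 for w in workloads}
    headroom = {w.name: max(w.pods - w.min_pods, 0) for w in workloads}

    # Trimming cannot free enough Pods; the caller would need another strategy.
    if sum(headroom.values()) < pods_needed:
        return None

    for _ in range(pods_needed):
        # Largest headroom first; ties resolved alphabetically for determinism.
        name = min(headroom, key=lambda n: (-headroom[n], n))
        headroom[name] -= 1
        removed[name] += 1
    return removed


if __name__ == "__main__":
    serving = [
        ServingWorkload("a", pods=10, min_pods=5),
        ServingWorkload("b", pods=10, min_pods=5),
        ServingWorkload("c", pods=4, min_pods=2),
    ]
    print(distribute_preemptions(serving, pods_needed=6))
```

Other policies, such as trimming proportionally to each workload's size or by priority, would fit the same shape. Which one is appropriate is a question for the design doc.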

Note that this is also related to partial preemption for batch workloads: #975. We may consider a solution that solves both problems, but for now it seems reasonable to keep this dedicated issue, emphasizing that serving workloads are special in this regard.

Why is this needed:

To improve the experience of hosting a mix of training and inference workloads. When a high-priority workload arrives, we can make room for it by trimming multiple serving workloads rather than preempting one of them completely.

Completion requirements:

This enhancement requires the following artifacts:

Design doc
API change
Docs update

The artifacts should be linked in subsequent comments.

mimowo · 2024-12-06T16:26:52Z

cc @mwielgus @mwysokin @tenzen-y

k8s-triage-robot · 2025-03-06T17:01:57Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

tenzen-y · 2025-03-06T23:41:21Z

/remove-lifecycle stale

mimowo added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 6, 2024

mimowo changed the title ~~Serving-aware preemption of workloads~~ Serving-aware partial preemption of workloads Dec 6, 2024

mimowo mentioned this issue Dec 6, 2024

Partial preemption of workloads #975


k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2025

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2025
