Do not consider preemption.borrowWithinCohort in FairSharing preemptions #4165
Conversation
/cc
/hold
If we decide the combination is useful, we may need to rewrite the PR. If we decide it is not useful, I would like to follow up with validation that prevents combining the settings.
One more consideration: this PR breaks the current FairSharing feature specification (before this PR, borrowWithinCohort could be involved in the FairSharing decision). Ideally, we would look for another solution to resolve #3779. If we do decide to go with this solution, we should prepare some mitigation path, since FairSharing has already been published as a Beta feature.
Changes look great to me. The next step is deciding whether we should take this approach.
I gave it another round of thinking. Actually, I think the root cause for the infinite loop is the check we do there. So, I have another idea: turn this check into one that only enforces the priority threshold. This will make sure no workloads bypass the strategy, yet we still require the target workloads to be below the priority threshold. I think it is likely a valid use case to safeguard high-priority workloads from fair sharing, and the field allows for that. I think the idea of the
EDIT: I would be happy to move forward with the approach in this comment (unless you see some counter-examples) and backport it, regardless of the long-term plans for combining the configurations.
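To make this concrete, here is a minimal Go sketch of the idea (the helper name and wiring are invented; only the BorrowWithinCohort API type with its MaxPriorityThreshold field is assumed from the kueue v1beta1 API). It filters candidates by the priority threshold without letting the setting bypass the fair-sharing ordering:

```go
// Hypothetical sketch, not kueue's actual preemption code.
package sketch

import kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"

// eligibleUnderThreshold reports whether a candidate workload may be targeted
// for preemption: when maxPriorityThreshold is set, only workloads at or below
// the threshold qualify; otherwise all candidates stay eligible and the
// fair-share ordering alone decides.
func eligibleUnderThreshold(candidatePriority int32, bwc *kueue.BorrowWithinCohort) bool {
	if bwc == nil || bwc.MaxPriorityThreshold == nil {
		return true
	}
	return candidatePriority <= *bwc.MaxPriorityThreshold
}
```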
My concern is that this makes fairSharing more complicated, since the idea implies cooperation between fairSharing and borrowWithinCohort. But honestly, I'm not sure which solution is more understandable for users. So it might be better to proceed with your (@mimowo) approach, and then consider the above approach (fairSharing with a filter mechanism) based on user feedback.
This protection is defined on the preempting CQ's side. Which means, for these high priority workloads to be safe, this setting needs to be applied to all CQs. I think a better interface would be preemption protection defined by the CQ that wants to protect its workloads. Or, for that CQ to simply not borrow for a flavor it doesn't want to risk preemptions in.
This still leads to weird behavior when the workload wants to do reclaimWithinCohort. With this proposal, we would filter the valid reclaimWithinCohort targets, so the semantics no longer match non-fair-sharing reclamation. We could add a condition to only reclaim when not borrowing [1][2], but I'm opposed to introducing more complexity into an already complex algorithm. I think we should leave these decisions to the result of the fair share value, to keep the complexity manageable.
notes:
That was also my thought, but in a way FairSharing is already aware of other candidate filtering configurations based on |
We already use
The notes are valid, and I agree this approach leads to some sub-optimal decisions given the dynamic nature of "borrowing" as we progress through candidates. However:
Depends how you look at it. IMO, the complexity would anyway be lower compared to the current "main" branch, because we would essentially commit your changes inside the existing check.
In the end, I'm fine with the approach proposed in this PR as long as we can somehow confirm it does not break any business use cases, but I'm worried that confirming this might be difficult in practice given the OSS nature of the software. When we disable a configuration option, it is generally better to go through some deprecation (feature-gate) process, and in the short term have something which can be cherry-picked as a bugfix with marginal risk of breaking.
Discussed and reached consensus with @mimowo offline.
The infinite preemption issue (which this PR fixes), and whether we limit preemptions to a threshold priority (potentially how users expect this parameter to work with FairSharing), are separate issues. Since the latter didn't previously exist in FairSharing, I disagree with the approach of using borrowWithinCohort as a priority threshold there.
@tenzen-y Please take a look at #4173. I think we can add a threshold in
Yes, I agree that supporting the threshold is a separate issue. So, I'm happy to just make it a no-op. However, I would like to update the documentation for the borrowWithinCohort field.
Ideally, we would also have some warnings in webhooks and on Kueue startup, but if this requires substantial groundwork then I'm ok with just a documentation update.
I think that could be done easily. I guess we can just pass the Kueue Configuration to the CQ webhook server and hold it in the webhook struct (see the sketch below).
The validation could be performed when CQ is created or updated. |
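As a rough illustration of that idea, a hypothetical sketch follows (the struct, field, and error message are invented, and the kueue types are assumed to match the v1beta1 API; this is not kueue's actual webhook code):

```go
// Sketch of a ClusterQueue webhook that holds a flag derived from the Kueue
// Configuration and rejects CQs that still set borrowWithinCohort while
// FairSharing is enabled.
package sketch

import (
	"errors"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

type clusterQueueWebhook struct {
	// fairSharingEnabled would be captured from the Kueue Configuration when
	// the webhook server is set up.
	fairSharingEnabled bool
}

// validatePreemption would be called from the create and update validation paths.
func (w *clusterQueueWebhook) validatePreemption(cq *kueue.ClusterQueue) error {
	if !w.fairSharingEnabled {
		return nil
	}
	p := cq.Spec.Preemption
	if p != nil && p.BorrowWithinCohort != nil &&
		p.BorrowWithinCohort.Policy != kueue.BorrowWithinCohortPolicyNever {
		return errors.New("preemption.borrowWithinCohort has no effect when FairSharing is enabled")
	}
	return nil
}
```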
LGTM label has been added. Git tree hash: f7ddced2404225cdff9d823bdb5ca949983d2ded
Leaving the hold label for @gabesaba to release.
/hold
until the new API comments appear in the generated CRD.
@gabesaba Could you investigate why the new API comments are not recorded in the CRD?
@gabesaba what about this?
Let me take a look. The latest push was just a rebase and merge-conflict resolution.
@gabesaba: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
If we add a doc comment on the field which uses the type, it overrides the type's documentation. Should we update this everywhere so that we document only one of the two places (perhaps the type, for consistency)? If yes, I will create an issue to track this work.
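For illustration, a small Go sketch of the behavior being discussed (the type and field names are made up; the precedence rule is the assumption being described): controller-gen uses a field's own doc comment for the CRD property description when one exists, and falls back to the type's doc comment otherwise.

```go
// Hypothetical example types, not part of the kueue API.
package sketch

// ExampleConfig documents the type. This description only reaches the CRD for
// fields of this type that have no doc comment of their own (like Secondary
// below).
type ExampleConfig struct {
	Value int32 `json:"value,omitempty"`
}

type ExampleSpec struct {
	// This field-level comment becomes the property description in the
	// generated CRD, overriding ExampleConfig's type-level comment.
	Primary *ExampleConfig `json:"primary,omitempty"`

	Secondary *ExampleConfig `json:"secondary,omitempty"`
}
```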
PTAL @tenzen-y
@gabesaba Thank you for checking this. I'm ok with the current API.
/lgtm
/approve
/hold cancel
LGTM label has been added. Git tree hash: b9b0943e96dc5defd5eb4b56b4999eb3e72ea8e3
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: gabesaba, mimowo, tenzen-y
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@mimowo: #4165 failed to apply on top of branch "release-0.10":
In response to this:
@gabesaba could you open cherry-pick PRs?
What type of PR is this?
/kind bug
What this PR does / why we need it:
preemption.borrowWithinCohort is being used to override dominantResourceShare, causing preemptions which do not match the scheduling order, resulting in the bug described here. We do not document preemption.borrowWithinCohort as applying to FairSharing, only to the other two preemption policies - link.
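As a rough illustration of the intended ordering (hypothetical types and share values; not kueue's implementation), under FairSharing the preemption candidates are ordered by the dominant resource share of their ClusterQueues, and borrowWithinCohort should not be able to override that ordering:

```go
// Illustrative only: the candidate type and DRS values are invented.
package main

import (
	"fmt"
	"sort"
)

// candidate pairs a workload with the dominant resource share (DRS) of the
// ClusterQueue it is admitted in.
type candidate struct {
	workload string
	drs      float64
}

func main() {
	candidates := []candidate{
		{workload: "wl-a", drs: 0.30},
		{workload: "wl-b", drs: 0.75},
		{workload: "wl-c", drs: 0.55},
	}
	// Preempt from the queue that is furthest above its fair share first;
	// no setting should reorder or bypass this.
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].drs > candidates[j].drs
	})
	fmt.Println(candidates) // [{wl-b 0.75} {wl-c 0.55} {wl-a 0.3}]
}
```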
Which issue(s) this PR fixes:
Fixes #3779
Special notes for your reviewer:
Does this PR introduce a user-facing change?