
LeaderWorkerSet integration #3515

Merged: 13 commits merged into main on Jan 17, 2025


vladikkuzn
Contributor

@vladikkuzn vladikkuzn commented Nov 12, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds a LeaderWorkerSet integration to the job framework.

Which issue(s) this PR fixes:

Part of #3232

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add integration for LeaderWorkerSet where Pods are managed by the pod-group integration.

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Nov 12, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 12, 2024
@vladikkuzn
Contributor Author

/assign

netlify bot commented Nov 12, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Latest commit: e90983f
Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/678a7fc02442e00008e702e4

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 12, 2024
@vladikkuzn
Contributor Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Nov 12, 2024
@vladikkuzn vladikkuzn force-pushed the leaderworkerset-integration branch 2 times, most recently from 342d0de to 4519993 Compare November 12, 2024 13:48
@vladikkuzn
Contributor Author

/test all

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2024
@vladikkuzn vladikkuzn force-pushed the leaderworkerset-integration branch 3 times, most recently from ff26e24 to 6630f22 Compare November 19, 2024 14:25
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 19, 2024
@vladikkuzn vladikkuzn force-pushed the leaderworkerset-integration branch from 6630f22 to 3036113 Compare November 19, 2024 14:27
workerTemplateGroupNameLabelPath,
)...)

// TODO(#...): support resizes later
Contributor Author

TODO create issue

sizePath,
)...)

// TODO(#...): support mutation of leader/worker templates later
Contributor Author

TODO create issue

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 21, 2024
@vladikkuzn vladikkuzn force-pushed the leaderworkerset-integration branch from 3036113 to 34aa49c Compare November 22, 2024 09:57
@mbobrovskyi mbobrovskyi force-pushed the leaderworkerset-integration branch from 0d1767d to b482fff Compare January 17, 2025 15:00
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 17, 2025
@k8s-ci-robot
Contributor

k8s-ci-robot commented Jan 17, 2025

@vladikkuzn: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kueue-test-e2e-main-1-28 | Commit: 4519993 | Required: true | Rerun command: /test pull-kueue-test-e2e-main-1-28

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mbobrovskyi mbobrovskyi force-pushed the leaderworkerset-integration branch from daf942e to e90983f Compare January 17, 2025 16:05
@mbobrovskyi
Contributor

mbobrovskyi commented Jan 17, 2025

I would also like to make sure that LWS can be scaled up and down by groups. We can add an e2e test in a follow-up. Have you had a chance to test it manually?

/hold for #3991.

Added e2e test cases that check scale-up and scale-down.

/unhold Done
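A rough sketch of what scaling "by groups" means. In the LeaderWorkerSet API, `spec.replicas` counts groups and `spec.leaderWorkerTemplate.size` counts the Pods in each group, leader included. The release note says these Pods are managed by the pod-group integration, and the e2e test looks up a Workload per key (`wlLookupKey1`). That suggests, although this is an assumption, that each group is admitted as its own Workload, so scaling from 2 to 3 replicas should add exactly one group for Kueue to admit. The helpers `groupDelta` and `totalPods` are hypothetical illustrations, not Kueue or LWS code. The sketch also assumes that a scale-down removes the highest-indexed groups first, as StatefulSet ordinals do.

```go
package main

import "fmt"

// groupDelta reports which group indices appear or disappear when an LWS's
// spec.replicas changes from oldReplicas to newReplicas.
// Hypothetical helper; assumes groups are indexed 0..replicas-1 and that
// scale-down removes the highest indices first (StatefulSet-style ordinals).
func groupDelta(oldReplicas, newReplicas int) (added, removed []int) {
	for i := oldReplicas; i < newReplicas; i++ {
		added = append(added, i) // new group, assumed to need its own admission
	}
	for i := newReplicas; i < oldReplicas; i++ {
		removed = append(removed, i) // group dropped by the scale-down
	}
	return added, removed
}

// totalPods is the number of Pods across all groups: replicas (groups)
// times size (Pods per group, leader included).
func totalPods(replicas, size int) int {
	return replicas * size
}

func main() {
	added, removed := groupDelta(2, 3)
	fmt.Println("scale up 2->3: added", added, "removed", removed, "pods", totalPods(3, 4))

	added, removed = groupDelta(3, 1)
	fmt.Println("scale down 3->1: added", added, "removed", removed, "pods", totalPods(1, 4))
}
```

Modeling each group separately means a scale-up never has to re-admit the groups that are already running. Under the stated assumption, only the new groups wait for quota.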

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 17, 2025
return apivalidation.ValidateImmutableField(mungedPodSpec, oldPodSpec, fieldPath)
}

func IsManagedByKueue(obj client.Object) bool {
Contributor

@mimowo mimowo Jan 17, 2025

The PR is already large, so I think we can defer this to a follow-up, but it seems reasonable to me to share this code with WorkloadShouldBeSuspended. I think the code at https://github.com/kubernetes-sigs/kueue/blob/main/pkg/controller/jobframework/defaults.go#L49-L57 should call the new function.

gomega.Expect(k8sClient.Get(ctx, wlLookupKey1, createdWorkload1)).To(gomega.Succeed())
})

ginkgo.By("Scale up LeaderWorkerSet", func() {
Contributor

Thank you for the e2e tests; the ability to scale LWS is great!

@mimowo
Contributor

mimowo commented Jan 17, 2025

/lgtm
/approve
This is great - thank you for the important contribution!

The integration will be improved in follow-ups; the main ones are listed here.

cc @mwysokin @mwielgus @ahg-g @Edwinhr716 @kerthcet

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 17, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 23116bd437c2e9503444abc12b661162535907ce

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, vladikkuzn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 17, 2025
@k8s-ci-robot k8s-ci-robot merged commit 8e63a6e into kubernetes-sigs:main Jan 17, 2025
17 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.11 milestone Jan 17, 2025
@mbobrovskyi mbobrovskyi deleted the leaderworkerset-integration branch January 17, 2025 17:09
FillZpp pushed a commit to leptonai/kueue that referenced this pull request Feb 5, 2025
* LeaderWorkerSet integration

* LeaderWorkerSet integration.

* Keep the code consistent with STS

* Revert manage warningForPodManagedLabel for create.

* Add StartupPolicy validation.

* More strict validation for pod-group name.

* Add validation for PodTemplateSpec.

* DeepCopy only one PodSpec on ValidateImmutablePodSpec and allow to change InitContainers Image.

* Update pkg/controller/jobs/leaderworkerset/leaderworkerset_webhook.go

Co-authored-by: Michał Woźniak

* Bump sigs.k8s.io/lws from v0.4.2 to v0.5.0.

* Add e2e test cases to check scale-up and scale-down.

* Update helm chart values.yaml.

* Use IsManagedByKueue on LeaderWorkerSet.

---------

Co-authored-by: Mykhailo Bobrovskyi
Co-authored-by: Michał Woźniak
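The commit note "DeepCopy only one PodSpec on ValidateImmutablePodSpec and allow to change InitContainers Image" and the `apivalidation.ValidateImmutableField(mungedPodSpec, oldPodSpec, fieldPath)` call in the review context point to a munge-then-compare check. The idea is to deep-copy only the new spec, overwrite its permitted-to-change fields with the old values, and then require the result to equal the old spec. The sketch below is a simplified illustration, not the PR's code. `Container`, `PodSpec`, and every function here are hypothetical stand-ins for the `corev1` types and Kueue helpers. It assumes that container and init-container images are the only mutable fields, and it uses `reflect.DeepEqual` in place of the semantic equality that `ValidateImmutableField` performs.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Hypothetical, heavily simplified stand-ins for corev1.Container/PodSpec.
type Container struct {
	Name  string
	Image string
	CPU   string
}

type PodSpec struct {
	InitContainers     []Container
	Containers         []Container
	ServiceAccountName string
}

// copyContainers preserves nil-ness so that DeepEqual is not tripped by
// nil-vs-empty slice differences introduced by the copy itself.
func copyContainers(in []Container) []Container {
	if in == nil {
		return nil
	}
	out := make([]Container, len(in))
	copy(out, in)
	return out
}

// deepCopy is a full copy here because Container holds only value fields.
func (s PodSpec) deepCopy() PodSpec {
	out := s
	out.InitContainers = copyContainers(s.InitContainers)
	out.Containers = copyContainers(s.Containers)
	return out
}

// copyImages overwrites dst images with src images, index by index. If the
// lengths differ, it leaves dst untouched so the final comparison fails.
func copyImages(dst, src []Container) {
	if len(dst) != len(src) {
		return
	}
	for i := range dst {
		dst[i].Image = src[i].Image
	}
}

var errImmutable = errors.New("pod template is immutable except for container images")

// validateImmutablePodSpec copies only newSpec (oldSpec is read-only), masks
// the image changes, and reports any remaining difference as an error.
func validateImmutablePodSpec(newSpec, oldSpec PodSpec) error {
	munged := newSpec.deepCopy()
	copyImages(munged.InitContainers, oldSpec.InitContainers)
	copyImages(munged.Containers, oldSpec.Containers)
	if !reflect.DeepEqual(munged, oldSpec) {
		return errImmutable
	}
	return nil
}

func main() {
	old := PodSpec{Containers: []Container{{Name: "main", Image: "app:v1", CPU: "1"}}}

	imageOnly := old.deepCopy()
	imageOnly.Containers[0].Image = "app:v2"
	fmt.Println(validateImmutablePodSpec(imageOnly, old))

	cpuChange := old.deepCopy()
	cpuChange.Containers[0].CPU = "2"
	fmt.Println(validateImmutablePodSpec(cpuChange, old))
}
```

Copying only the incoming spec keeps the validation cheap and leaves both caller objects unmodified. Masking images on init containers as well as regular containers mirrors the "allow to change InitContainers Image" part of the commit note.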