Prevent updating Job status on suspended jobs #4070

IrvingMg
Contributor

@IrvingMg IrvingMg commented Jan 27, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Follow-up of #3685

Which issue(s) this PR fixes:

Fixes #3730

Special notes for your reviewer:

Does this PR introduce a user-facing change?

MultiKueue: Do not update the status of the Job on the management cluster while the Job is suspended. This change applies to jobs represented by JobSet, Kubeflow Jobs, and MPIJob.
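
For readers unfamiliar with the change, the behavior described in the release note can be sketched roughly as below. This is only an illustration under stated assumptions: the `job` type and the `syncStatus` function are hypothetical stand-ins, not Kueue types or the actual adapter code; only the log message is taken from this thread.

```go
package main

import "fmt"

// job is a hypothetical, simplified stand-in for a Job-like object
// (JobSet, Kubeflow job, MPIJob). It is NOT a Kueue or Kubernetes type.
type job struct {
	Suspended bool
	Status    string
}

// syncStatus copies the remote (worker cluster) status onto the local
// (management cluster) job unless the local job is still suspended.
// It reports whether the status was copied.
func syncStatus(local *job, remote job) bool {
	if local.Suspended {
		// Message text matches the adapter logs quoted in this thread.
		fmt.Println("Skipping the sync since the local job is still suspended")
		return false
	}
	local.Status = remote.Status
	return true
}

func main() {
	local := &job{Suspended: true}
	remote := job{Status: "Running"}

	// While suspended, the local status is left untouched.
	fmt.Println(syncStatus(local, remote), local.Status == "")

	// Once unsuspended, the remote status is copied over.
	local.Suspended = false
	fmt.Println(syncStatus(local, remote), local.Status)
}
```

The point of the early return is that a suspended local job never has its status written, so a later reconcile (after the job is unsuspended) is the first one that copies the remote status.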

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 27, 2025
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 27, 2025

netlify bot commented Jan 27, 2025

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit 554d025
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6797c076bebcf30008277d93
😎 Deploy Preview https://deploy-preview-4070--kubernetes-sigs-kueue.netlify.app

@IrvingMg
Contributor Author

/cc @mbobrovskyi @mszadkow

@mbobrovskyi
Contributor

/lgtm
Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 28, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: be10fe42be72268734257ad92602e9f9515b1842

@mimowo
Contributor

mimowo commented Jan 28, 2025

/release-note-edit

MultiKueue: Do not update the status of the Job on the management cluster while the Job is suspended. This change applies to jobs represented by JobSet, Kubeflow Jobs, and MPIJob.

@IrvingMg can you check / confirm if the Job CRDs updated in this PR (JobSet, KubeflowJobs, and MPIJob) can get status updated while they are suspended?

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jan 28, 2025
@IrvingMg
Contributor Author

> @IrvingMg can you check / confirm if the Job CRDs updated in this PR (JobSet, KubeflowJobs, and MPIJob) can get status updated while they are suspended?

Reproducing the original issue is difficult because it was found in a flaky test. However, I ran the end-to-end tests focusing on the updated jobs, and I can confirm that the condition is being hit. Here are the filtered logs:

2025-01-28T16:04:57.02060716Z stderr F 2025-01-28T16:04:57.019930652Z	LEVEL(-2)	job/job_multikueue_adapter.go:65	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"job-job-d639b","namespace":"multikueue-ph4pf"}, "namespace": "multikueue-ph4pf", "name": "job-job-d639b", "reconcileID": "fb62f426-1649-4e76-87f7-59a70daa2b77"}
2025-01-28T16:11:10.156294898Z stderr F 2025-01-28T16:11:10.155437517Z	LEVEL(-2)	job/job_multikueue_adapter.go:65	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"job-job-f8ede","namespace":"multikueue-hkb9b"}, "namespace": "multikueue-hkb9b", "name": "job-job-f8ede", "reconcileID": "059ecfb1-0372-4cfa-817c-a9aedf3c6884"}
2025-01-28T16:11:25.194803451Z stderr F 2025-01-28T16:11:25.194242531Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-4ba49","namespace":"multikueue-mcwkz"}, "namespace": "multikueue-mcwkz", "name": "jobset-job-set-4ba49", "reconcileID": "b283331b-6158-4a4b-910d-c2b7bfbbcee2"}
2025-01-28T16:11:25.353220102Z stderr F 2025-01-28T16:11:25.352562597Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-4ba49","namespace":"multikueue-mcwkz"}, "namespace": "multikueue-mcwkz", "name": "jobset-job-set-4ba49", "reconcileID": "bf9e6e73-c592-4d93-b210-adc7b0224e84"}
2025-01-28T16:11:44.399687147Z stderr F 2025-01-28T16:11:44.396877044Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-6a4a1","namespace":"multikueue-thqbf"}, "namespace": "multikueue-thqbf", "name": "pytorchjob-pytorchjob1-6a4a1", "reconcileID": "6ff89def-dda7-4972-80d2-06aeda3485b5"}
2025-01-28T16:11:53.992862315Z stderr F 2025-01-28T16:11:53.990703383Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-75710","namespace":"multikueue-vd8x8"}, "namespace": "multikueue-vd8x8", "name": "mpijob-mpijob1-75710", "reconcileID": "9e322d27-e086-4ef8-8970-77156120433e"}
2025-01-28T16:17:49.811740413Z stderr F 2025-01-28T16:17:49.810452532Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-b5131","namespace":"multikueue-tm8cq"}, "namespace": "multikueue-tm8cq", "name": "jobset-job-set-b5131", "reconcileID": "be14fa41-ca4e-4b57-a68a-c2ace4a2a648"}
2025-01-28T16:18:04.699406613Z stderr F 2025-01-28T16:18:04.698937985Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-b868b","namespace":"multikueue-t668k"}, "namespace": "multikueue-t668k", "name": "pytorchjob-pytorchjob1-b868b", "reconcileID": "06a38ce9-0f72-4d61-8809-4336a32e2f9f"}
2025-01-28T16:18:14.167669443Z stderr F 2025-01-28T16:18:14.165920227Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-257dc","namespace":"multikueue-2jzk6"}, "namespace": "multikueue-2jzk6", "name": "mpijob-mpijob1-257dc", "reconcileID": "4899c52a-7a85-459d-a94e-7839c9db60ad"}
2025-01-28T16:19:40.705382314Z stderr F 2025-01-28T16:19:40.704292101Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-42c02","namespace":"multikueue-zq8hh"}, "namespace": "multikueue-zq8hh", "name": "jobset-job-set-42c02", "reconcileID": "fdfe6744-0c5d-44a1-8476-cb83b5a7fb0d"}
2025-01-28T16:19:56.518418675Z stderr F 2025-01-28T16:19:56.516982669Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-2455b","namespace":"multikueue-fnhw8"}, "namespace": "multikueue-fnhw8", "name": "pytorchjob-pytorchjob1-2455b", "reconcileID": "0082af87-8929-4b71-bb5c-a5a83c026210"}
2025-01-28T16:20:06.767705769Z stderr F 2025-01-28T16:20:06.755866763Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-b5584","namespace":"multikueue-nztd2"}, "namespace": "multikueue-nztd2", "name": "mpijob-mpijob1-b5584", "reconcileID": "7ceb0682-f9ec-40a9-9a80-8b528a323824"}
2025-01-28T16:20:16.885089961Z stderr F 2025-01-28T16:20:16.883932707Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-972c9","namespace":"multikueue-f82nq"}, "namespace": "multikueue-f82nq", "name": "jobset-job-set-972c9", "reconcileID": "08c0e0ff-1eb0-40f9-9508-9f0b1ebfbe64"}
2025-01-28T16:20:16.95236698Z stderr F 2025-01-28T16:20:16.950400139Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-972c9","namespace":"multikueue-f82nq"}, "namespace": "multikueue-f82nq", "name": "jobset-job-set-972c9", "reconcileID": "023d52a4-b6a1-4e9d-a93e-3f2328c5c026"}
2025-01-28T16:20:32.526456437Z stderr F 2025-01-28T16:20:32.525480309Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-02c36","namespace":"multikueue-282cl"}, "namespace": "multikueue-282cl", "name": "pytorchjob-pytorchjob1-02c36", "reconcileID": "7af3f389-3fa1-4df7-a47a-2e0c39b27207"}
2025-01-28T16:20:42.141265891Z stderr F 2025-01-28T16:20:42.139523967Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-b8c09","namespace":"multikueue-pfkdd"}, "namespace": "multikueue-pfkdd", "name": "mpijob-mpijob1-b8c09", "reconcileID": "38ca803f-ffb8-4939-a2c7-e8dda759091a"}
2025-01-28T16:20:53.206258662Z stderr F 2025-01-28T16:20:53.205853494Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-bbedb","namespace":"multikueue-vcxvg"}, "namespace": "multikueue-vcxvg", "name": "jobset-job-set-bbedb", "reconcileID": "c4330946-2262-4f89-a969-00b130201259"}
2025-01-28T16:21:08.525584101Z stderr F 2025-01-28T16:21:08.5229498Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-79c35","namespace":"multikueue-2mvp8"}, "namespace": "multikueue-2mvp8", "name": "pytorchjob-pytorchjob1-79c35", "reconcileID": "b0cd7cc1-ae2a-4642-a746-d011b8156356"}
2025-01-28T16:21:18.155650948Z stderr F 2025-01-28T16:21:18.154782987Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-cc11e","namespace":"multikueue-9qdmd"}, "namespace": "multikueue-9qdmd", "name": "mpijob-mpijob1-cc11e", "reconcileID": "8d386633-5ee3-4233-a2ae-5aaa5c95e3a6"}
2025-01-28T16:21:27.88506359Z stderr F 2025-01-28T16:21:27.883212458Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-93f12","namespace":"multikueue-68vd9"}, "namespace": "multikueue-68vd9", "name": "jobset-job-set-93f12", "reconcileID": "3113759a-33b3-4468-8dc6-b6fb390d9c6f"}
2025-01-28T16:21:44.515269425Z stderr F 2025-01-28T16:21:44.513936004Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-d74d4","namespace":"multikueue-xvwxh"}, "namespace": "multikueue-xvwxh", "name": "pytorchjob-pytorchjob1-d74d4", "reconcileID": "22a31f4d-1130-420b-ab9f-13e2cb748014"}
2025-01-28T16:21:53.450293575Z stderr F 2025-01-28T16:21:53.44976624Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-16f55","namespace":"multikueue-7lf76"}, "namespace": "multikueue-7lf76", "name": "mpijob-mpijob1-16f55", "reconcileID": "5a787bbf-4fff-4f96-9911-15d34695a5f7"}
2025-01-28T16:22:05.186761297Z stderr F 2025-01-28T16:22:05.185131083Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-b8e53","namespace":"multikueue-8djn7"}, "namespace": "multikueue-8djn7", "name": "jobset-job-set-b8e53", "reconcileID": "290f3451-89d8-4989-8320-66b508620d8e"}
2025-01-28T16:22:21.723795839Z stderr F 2025-01-28T16:22:21.722497292Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-dda18","namespace":"multikueue-hm478"}, "namespace": "multikueue-hm478", "name": "pytorchjob-pytorchjob1-dda18", "reconcileID": "029f00a5-700d-4e32-904a-95567928d1d7"}
2025-01-28T16:22:31.246882662Z stderr F 2025-01-28T16:22:31.246560869Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-4065f","namespace":"multikueue-mp6vl"}, "namespace": "multikueue-mp6vl", "name": "mpijob-mpijob1-4065f", "reconcileID": "b3953398-b7ec-4e02-b1ae-58398ae3a4c7"}
2025-01-28T16:22:42.289515943Z stderr F 2025-01-28T16:22:42.288100271Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-f758d","namespace":"multikueue-4wzkx"}, "namespace": "multikueue-4wzkx", "name": "jobset-job-set-f758d", "reconcileID": "3ff311a3-354a-47e6-92b8-8f1d66edea2f"}
2025-01-28T16:22:59.346308538Z stderr F 2025-01-28T16:22:59.343894488Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-f0d03","namespace":"multikueue-6t7m8"}, "namespace": "multikueue-6t7m8", "name": "pytorchjob-pytorchjob1-f0d03", "reconcileID": "7eebe82a-0867-4dd9-85db-29ec86c38e9e"}
2025-01-28T16:23:07.965351603Z stderr F 2025-01-28T16:23:07.962608385Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-2ab76","namespace":"multikueue-kmprk"}, "namespace": "multikueue-kmprk", "name": "mpijob-mpijob1-2ab76", "reconcileID": "3a3afb61-5bcd-4de4-9304-0ef550add3c4"}
2025-01-28T16:23:18.192629809Z stderr F 2025-01-28T16:23:18.191258179Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-2eada","namespace":"multikueue-cvqwt"}, "namespace": "multikueue-cvqwt", "name": "jobset-job-set-2eada", "reconcileID": "bf13453a-efa1-4f92-ac04-d422113924e2"}
2025-01-28T16:23:32.810795472Z stderr F 2025-01-28T16:23:32.807874254Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-22e97","namespace":"multikueue-4gj4w"}, "namespace": "multikueue-4gj4w", "name": "pytorchjob-pytorchjob1-22e97", "reconcileID": "e179e184-234e-4f12-a2ed-790a6c6d3304"}
2025-01-28T16:23:42.24540528Z stderr F 2025-01-28T16:23:42.244739278Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-9e942","namespace":"multikueue-msfb8"}, "namespace": "multikueue-msfb8", "name": "mpijob-mpijob1-9e942", "reconcileID": "24ae253e-7217-4519-92c9-03d0f03cd89b"}
2025-01-28T16:23:54.300869081Z stderr F 2025-01-28T16:23:54.300238162Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-e5834","namespace":"multikueue-mvxvq"}, "namespace": "multikueue-mvxvq", "name": "jobset-job-set-e5834", "reconcileID": "f6568a57-1e86-4cf8-af04-f13b18a6e6a0"}
2025-01-28T16:23:54.430154601Z stderr F 2025-01-28T16:23:54.429450307Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-e5834","namespace":"multikueue-mvxvq"}, "namespace": "multikueue-mvxvq", "name": "jobset-job-set-e5834", "reconcileID": "8eea4f80-8a2d-4183-a7d0-0af1d7b6a1a3"}
2025-01-28T16:24:09.997838207Z stderr F 2025-01-28T16:24:09.994890197Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-d8622","namespace":"multikueue-llvfq"}, "namespace": "multikueue-llvfq", "name": "pytorchjob-pytorchjob1-d8622", "reconcileID": "3aa3caaf-3cad-47c8-942c-7d95c3e64bcb"}
2025-01-28T16:24:22.573517231Z stderr F 2025-01-28T16:24:22.572843104Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-13cb1","namespace":"multikueue-g2lnb"}, "namespace": "multikueue-g2lnb", "name": "mpijob-mpijob1-13cb1", "reconcileID": "864ee9d2-018e-447a-b4ae-0f32880b69ef"}
2025-01-28T16:24:35.447405228Z stderr F 2025-01-28T16:24:35.446107766Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-03d25","namespace":"multikueue-5fq24"}, "namespace": "multikueue-5fq24", "name": "jobset-job-set-03d25", "reconcileID": "8288ccfd-d271-400e-b656-08b33ed29324"}
2025-01-28T16:24:50.492258457Z stderr F 2025-01-28T16:24:50.490439743Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-3d472","namespace":"multikueue-rfjvp"}, "namespace": "multikueue-rfjvp", "name": "pytorchjob-pytorchjob1-3d472", "reconcileID": "369b5b8a-6e62-4413-8da1-0f79741244c2"}
2025-01-28T16:25:00.699195911Z stderr F 2025-01-28T16:25:00.698701451Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-e4563","namespace":"multikueue-95np7"}, "namespace": "multikueue-95np7", "name": "mpijob-mpijob1-e4563", "reconcileID": "5a49000b-d0b7-42f8-9654-a29b744ae2b9"}
2025-01-28T16:25:12.38840284Z stderr F 2025-01-28T16:25:12.387543588Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-622c4","namespace":"multikueue-5cndd"}, "namespace": "multikueue-5cndd", "name": "jobset-job-set-622c4", "reconcileID": "593c6e96-5665-46d5-8388-3687236da165"}
2025-01-28T16:25:29.848562562Z stderr F 2025-01-28T16:25:29.844903509Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-acc96","namespace":"multikueue-zz7ml"}, "namespace": "multikueue-zz7ml", "name": "pytorchjob-pytorchjob1-acc96", "reconcileID": "68e912b9-86f3-4965-b9d0-ba2d4f556cdf"}
2025-01-28T16:25:40.905110585Z stderr F 2025-01-28T16:25:40.897876979Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-9f1ee","namespace":"multikueue-52bpr"}, "namespace": "multikueue-52bpr", "name": "mpijob-mpijob1-9f1ee", "reconcileID": "f4d00447-b834-4a5e-8bd7-9e8ac2fde684"}
2025-01-28T16:25:54.926569102Z stderr F 2025-01-28T16:25:54.922476548Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-efb89","namespace":"multikueue-lm9bs"}, "namespace": "multikueue-lm9bs", "name": "jobset-job-set-efb89", "reconcileID": "9caf6e6a-38c7-4dc2-8a71-9365fba6f280"}
2025-01-28T16:26:16.299653625Z stderr F 2025-01-28T16:26:16.295535112Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-47634","namespace":"multikueue-jts7z"}, "namespace": "multikueue-jts7z", "name": "pytorchjob-pytorchjob1-47634", "reconcileID": "c834fa36-62ab-46ab-bbb1-3a540325def9"}
2025-01-28T16:26:27.817943232Z stderr F 2025-01-28T16:26:27.805837111Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-ad514","namespace":"multikueue-2xbp7"}, "namespace": "multikueue-2xbp7", "name": "mpijob-mpijob1-ad514", "reconcileID": "e8c04ba9-13fc-46f2-a493-31d13f7bb91a"}
2025-01-28T16:26:39.956176093Z stderr F 2025-01-28T16:26:39.951241578Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-b394a","namespace":"multikueue-xn25c"}, "namespace": "multikueue-xn25c", "name": "jobset-job-set-b394a", "reconcileID": "816cfd4c-8b64-4aae-bf09-cd338e31a8c2"}
2025-01-28T16:27:12.856412521Z stderr F 2025-01-28T16:27:12.850590379Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-a1bae","namespace":"multikueue-4992l"}, "namespace": "multikueue-4992l", "name": "jobset-job-set-a1bae", "reconcileID": "88d4409f-dd94-49dd-9ccd-f9c539ad64af"}
2025-01-28T16:27:28.571357179Z stderr F 2025-01-28T16:27:28.570260217Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-3e053","namespace":"multikueue-br9ql"}, "namespace": "multikueue-br9ql", "name": "pytorchjob-pytorchjob1-3e053", "reconcileID": "222e790f-2be1-497a-94db-ee980a9a63ef"}
2025-01-28T16:27:38.235843918Z stderr F 2025-01-28T16:27:38.235482667Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-910e9","namespace":"multikueue-zdm7q"}, "namespace": "multikueue-zdm7q", "name": "mpijob-mpijob1-910e9", "reconcileID": "79fd1c97-0e76-4a66-932e-8e81b624a61d"}
2025-01-28T16:27:48.985795182Z stderr F 2025-01-28T16:27:48.985517807Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-adfab","namespace":"multikueue-486xk"}, "namespace": "multikueue-486xk", "name": "jobset-job-set-adfab", "reconcileID": "8d577257-d562-4ec0-bef4-fef2d38b6776"}
2025-01-28T16:27:49.188625447Z stderr F 2025-01-28T16:27:49.184741644Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-adfab","namespace":"multikueue-486xk"}, "namespace": "multikueue-486xk", "name": "jobset-job-set-adfab", "reconcileID": "0e15f6c0-f2e3-4aa6-9f03-63443195e1ac"}
2025-01-28T16:28:07.028601442Z stderr F 2025-01-28T16:28:07.02477243Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-d17f5","namespace":"multikueue-vn9q8"}, "namespace": "multikueue-vn9q8", "name": "pytorchjob-pytorchjob1-d17f5", "reconcileID": "a64dbdee-ba8c-4fdf-b8f8-cc3ae321174d"}
2025-01-28T16:28:17.114905959Z stderr F 2025-01-28T16:28:17.113736248Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-4e575","namespace":"multikueue-lp7rn"}, "namespace": "multikueue-lp7rn", "name": "mpijob-mpijob1-4e575", "reconcileID": "eceefe0d-933f-45e9-aa01-8502ed24b7b1"}
2025-01-28T16:28:30.507952371Z stderr F 2025-01-28T16:28:30.504917529Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-a602f","namespace":"multikueue-cvj6t"}, "namespace": "multikueue-cvj6t", "name": "jobset-job-set-a602f", "reconcileID": "e098b94c-766b-434e-8ca6-860a3e4077b3"}
2025-01-28T16:28:30.940345997Z stderr F 2025-01-28T16:28:30.937755114Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-a602f","namespace":"multikueue-cvj6t"}, "namespace": "multikueue-cvj6t", "name": "jobset-job-set-a602f", "reconcileID": "055ac8e5-baaf-4e3f-8637-1f1cca328ec6"}
2025-01-28T16:28:46.753658783Z stderr F 2025-01-28T16:28:46.753364324Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-e12dc","namespace":"multikueue-lbvvz"}, "namespace": "multikueue-lbvvz", "name": "pytorchjob-pytorchjob1-e12dc", "reconcileID": "039bd1c5-9479-4d69-8aa8-7f1056d3be30"}
2025-01-28T16:28:56.718848339Z stderr F 2025-01-28T16:28:56.717379752Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-99eaa","namespace":"multikueue-qwqj4"}, "namespace": "multikueue-qwqj4", "name": "mpijob-mpijob1-99eaa", "reconcileID": "9d8dfba6-38e0-45a9-b706-b36d50e90ed2"}
2025-01-28T16:28:56.877349709Z stderr F 2025-01-28T16:28:56.874943119Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-99eaa","namespace":"multikueue-qwqj4"}, "namespace": "multikueue-qwqj4", "name": "mpijob-mpijob1-99eaa", "reconcileID": "6f0e07eb-9789-41df-9b88-ff9454a39a4d"}
2025-01-28T16:29:07.403202983Z stderr F 2025-01-28T16:29:07.401623812Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-48219","namespace":"multikueue-phlcm"}, "namespace": "multikueue-phlcm", "name": "jobset-job-set-48219", "reconcileID": "908bd99e-9da2-4100-9ea6-ca460294a40b"}
2025-01-28T16:29:23.781956722Z stderr F 2025-01-28T16:29:23.778719254Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-58038","namespace":"multikueue-zmxrl"}, "namespace": "multikueue-zmxrl", "name": "pytorchjob-pytorchjob1-58038", "reconcileID": "35ede271-a522-4d01-96dd-fa12db449d02"}
2025-01-28T16:29:34.556695059Z stderr F 2025-01-28T16:29:34.553796134Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-f98f1","namespace":"multikueue-bkzrq"}, "namespace": "multikueue-bkzrq", "name": "mpijob-mpijob1-f98f1", "reconcileID": "18540875-c907-41be-a94d-0ef48a02750b"}
2025-01-28T16:29:45.045792751Z stderr F 2025-01-28T16:29:45.044233122Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-8a3b8","namespace":"multikueue-5xvpr"}, "namespace": "multikueue-5xvpr", "name": "jobset-job-set-8a3b8", "reconcileID": "d3697975-9d04-427d-b5a3-c191aa935eef"}
2025-01-28T16:30:03.715405219Z stderr F 2025-01-28T16:30:03.714692842Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-3e8b6","namespace":"multikueue-k54c2"}, "namespace": "multikueue-k54c2", "name": "pytorchjob-pytorchjob1-3e8b6", "reconcileID": "ac3be33c-950a-4b0c-9859-b116fe57c9de"}
2025-01-28T16:30:03.852664393Z stderr F 2025-01-28T16:30:03.850980971Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-3e8b6","namespace":"multikueue-k54c2"}, "namespace": "multikueue-k54c2", "name": "pytorchjob-pytorchjob1-3e8b6", "reconcileID": "1caabb5b-dc16-4654-8fdc-a12abf3d6201"}
2025-01-28T16:30:13.631906648Z stderr F 2025-01-28T16:30:13.630779478Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-6ad25","namespace":"multikueue-mn6z5"}, "namespace": "multikueue-mn6z5", "name": "mpijob-mpijob1-6ad25", "reconcileID": "3425048b-9823-467d-a21e-59142d7a554e"}
2025-01-28T16:30:25.94663933Z stderr F 2025-01-28T16:30:25.944689992Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-447fd","namespace":"multikueue-r2hpk"}, "namespace": "multikueue-r2hpk", "name": "jobset-job-set-447fd", "reconcileID": "8f91624f-8338-4a5e-b655-4749e3345439"}
2025-01-28T16:30:42.639655809Z stderr F 2025-01-28T16:30:42.639364516Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-633ee","namespace":"multikueue-tkbk4"}, "namespace": "multikueue-tkbk4", "name": "pytorchjob-pytorchjob1-633ee", "reconcileID": "b368f8aa-05ab-472c-83c2-0b8f09d24853"}
2025-01-28T16:30:53.434579516Z stderr F 2025-01-28T16:30:53.433492721Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-a9d72","namespace":"multikueue-rv92r"}, "namespace": "multikueue-rv92r", "name": "mpijob-mpijob1-a9d72", "reconcileID": "a7aaa0c9-7d16-4c6a-a531-60203e96c7e3"}
2025-01-28T16:31:06.77398414Z stderr F 2025-01-28T16:31:06.770731631Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-81653","namespace":"multikueue-m8dx6"}, "namespace": "multikueue-m8dx6", "name": "jobset-job-set-81653", "reconcileID": "eced580c-1d4c-4997-997d-fd43f24df8de"}
2025-01-28T16:31:06.934469617Z stderr F 2025-01-28T16:31:06.932420611Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-81653","namespace":"multikueue-m8dx6"}, "namespace": "multikueue-m8dx6", "name": "jobset-job-set-81653", "reconcileID": "dbd965ae-b165-48f0-ad22-897c3f4c4b79"}
2025-01-28T16:31:25.859869511Z stderr F 2025-01-28T16:31:25.858490007Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-4292d","namespace":"multikueue-qjhcp"}, "namespace": "multikueue-qjhcp", "name": "pytorchjob-pytorchjob1-4292d", "reconcileID": "42f37840-2880-4e38-a588-4e921ef12c92"}
2025-01-28T16:31:37.804577725Z stderr F 2025-01-28T16:31:37.802822554Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-9a712","namespace":"multikueue-x6ltn"}, "namespace": "multikueue-x6ltn", "name": "mpijob-mpijob1-9a712", "reconcileID": "3121708e-4af7-4203-82c7-96217eb7d03a"}
2025-01-28T16:31:37.968158164Z stderr F 2025-01-28T16:31:37.964678905Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-9a712","namespace":"multikueue-x6ltn"}, "namespace": "multikueue-x6ltn", "name": "mpijob-mpijob1-9a712", "reconcileID": "4690e810-dff0-4bce-a948-b93bb347a346"}
2025-01-28T16:31:49.985497397Z stderr F 2025-01-28T16:31:49.983222225Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-d356b","namespace":"multikueue-b2c5c"}, "namespace": "multikueue-b2c5c", "name": "jobset-job-set-d356b", "reconcileID": "21bfd6ef-5932-4fa1-99ff-c4c25b78f29f"}
2025-01-28T16:32:08.19090521Z stderr F 2025-01-28T16:32:08.18990304Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-97742","namespace":"multikueue-5wwcw"}, "namespace": "multikueue-5wwcw", "name": "pytorchjob-pytorchjob1-97742", "reconcileID": "67b247e5-3400-4dea-8599-f84768459938"}
2025-01-28T16:32:19.182931086Z stderr F 2025-01-28T16:32:19.181685781Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-001f9","namespace":"multikueue-qtqgn"}, "namespace": "multikueue-qtqgn", "name": "mpijob-mpijob1-001f9", "reconcileID": "499b2e0a-1903-414b-8b38-472a27881d59"}
2025-01-28T16:32:30.967588845Z stderr F 2025-01-28T16:32:30.95384936Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-7c2b7","namespace":"multikueue-b6cxb"}, "namespace": "multikueue-b6cxb", "name": "jobset-job-set-7c2b7", "reconcileID": "f2d155d9-da6f-4363-8964-e4fb65a46fee"}
2025-01-28T16:32:31.270070697Z stderr F 2025-01-28T16:32:31.263744712Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-7c2b7","namespace":"multikueue-b6cxb"}, "namespace": "multikueue-b6cxb", "name": "jobset-job-set-7c2b7", "reconcileID": "6af1a7c9-fe7b-4333-a558-e8c00159f2d5"}
2025-01-28T16:32:48.050437614Z stderr F 2025-01-28T16:32:48.049755128Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-a9d64","namespace":"multikueue-w5l4m"}, "namespace": "multikueue-w5l4m", "name": "pytorchjob-pytorchjob1-a9d64", "reconcileID": "d2b52e87-2fcb-4c1c-a1ba-eb78d46aee79"}
2025-01-28T16:33:01.548207669Z stderr F 2025-01-28T16:33:01.535372606Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-d3aba","namespace":"multikueue-r56hr"}, "namespace": "multikueue-r56hr", "name": "mpijob-mpijob1-d3aba", "reconcileID": "30e3d9db-3f38-48b1-8066-17aa11709522"}
2025-01-28T16:33:13.009251464Z stderr F 2025-01-28T16:33:12.998421229Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-bfa2a","namespace":"multikueue-t4rjk"}, "namespace": "multikueue-t4rjk", "name": "jobset-job-set-bfa2a", "reconcileID": "5cbd675e-3a59-402e-84f0-811deeaee2cd"}
2025-01-28T16:33:28.762752425Z stderr F 2025-01-28T16:33:28.760767881Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-ce7de","namespace":"multikueue-fwjbd"}, "namespace": "multikueue-fwjbd", "name": "pytorchjob-pytorchjob1-ce7de", "reconcileID": "b6fd0f6e-30ef-4693-a5f0-3c6827ed3209"}
2025-01-28T16:33:41.458737344Z stderr F 2025-01-28T16:33:41.458122314Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-06cc1","namespace":"multikueue-g4v6v"}, "namespace": "multikueue-g4v6v", "name": "mpijob-mpijob1-06cc1", "reconcileID": "60f67ac7-780b-4074-b353-1b669af01848"}
2025-01-28T16:33:54.623901768Z stderr F 2025-01-28T16:33:54.607907366Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-66a16","namespace":"multikueue-zwc75"}, "namespace": "multikueue-zwc75", "name": "jobset-job-set-66a16", "reconcileID": "ba4423d7-86eb-488e-9550-91e6ac81f9d6"}
2025-01-28T16:33:54.824963703Z stderr F 2025-01-28T16:33:54.821542102Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-66a16","namespace":"multikueue-zwc75"}, "namespace": "multikueue-zwc75", "name": "jobset-job-set-66a16", "reconcileID": "1b335396-6147-4fe7-bf06-dd43bb9b1dfa"}
2025-01-28T16:34:11.347837006Z stderr F 2025-01-28T16:34:11.34554284Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-494a4","namespace":"multikueue-sc2qp"}, "namespace": "multikueue-sc2qp", "name": "pytorchjob-pytorchjob1-494a4", "reconcileID": "cc05b3a7-b2a6-48f0-a30d-554591147782"}
2025-01-28T16:34:22.519437981Z stderr F 2025-01-28T16:34:22.515583135Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-0f1ff","namespace":"multikueue-cd4mx"}, "namespace": "multikueue-cd4mx", "name": "mpijob-mpijob1-0f1ff", "reconcileID": "5a94d96d-f95a-4c60-9826-debd4f67eb7c"}
2025-01-28T16:34:36.665302723Z stderr F 2025-01-28T16:34:36.663954081Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-aef78","namespace":"multikueue-mht6s"}, "namespace": "multikueue-mht6s", "name": "jobset-job-set-aef78", "reconcileID": "e693e61f-ddc3-47dc-bd00-faadecd30dea"}
2025-01-28T16:34:53.347486422Z stderr F 2025-01-28T16:34:53.343378285Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-e9d5a","namespace":"multikueue-ffkc6"}, "namespace": "multikueue-ffkc6", "name": "pytorchjob-pytorchjob1-e9d5a", "reconcileID": "c3dc5713-e724-45bd-86c8-8851912396a8"}
2025-01-28T16:35:04.35992724Z stderr F 2025-01-28T16:35:04.357945357Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-610ac","namespace":"multikueue-ttxlf"}, "namespace": "multikueue-ttxlf", "name": "mpijob-mpijob1-610ac", "reconcileID": "b571c5c4-4d3f-4187-858e-4d652938fd5e"}
2025-01-28T16:35:15.593632648Z stderr F 2025-01-28T16:35:15.591829136Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-5ed00","namespace":"multikueue-kwzkl"}, "namespace": "multikueue-kwzkl", "name": "jobset-job-set-5ed00", "reconcileID": "b91552d5-c87a-40a4-af30-7a1ee25d57f9"}
2025-01-28T16:35:15.82109924Z stderr F 2025-01-28T16:35:15.81979647Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-5ed00","namespace":"multikueue-kwzkl"}, "namespace": "multikueue-kwzkl", "name": "jobset-job-set-5ed00", "reconcileID": "2634ad53-98ae-4c17-abc6-d61975ea47b8"}
2025-01-28T16:35:34.671233121Z stderr F 2025-01-28T16:35:34.667948466Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-692fc","namespace":"multikueue-k5dgz"}, "namespace": "multikueue-k5dgz", "name": "pytorchjob-pytorchjob1-692fc", "reconcileID": "8773108e-b718-4d5c-81b9-12258a66d526"}
2025-01-28T16:35:47.012809578Z stderr F 2025-01-28T16:35:47.01081761Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-81903","namespace":"multikueue-ljqft"}, "namespace": "multikueue-ljqft", "name": "mpijob-mpijob1-81903", "reconcileID": "432f144d-6e86-4d49-bf6b-9ac202732e51"}
2025-01-28T16:36:02.072963288Z stderr F 2025-01-28T16:36:02.071683391Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-e78f3","namespace":"multikueue-tskqx"}, "namespace": "multikueue-tskqx", "name": "jobset-job-set-e78f3", "reconcileID": "26324f9c-4eb4-4e90-bb41-e562d61928c5"}
2025-01-28T16:37:06.943717378Z stderr F 2025-01-28T16:37:06.919614334Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-953d6","namespace":"multikueue-vw6ct"}, "namespace": "multikueue-vw6ct", "name": "jobset-job-set-953d6", "reconcileID": "3e7ba18c-a6a2-4be7-a749-08b76a6a0b48"}
2025-01-28T16:37:22.846200477Z stderr F 2025-01-28T16:37:22.845610943Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-e2390","namespace":"multikueue-4qw5w"}, "namespace": "multikueue-4qw5w", "name": "pytorchjob-pytorchjob1-e2390", "reconcileID": "fc4ebdc4-feee-4456-8287-def936b7c628"}
2025-01-28T16:37:33.055857658Z stderr F 2025-01-28T16:37:33.028042773Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-67754","namespace":"multikueue-m6whl"}, "namespace": "multikueue-m6whl", "name": "mpijob-mpijob1-67754", "reconcileID": "f14396e2-92db-4f70-aef2-dc2417247a51"}
2025-01-28T16:37:44.789371527Z stderr F 2025-01-28T16:37:44.786845892Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-595a4","namespace":"multikueue-9n2q6"}, "namespace": "multikueue-9n2q6", "name": "jobset-job-set-595a4", "reconcileID": "07a897c6-d8c8-4b44-b79a-d0a27c11e974"}
2025-01-28T16:38:00.756171925Z stderr F 2025-01-28T16:38:00.755221395Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-27ab1","namespace":"multikueue-k7ksl"}, "namespace": "multikueue-k7ksl", "name": "pytorchjob-pytorchjob1-27ab1", "reconcileID": "8632081e-d5e1-4a2e-b135-2d3bd4a26719"}
2025-01-28T16:38:00.996268583Z stderr F 2025-01-28T16:38:00.993563076Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-27ab1","namespace":"multikueue-k7ksl"}, "namespace": "multikueue-k7ksl", "name": "pytorchjob-pytorchjob1-27ab1", "reconcileID": "68e88f23-f974-43c5-b24b-910fd6873fc9"}
2025-01-28T16:38:13.581881008Z stderr F 2025-01-28T16:38:13.579797826Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-8b433","namespace":"multikueue-lxq6m"}, "namespace": "multikueue-lxq6m", "name": "mpijob-mpijob1-8b433", "reconcileID": "90ca61a1-10d1-4ab6-ad38-9e3d2838fb23"}
2025-01-28T16:38:24.265247175Z stderr F 2025-01-28T16:38:24.264694348Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-27ef0","namespace":"multikueue-znb5p"}, "namespace": "multikueue-znb5p", "name": "jobset-job-set-27ef0", "reconcileID": "a20aec3c-b01f-4abf-aedc-ea48b8ad34d3"}
2025-01-28T16:38:40.745849353Z stderr F 2025-01-28T16:38:40.745423524Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-4f1f8","namespace":"multikueue-249d7"}, "namespace": "multikueue-249d7", "name": "pytorchjob-pytorchjob1-4f1f8", "reconcileID": "30e0ae50-1c53-49e5-8f4e-8e86eb09dca6"}
2025-01-28T16:38:50.324187407Z stderr F 2025-01-28T16:38:50.318660346Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-f0ccb","namespace":"multikueue-rmc4w"}, "namespace": "multikueue-rmc4w", "name": "mpijob-mpijob1-f0ccb", "reconcileID": "621c929c-ce7a-4427-b0e6-9f73026e731e"}
2025-01-28T16:39:00.334172609Z stderr F 2025-01-28T16:39:00.332698917Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-720b6","namespace":"multikueue-4tqgv"}, "namespace": "multikueue-4tqgv", "name": "jobset-job-set-720b6", "reconcileID": "01f4ccb5-713c-4846-adb4-f82cc99b51a0"}
2025-01-28T16:39:16.440105152Z stderr F 2025-01-28T16:39:16.438308755Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-6b50d","namespace":"multikueue-dqdsf"}, "namespace": "multikueue-dqdsf", "name": "pytorchjob-pytorchjob1-6b50d", "reconcileID": "57bc058c-a96a-4013-8d68-c7db8b276921"}
2025-01-28T16:39:25.937427327Z stderr F 2025-01-28T16:39:25.934611483Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-5c236","namespace":"multikueue-sdxvs"}, "namespace": "multikueue-sdxvs", "name": "mpijob-mpijob1-5c236", "reconcileID": "d89c3abf-1b9f-484a-a79e-3ebb20e7cfc2"}
2025-01-28T16:39:39.309824969Z stderr F 2025-01-28T16:39:39.308541483Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-38d8f","namespace":"multikueue-sqfgl"}, "namespace": "multikueue-sqfgl", "name": "jobset-job-set-38d8f", "reconcileID": "dbd63079-747c-4cfd-a0d7-0433941a3609"}
2025-01-28T16:39:56.359419303Z stderr F 2025-01-28T16:39:56.358660019Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-c48db","namespace":"multikueue-9wrqv"}, "namespace": "multikueue-9wrqv", "name": "pytorchjob-pytorchjob1-c48db", "reconcileID": "088dca9d-b54d-4f36-93a4-ddb8e580246e"}
2025-01-28T16:40:06.658437502Z stderr F 2025-01-28T16:40:06.654634874Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-a0277","namespace":"multikueue-dmsvg"}, "namespace": "multikueue-dmsvg", "name": "mpijob-mpijob1-a0277", "reconcileID": "19645288-2eeb-4b0e-b4d7-debe4f85ba59"}
2025-01-28T16:40:16.914524833Z stderr F 2025-01-28T16:40:16.912445853Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-2eb89","namespace":"multikueue-flk4q"}, "namespace": "multikueue-flk4q", "name": "jobset-job-set-2eb89", "reconcileID": "76cf9746-947e-46ef-be49-3e6b326c0cf3"}
2025-01-28T16:40:17.075850968Z stderr F 2025-01-28T16:40:17.071939632Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-2eb89","namespace":"multikueue-flk4q"}, "namespace": "multikueue-flk4q", "name": "jobset-job-set-2eb89", "reconcileID": "edd460b2-6656-46c8-89e4-a58aa6501b5d"}
2025-01-28T16:40:32.674897038Z stderr F 2025-01-28T16:40:32.673783299Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-63950","namespace":"multikueue-pmgwb"}, "namespace": "multikueue-pmgwb", "name": "pytorchjob-pytorchjob1-63950", "reconcileID": "cc0ea847-7b13-41af-b6a5-ba5e940cb052"}
2025-01-28T16:40:42.906270803Z stderr F 2025-01-28T16:40:42.905483519Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-ef3f1","namespace":"multikueue-cwwv7"}, "namespace": "multikueue-cwwv7", "name": "mpijob-mpijob1-ef3f1", "reconcileID": "89495c5e-cf4a-465f-ac17-10bcb2c4419d"}
2025-01-28T16:40:53.405081482Z stderr F 2025-01-28T16:40:53.40445653Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-c1d92","namespace":"multikueue-b95nh"}, "namespace": "multikueue-b95nh", "name": "jobset-job-set-c1d92", "reconcileID": "2afc4435-fc3c-4a44-9455-269832842154"}
2025-01-28T16:41:10.386908975Z stderr F 2025-01-28T16:41:10.384920868Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-ed05e","namespace":"multikueue-ks6s9"}, "namespace": "multikueue-ks6s9", "name": "pytorchjob-pytorchjob1-ed05e", "reconcileID": "62a12572-748b-4bb2-85ae-e902c54ec6a2"}
2025-01-28T16:41:19.937698156Z stderr F 2025-01-28T16:41:19.93664879Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-a4ea0","namespace":"multikueue-wxp52"}, "namespace": "multikueue-wxp52", "name": "mpijob-mpijob1-a4ea0", "reconcileID": "606095fb-4fca-4986-9cb8-b77163e3377a"}
2025-01-28T16:48:16.441588737Z stderr F 2025-01-28T16:48:16.437886172Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-9b297","namespace":"multikueue-4spbk"}, "namespace": "multikueue-4spbk", "name": "jobset-job-set-9b297", "reconcileID": "b12dc096-73ab-4802-82e5-de724b446919"}
2025-01-28T16:48:32.056181336Z stderr F 2025-01-28T16:48:32.053663015Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-16201","namespace":"multikueue-p229r"}, "namespace": "multikueue-p229r", "name": "pytorchjob-pytorchjob1-16201", "reconcileID": "8fab9e77-7669-426c-a9c8-dd1e1895979b"}
2025-01-28T16:48:41.91697443Z stderr F 2025-01-28T16:48:41.914907273Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-ed6d4","namespace":"multikueue-8slfk"}, "namespace": "multikueue-8slfk", "name": "mpijob-mpijob1-ed6d4", "reconcileID": "181ff23b-c0cf-4c9c-a41b-d090cde9daa4"}
2025-01-28T16:48:52.506233798Z stderr F 2025-01-28T16:48:52.505356719Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-f2ca3","namespace":"multikueue-7ml95"}, "namespace": "multikueue-7ml95", "name": "jobset-job-set-f2ca3", "reconcileID": "9747a2ba-78f1-4198-91a7-3f899d16e749"}
2025-01-28T16:49:08.979358515Z stderr F 2025-01-28T16:49:08.978657768Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-7ecb0","namespace":"multikueue-kfcq4"}, "namespace": "multikueue-kfcq4", "name": "pytorchjob-pytorchjob1-7ecb0", "reconcileID": "5b24b4bf-0232-4147-8f46-fda2cc365f0b"}
2025-01-28T16:49:47.422795718Z stderr F 2025-01-28T16:49:47.421681348Z	LEVEL(-2)	jobset/jobset_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"jobset-job-set-0e8a0","namespace":"multikueue-pg54p"}, "namespace": "multikueue-pg54p", "name": "jobset-job-set-0e8a0", "reconcileID": "4656f112-e93c-49ec-98a6-852b70c8ce2e"}
2025-01-28T16:50:03.160455124Z stderr F 2025-01-28T16:50:03.15943517Z	LEVEL(-2)	kubeflowjob/kubeflowjob_multikueue_adapter.go:107	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"pytorchjob-pytorchjob1-9d66d","namespace":"multikueue-kkhts"}, "namespace": "multikueue-kkhts", "name": "pytorchjob-pytorchjob1-9d66d", "reconcileID": "2486ccbf-757e-45a0-9423-11fbf2c72f2c"}
2025-01-28T16:50:11.990537552Z stderr F 2025-01-28T16:50:11.988962225Z	LEVEL(-2)	mpijob/mpijob_multikueue_adapter.go:63	Skipping the sync since the local job is still suspended	{"controller": "multikueue-workload", "controllerGroup": "kueue.x-k8s.io", "controllerKind": "Workload", "Workload": {"name":"mpijob-mpijob1-2bb02","namespace":"multikueue-8dp8k"}, "namespace": "multikueue-8dp8k", "name": "mpijob-mpijob1-2bb02", "reconcileID": "c1d9173c-b425-4398-835a-a03a0ea1a2ee"}

@mimowo
Contributor

mimowo commented Jan 29, 2025

I see. So without this PR, the PyTorchJob status would be updated while the job is suspended, but apparently that does not violate any validation in PyTorch. I'm on the fence about cherry-picking this, but I would say it is still a minor bug that we may want to cherry-pick.

/lgtm
/approve
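
As a rough illustration of the behavior discussed above (not the actual Kueue adapter code), the guard can be sketched in Go as follows. The `job` and `jobStatus` types and the `syncStatus` function are hypothetical simplifications introduced only for this example. The only detail taken from the thread is the log message, which the real adapters emit from `mpijob_multikueue_adapter.go`, `jobset_multikueue_adapter.go` and `kubeflowjob_multikueue_adapter.go`.

```go
package main

import "fmt"

// jobStatus is a hypothetical, simplified stand-in for a job's status.
type jobStatus struct {
	Active    int
	Succeeded int
}

// job is a hypothetical, simplified stand-in for a JobSet, Kubeflow job or MPIJob.
type job struct {
	Name      string
	Suspended bool
	Status    jobStatus
}

// syncStatus copies the status of the remote (worker cluster) job onto the
// local (management cluster) job, unless the local job is still suspended.
// It reports whether the status was copied.
func syncStatus(local *job, remote job) bool {
	if local.Suspended {
		fmt.Printf("Skipping the sync since the local job is still suspended: %s\n", local.Name)
		return false
	}
	local.Status = remote.Status
	return true
}

func main() {
	remote := job{Name: "pytorchjob1", Status: jobStatus{Active: 2}}
	local := &job{Name: "pytorchjob1", Suspended: true}

	// While the local job is suspended, its status is left untouched.
	fmt.Println(syncStatus(local, remote), local.Status.Active)

	// Once the local job is unsuspended, the remote status is copied over.
	local.Suspended = false
	fmt.Println(syncStatus(local, remote), local.Status.Active)
}
```

In this sketch the skip is an early return rather than an error, which matches the low-verbosity "Skipping the sync" log lines shown earlier in the thread.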

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: IrvingMg, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 29, 2025
@mimowo
Contributor

mimowo commented Jan 29, 2025

/king bug
/cherry-pick release-0.10 release-0.9

@k8s-infra-cherrypick-robot
Contributor

@mimowo: once the present PR merges, I will cherry-pick it on top of release-0.10 in a new PR and assign it to you.

In response to this:

/king bug
/cherry-pick release-0.10 release-0.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot merged commit 83e680d into kubernetes-sigs:main Jan 29, 2025
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.11 milestone Jan 29, 2025
@k8s-infra-cherrypick-robot
Contributor

@mimowo: new pull request created: #4084


@IrvingMg IrvingMg deleted the cleanup/not-update-job-status-while-it-is-suspended branch January 29, 2025 08:41
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[MultiKueue] Do not update Job status while the Job is suspended (for CRDs)
5 participants