225: Add support for exclusive plans #260
Conversation
… change the default behavior of how Job Pods are scheduled: it will create a different PodAntiAffinity rule which will prevent more than one SUC Pod from running on a node at a time. If omitted or explicitly set to false, Job Pods will be scheduled according to the current default behavior. Added a helper function in job.go to evaluate the new bool and return the correct PodAntiAffinity rule.
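For reference, a rule of the kind described here might look roughly like the following Go sketch. The function name and the idea of keying off a label shared by all SUC job pods are illustrative assumptions, not the controller's actual code:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// onePerNodeAntiAffinity returns a required pod anti-affinity term of the kind
// described above: at most one pod carrying the given label may be scheduled
// on any single node at a time.
func onePerNodeAntiAffinity(labelKey string) *corev1.PodAntiAffinity {
	return &corev1.PodAntiAffinity{
		RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
			// "kubernetes.io/hostname" scopes the rule to a single node.
			TopologyKey: corev1.LabelHostname,
			LabelSelector: &metav1.LabelSelector{
				MatchExpressions: []metav1.LabelSelectorRequirement{{
					Key:      labelKey, // e.g. a label shared by all SUC job pods (hypothetical)
					Operator: metav1.LabelSelectorOpExists,
				}},
			},
		}},
	}
}
```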
This looks fine to me, but please consider my comments: I do not think the onePlanPerNode function is necessary, and it obscures both the retention of the existing default behavior and the fact that the added bool on the Plan is opt-in.
pkg/upgrade/job/job.go (Outdated)
				plan.Name,
			},
		}},
		MatchExpressions: onePlanPerNode(plan, node),
I reviewed this before encouraging @jrodonnell to submit his PR, but I didn't spend enough time thinking about this bit (something about PR submissions forces my brain to actually think, or something): I'm not a fan of the function and do not think it necessary besides.
If we leave the default initialization of the single match-expression as-is, so as to indicate that it is the default value/behavior, we can then combine this with a conditional check further down (say, after where we conditionally initialize job.spec.Parallelism) to override the key and values for the match expression when indicated by the new opt-in bool.
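A minimal sketch of that suggestion, assuming the default match expression has already been built and the Plan has gained an opt-in bool; the helper name and the controller-wide label key are assumptions for illustration, not the repository's actual code:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// overrideForExclusive rewrites the default per-plan match expression so the
// anti-affinity keys off a label shared by all of the controller's job pods
// instead of the per-plan label. It would be called only when the opt-in bool
// on the Plan is set, leaving the default initialization untouched otherwise.
func overrideForExclusive(affinity *corev1.Affinity, controllerName string) {
	if affinity == nil || affinity.PodAntiAffinity == nil {
		return // sketch assumes the default anti-affinity has already been built
	}
	terms := affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution
	for i := range terms {
		terms[i].LabelSelector.MatchExpressions = []metav1.LabelSelectorRequirement{{
			Key:      "upgrade.cattle.io/controller", // hypothetical controller-wide label key
			Operator: metav1.LabelSelectorOpIn,
			Values:   []string{controllerName},
		}}
	}
}
```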
Yeah, I don't like replacing the entire expression here with a call to onePlanPerNode, as the function is responsible for generating the LabelSelectorRequirement regardless of whether or not the new option is in use. Can we leave this as-is and either mutate or add to the []metav1.LabelSelectorRequirement later, if the option is set?
I agree that makes more sense, as it is only changing the default if the flag is set. I will try to switch this out today.
Pushed this change but left the flag as-is just for now.
New flag name added into the new mutation logic, and the new label added as well. Tested and working in my own cluster: exclusive Pods will wait for each other (but not for non-exclusive Pods) on a per-node basis, and non-exclusive ones will still schedule themselves normally.
pkg/upgrade/job/job.go (Outdated)
@@ -359,3 +353,30 @@ func New(plan *upgradeapiv1.Plan, node *corev1.Node, controllerName string) *bat

	return job
}

func onePlanPerNode(plan *upgradeapiv1.Plan, node *corev1.Node) []metav1.LabelSelectorRequirement {
See my comment in-place where this function is used: I don't think it is necessary.
/cc @brandond @briandowns
I'm confused as to why this is necessary; isn't the documented way to do this via something like the k3s-upgrade "prepare" step, where each subsequent plan knows the name of the previous plan and waits for it to complete before moving into the upgrade phase?
If the desire is to bring this closer to a full workflow engine, with strict sequencing and inter-job dependency tracking built into the controller instead of leaning on the images to figure it out for themselves, just slapping a flag on the Plan that prevents it from running alongside any other Plan doesn't seem like it gets us there?
The gist of this work is a relatively simple tweak to the plan to satisfy the ask in #225, which seeks to prevent multiple plans from applying on a node at the same time (as opposed to the default anti-affinity, which only prevents multiple applies of the same plan per node). While this does allow bad-actor plans to monopolize nodes, it is opt-in behavior at the plan level to declaratively connote that concurrent mutations of the node should not be allowed.
A couple nits on the field name and expression generation logic.
Just to be clear, the intent is for this option to only force exclusivity/serialization among plans with the option set. Only a single "exclusive" or "serialized" plan may run at a time, alongside zero or more "non-exclusive" plans.
I believe the intent was to make an exclusive Plan block any other Plan from running, regardless of any other Plan's settings, for a few reasons. I see this exclusivity as an important statement of fact and intent: running any other Plans while this exclusive one is running is a bad idea, because it will be doing things to the node that could break other Jobs mid-execution and leave the node in an unknown state, and allowing that declaration to be effectively overridden or ignored by other Plans would seem to defeat the purpose.

Under those same assumptions, requiring all Plans to have the flag in order to comply with one Plan's exclusivity requirement would be counterintuitive to me, and would also force all Plans into serial execution even if only one of them actually needed to be exclusive (or force the operator into a two-step process: apply exclusive Plans, wait for them to finish, then apply normal ones). Whereas if exclusive Plans simply block all other Plans while they are running, then once they are done the rest will automatically execute according to the current default scheduling. To me that seems like the most intuitive behavior, one that operators can easily understand and leverage in practice.
Can we make this configurable as part of the controller settings? I feel like this is a major change in the Plan contract that administrators should have to opt into. I think that by default exclusive plans should only block other exclusive plans. If administrators want to allow an exclusive plan to block all other plans, they can enable that in the controller config, with an awareness that this may change the behavior of other plans already deployed to the cluster.
In that case, maybe it does make more sense to think of making Plans serial rather than exclusive. The operator could still treat them as exclusive by letting the serial ones finish before submitting more non-serial Plans. The anti-affinity rule would have to be based on a different label than the one here, probably a new unique one (perhaps …). @davidcassany, as the one who opened the original issue, I am also curious to hear what you think of this discussion: do you think this new flag should force exclusivity of the Plan with the flag and prevent any other Plan from running, or enforce serial scheduling of all Plans with the flag while ones without continue being scheduled as they are now?
I also want to keep in mind that lots of folks don't treat Plans as one-shot operations and delete them afterwards. There may be a standing plan to upgrade to the latest release in a channel, for example. In that case the Plan will finish, but execute again automatically when the channel is updated.
… simple if statement.
That is some good additional context to have; I did not realize that was a common use case. I think it sheds some more light on the original request, which was to provide a way to prevent concurrent Plans from running on a cluster. But long-running or not, I think it's reasonable to expect an operator to tag their Plan appropriately according to whether it is potentially disruptive or sensitive to disruption, so I think I agree with what you're saying. I think this would work: give every Plan Pod a new label, e.g. …
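As a rough illustration of that direction, the sketch below labels an exclusive plan's job pods and appends an anti-affinity term keyed on that label, so only exclusive pods block each other while unlabeled pods keep the default scheduling. The label key and helper name are assumptions for the example, not necessarily the names the PR settled on:

```go
package sketch

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exclusiveLabel is a placeholder key; the real label name is defined by the PR.
const exclusiveLabel = "upgrade.cattle.io/exclusive"

// markExclusive labels the job's pods as exclusive and appends an anti-affinity
// term matching that label, so exclusive pods wait for each other on a node
// while non-exclusive (unlabeled) pods keep the default scheduling behavior.
func markExclusive(job *batchv1.Job) {
	if job.Spec.Template.Labels == nil {
		job.Spec.Template.Labels = map[string]string{}
	}
	job.Spec.Template.Labels[exclusiveLabel] = "true"

	affinity := job.Spec.Template.Spec.Affinity
	if affinity == nil || affinity.PodAntiAffinity == nil {
		return // sketch assumes the controller already built the default anti-affinity
	}
	affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution = append(
		affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution,
		corev1.PodAffinityTerm{
			TopologyKey: corev1.LabelHostname, // at most one exclusive pod per node at a time
			LabelSelector: &metav1.LabelSelector{
				MatchExpressions: []metav1.LabelSelectorRequirement{{
					Key:      exclusiveLabel,
					Operator: metav1.LabelSelectorOpExists,
				}},
			},
		},
	)
}
```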
… anti-affinity rule on it.
Co-authored-by: Brad Davidson <[email protected]>
This PR was later referenced by an automated Renovate update PR (#3717) bumping docker.io/rancher/system-upgrade-controller and rancher/system-upgrade-controller from v0.13.1 to v0.13.2; the v0.13.2 release notes include "225: Add support for exclusive plans by @jrodonnell in rancher/system-upgrade-controller#260".
Add a new field to the Plan spec, Exclusive. This field will be useful when the order of execution for a series of Jobs is important, as each exclusive Job will have to complete before the next one is scheduled.
Setting the field to true will change the default behavior of how Job Pods are scheduled: it will create a PodAntiAffinity rule which will prevent more than one exclusive-labeled SUC Job Pod from being scheduled to a node at a time, and will add this label to the Job Pod. If omitted or explicitly set to false, Job Pods will be scheduled according to the current default behavior.
Implements request in #225.
@dweomer