
225: Add support for exclusive plans #260

Merged · 4 commits · Sep 29, 2023

Conversation

@jrodonnell (Contributor) commented Aug 29, 2023

Add a new field to the Plan spec, Exclusive. This field is useful when the order of execution for a series of Jobs matters, as each exclusive Job must complete before the next one is scheduled.

Setting the field to true changes the default behavior of how Job Pods are scheduled: it creates a PodAntiAffinity rule that prevents more than one exclusive-labeled SUC Job Pod from being scheduled to a node at a time, and adds this label to the Job Pod.

If omitted or explicitly set to false, Job Pods will be scheduled according to current default behavior.

Implements request in #225.
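
For orientation, here is a minimal sketch of how such a field might sit on the Plan spec in Go. The field name comes from the description above, while the json tag and the rest of PlanSpec are assumptions rather than the merged code:

```go
package upgradeapiv1

// Sketch only: the rest of PlanSpec is omitted, and the json tag is an
// assumption, not necessarily what was merged.
type PlanSpec struct {
	// Exclusive, when true, opts this Plan's Job Pods into a pod anti-affinity
	// rule so that only one exclusive Job Pod is scheduled per node at a time.
	Exclusive bool `json:"exclusive,omitempty"`
}
```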

@dweomer

… change the default behavior of how Job Pods are scheduled: it will create a different PodAntiAffinity rule which will prevent more than one SUC Pod from running on a node at a time. If omitted or explicitly set to false, Job Pods will be scheduled according to current default behavior.

Added a helper function in job.go to evaluate the new bool and return the correct PodAntiAffinity rule.
@dweomer (Contributor) left a comment:

This looks fine to me, but please consider my comments: I do not think the onePlanPerNode function is necessary, and it obscures both the retention of the existing default behavior and the fact that the added bool on the Plan is opt-in.

plan.Name,
},
}},
MatchExpressions: onePlanPerNode(plan, node),
@dweomer (Contributor) commented Aug 29, 2023:

I reviewed this before encouraging @jrodonnell to submit his PR, but I didn't spend enough time thinking about this bit (something about PR submissions forces my brain to actually think, or something): I'm not a fan of the function and, besides, do not think it necessary.

If we leave the default initialization of the single match expression as-is, to indicate that it is the default value/behavior, we can then combine this with a conditional check further down (say, after where we conditionally initialize job.spec.Parallelism) that overrides the key and values of the match expression when indicated by the new opt-in bool.

@brandond (Member) commented:

Yeah, I don't like replacing the entire expression here with a call to onePlanPerNode, as the function is responsible for generating the LabelSelectorRequirement regardless of whether or not the new option is in use. Can we leave this as-is and either mutate or add to the []metav1.LabelSelectorRequirement later, if the option is set?
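
For clarity, a minimal sketch of the approach being suggested here, assuming the new opt-in bool lands on the Plan spec as Exclusive and using an illustrative "upgrade.cattle.io/exclusive" label key (neither name is confirmed as merged by this thread). It is wrapped in a small helper only so the snippet compiles; the actual suggestion is to keep the default literal in place inside New() and override it further down:

```go
package job

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exclusiveRequirements starts from today's default match expression, which
// only repels other Jobs of the same Plan, and swaps in a broader expression
// only when the opt-in bool is set. Label keys here are illustrative.
func exclusiveRequirements(planName string, exclusive bool) []metav1.LabelSelectorRequirement {
	// Default behavior: one Job per node per Plan.
	requirements := []metav1.LabelSelectorRequirement{{
		Key:      "upgrade.cattle.io/plan",
		Operator: metav1.LabelSelectorOpIn,
		Values:   []string{planName},
	}}
	if exclusive {
		// Opt-in: repel any Job Pod carrying the exclusive label, regardless
		// of which Plan it belongs to.
		requirements = []metav1.LabelSelectorRequirement{{
			Key:      "upgrade.cattle.io/exclusive",
			Operator: metav1.LabelSelectorOpIn,
			Values:   []string{"true"},
		}}
	}
	return requirements
}
```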

@jrodonnell (Contributor, Author) commented:

I agree that makes more sense, as it is only changing the default if the flag is set. I will try to switch this out today.

@jrodonnell (Contributor, Author) commented:

Pushed this change but left the flag as-is just for now.

@jrodonnell (Contributor, Author) commented:

New flag name added into the new mutation logic, and the new label added as well. Tested and working in my own cluster: exclusive Pods wait for each other (but not for non-exclusive Pods) on a per-node basis, and non-exclusive ones still schedule normally.

@@ -359,3 +353,30 @@ func New(plan *upgradeapiv1.Plan, node *corev1.Node, controllerName string) *bat

return job
}

func onePlanPerNode(plan *upgradeapiv1.Plan, node *corev1.Node) []metav1.LabelSelectorRequirement {
@dweomer (Contributor) commented:

See my comment in-place where this function is used: I don't think it is necessary.

@dweomer (Contributor) commented Aug 29, 2023

/cc @brandond @briandowns

@brandond (Member) commented Aug 29, 2023

> prevent more than one SUC Pod from running on a node at a time

I'm confused as to why this is necessary; isn't the documented way to do this via something like the k3s-upgrade "prepare" step, where each subsequent plan knows the name of the previous plan, and waits for it to complete before moving into the upgrade phase?

> This behavior will be useful when the order of execution for a series of Jobs is important, as the Job will have to complete before the next one is scheduled.

If the desire is to bring this closer to a full workflow engine, with strict sequencing and inter-job dependency tracking built into the controller instead of leaning on the images to figure it out for themselves, just slapping a flag on the Plan that prevents it from running alongside any other Plan doesn't seem like it gets us there?
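
For reference, the prepare-based sequencing mentioned above is normally expressed as a YAML Plan; below is a rough Go sketch against the upstream Plan types, assuming they expose Prepare and Upgrade as container specs with Image/Args (the import path, plan names, channel, and namespace are assumptions for illustration only):

```go
package examples

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	// Import path assumed for the sketch.
	upgradeapiv1 "github.com/rancher/system-upgrade-controller/pkg/apis/upgrade.cattle.io/v1"
)

// agentPlan sketches the documented sequencing pattern: the agent plan's
// prepare container runs the k3s-upgrade image with "prepare <other-plan>",
// which blocks until the named plan ("server-plan" here, purely illustrative)
// has completed; only then does the upgrade container run.
var agentPlan = upgradeapiv1.Plan{
	ObjectMeta: metav1.ObjectMeta{Name: "agent-plan", Namespace: "system-upgrade"},
	Spec: upgradeapiv1.PlanSpec{
		Concurrency: 1,
		Channel:     "https://update.k3s.io/v1-release/channels/stable",
		Prepare: &upgradeapiv1.ContainerSpec{
			Image: "rancher/k3s-upgrade",
			Args:  []string{"prepare", "server-plan"}, // wait for "server-plan" to finish
		},
		Upgrade: &upgradeapiv1.ContainerSpec{
			Image: "rancher/k3s-upgrade",
		},
	},
}
```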

@dweomer (Contributor) commented Aug 29, 2023

The gist of this work is a relatively simple tweak to the plan to satisfy the ask in #225 which seeks to prevent multiple plans from applying on a node at the same time (as opposed to the default anti-affinity that only prevents multiple applies of the same plan per node).

While this does allow bad-actor plans to monopolize nodes, it is opt-in behavior at the plan level that declaratively connotes that concurrent mutations of the node should not be allowed.

@brandond (Member) left a comment:

A couple nits on the field name and expression generation logic.

Just to be clear, the intent is for this option to only force exclusivity/serialization among plans with the option set. Only a single "exclusive" or "serialized" plan may run at a time, alongside zero or more "non-exclusive" plans.


@jrodonnell (Contributor, Author) commented:

> Just to be clear, the intent is for this option to only force exclusivity/serialization among plans with the option set. Only a single "exclusive" or "serialized" plan may run at a time, alongside zero or more "non-exclusive" plans.

I believe the intent was to make an exclusive Plan block any other Plan from running, regardless of any other Plan's settings, for a few reasons. I see this exclusivity as an important statement of fact and intent: running any other Plans while the exclusive one is running is a bad idea, because it will be doing things to the node that could break other Jobs mid-execution and leave the node in an unknown state, and allowing that declaration to be effectively overridden or ignored by other Plans would seem to defeat the purpose. Under those same assumptions, requiring all Plans to carry the flag in order to comply with one Plan's exclusivity requirement would be counterintuitive to me, and would also force all Plans into serial execution even if only one of them actually needed to be exclusive (or force the operator into a two-step process: apply exclusive Plans, wait for them to finish, then apply normal ones).

Whereas if exclusive Plans simply block all other Plans while they are running, then once they are done the rest will automatically execute according to the current default scheduling. To me that seems like the most intuitive behavior, one that operators can easily understand and leverage in practice.

@brandond (Member) commented Aug 30, 2023

> I believe the intent was to make an exclusive Plan block any other Plan from running, regardless of any other Plan's settings.

Can we make this configurable as part of the controller settings? I feel like this is a major change to the Plan contract, one that administrators should have to opt into. I think that by default exclusive plans should only block other exclusive plans. If administrators want to allow an exclusive plan to block all other plans, they can enable that in the controller config, with an awareness that this may change the behavior of other plans already deployed to the cluster.

@jrodonnell (Contributor, Author) commented:

> I feel like this is a major change to the Plan contract

In that case, maybe it does make more sense to think of making Plans serial rather than exclusive. The operator could still treat them as exclusive by letting the serial ones finish before submitting more non-serial Plans. The anti-affinity rule would have to be based on a different label than the one here, probably a new unique one (perhaps upgrade.cattle.io/serial: true), but I can take a crack at that too.

@davidcassany, as the one who opened the original issue, I am also curious to hear what you think of this discussion: should this new flag force exclusivity of the Plan carrying it and prevent any other Plan from running, or enforce serial scheduling only among Plans with the flag, while ones without it continue being scheduled as they are now?

@brandond (Member) commented Aug 31, 2023

> The operator could still treat them as exclusive by letting the serial ones finish before submitting more non-serial Plans.

I also want to keep in mind that lots of folks don't treat Plans as one-shot operations and delete them afterwards. There may be a standing plan to upgrade to the latest release in a channel, for example. In that case the Plan will finish, but execute again automatically when the channel is updated.

@jrodonnell (Contributor, Author) commented:

> I also want to keep in mind that lots of folks don't treat Plans as one-shot operations and delete them afterwards. There may be a standing plan to upgrade to the latest release in a channel, for example. In that case the Plan will finish, but execute again automatically when the channel is updated.

That is some good additional context to have; I did not realize that was a common use case. I think it sheds some more light on the original request, which was to provide a way to prevent concurrent Plans from running on a cluster. But long-running or not, I think it's reasonable to expect an operator to tag their Plan appropriately if it is potentially disruptive or sensitive to disruption, so I think I agree with what you're saying.

I think this would work: give every Plan Pod a new label, e.g. upgrade.cattle.io/exclusive: false | true. If false, the current default anti-affinity rule is used, effectively ignoring the new label; if true, the anti-affinity rule is based on this new label, and any new exclusive Pods will wait their turn. A rough sketch of that mutation is below.
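
A rough sketch of the labeling half of that proposal (the label key, helper name, and placement are illustrative assumptions, not necessarily what was merged): every Job Pod gets the label, and because the required anti-affinity term is only added for exclusive Pods and only matches Pods labeled "true", non-exclusive Pods keep the current scheduling behavior.

```go
package job

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// applyExclusive labels the Job's Pod template and, only for exclusive Plans,
// appends a required anti-affinity term keyed on that label with a per-node
// topology, so exclusive Pods wait for each other node-by-node while
// non-exclusive Pods schedule as they do today. Names are illustrative.
func applyExclusive(podTemplate *corev1.PodTemplateSpec, exclusive bool) {
	if podTemplate.Labels == nil {
		podTemplate.Labels = map[string]string{}
	}
	value := "false"
	if exclusive {
		value = "true"
	}
	podTemplate.Labels["upgrade.cattle.io/exclusive"] = value

	if !exclusive {
		// Default anti-affinity (same-plan only) is left untouched.
		return
	}
	term := corev1.PodAffinityTerm{
		TopologyKey: corev1.LabelHostname, // "kubernetes.io/hostname"
		LabelSelector: &metav1.LabelSelector{
			MatchExpressions: []metav1.LabelSelectorRequirement{{
				Key:      "upgrade.cattle.io/exclusive",
				Operator: metav1.LabelSelectorOpIn,
				Values:   []string{"true"},
			}},
		},
	}
	if podTemplate.Spec.Affinity == nil {
		podTemplate.Spec.Affinity = &corev1.Affinity{}
	}
	if podTemplate.Spec.Affinity.PodAntiAffinity == nil {
		podTemplate.Spec.Affinity.PodAntiAffinity = &corev1.PodAntiAffinity{}
	}
	podTemplate.Spec.Affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution = append(
		podTemplate.Spec.Affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution, term)
}
```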

@brandond brandond changed the title from "225: Prevent concurrent plans on nodes" to "225: Add support for exclusive plans" on Sep 29, 2023
@brandond brandond merged commit 5014b4b into rancher:master Sep 29, 2023
coolguy1771 referenced this pull request in coolguy1771/home-ops Nov 29, 2023
…tch) (#3717)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.io/rancher/system-upgrade-controller](https://togithub.com/rancher/system-upgrade-controller) | patch | `v0.13.1` -> `v0.13.2` |
| [rancher/system-upgrade-controller](https://togithub.com/rancher/system-upgrade-controller) | patch | `v0.13.1` -> `v0.13.2` |

---

### Release Notes

<details>
<summary>rancher/system-upgrade-controller (docker.io/rancher/system-upgrade-controller)</summary>

### [`v0.13.2`](https://togithub.com/rancher/system-upgrade-controller/releases/tag/v0.13.2)

[Compare Source](https://togithub.com/rancher/system-upgrade-controller/compare/v0.13.1...v0.13.2)

##### What's Changed

- feat: allow plan to ignore secret updates by [@buroa](https://togithub.com/buroa) in [https://github.com/rancher/system-upgrade-controller/pull/263](https://togithub.com/rancher/system-upgrade-controller/pull/263)
- 225: Add support for exclusive plans by [@jrodonnell](https://togithub.com/jrodonnell) in [https://github.com/rancher/system-upgrade-controller/pull/260](https://togithub.com/rancher/system-upgrade-controller/pull/260)
- Fix: upgrade go in go.mod and bci image by [@matttrach](https://togithub.com/matttrach) in [https://github.com/rancher/system-upgrade-controller/pull/268](https://togithub.com/rancher/system-upgrade-controller/pull/268)
- Use node name for job name instead of host name by [@brandond](https://togithub.com/brandond) in [https://github.com/rancher/system-upgrade-controller/pull/274](https://togithub.com/rancher/system-upgrade-controller/pull/274)
- Adding the ability to define a security context and SELinux options by [@Auston-Ivison-Suse](https://togithub.com/Auston-Ivison-Suse) in [https://github.com/rancher/system-upgrade-controller/pull/257](https://togithub.com/rancher/system-upgrade-controller/pull/257)
- Adding image pull secrets by [@Dr-N00B](https://togithub.com/Dr-N00B) in [https://github.com/rancher/system-upgrade-controller/pull/272](https://togithub.com/rancher/system-upgrade-controller/pull/272)

##### New Contributors

- [@buroa](https://togithub.com/buroa) made their first contribution in [https://github.com/rancher/system-upgrade-controller/pull/263](https://togithub.com/rancher/system-upgrade-controller/pull/263)
- [@jrodonnell](https://togithub.com/jrodonnell) made their first contribution in [https://github.com/rancher/system-upgrade-controller/pull/260](https://togithub.com/rancher/system-upgrade-controller/pull/260)
- [@matttrach](https://togithub.com/matttrach) made their first contribution in [https://github.com/rancher/system-upgrade-controller/pull/268](https://togithub.com/rancher/system-upgrade-controller/pull/268)
- [@Auston-Ivison-Suse](https://togithub.com/Auston-Ivison-Suse) made their first contribution in [https://github.com/rancher/system-upgrade-controller/pull/257](https://togithub.com/rancher/system-upgrade-controller/pull/257)
- [@Dr-N00B](https://togithub.com/Dr-N00B) made their first contribution in [https://github.com/rancher/system-upgrade-controller/pull/272](https://togithub.com/rancher/system-upgrade-controller/pull/272)

**Full Changelog**: rancher/system-upgrade-controller@v0.13.1...v0.13.2

</details>
