
Command Pattern #1231

Closed · wants to merge 2 commits
Conversation

danrspencer (Contributor)

Motivation

We've adopted a few best practices within our Controllers that help us deal with a couple of different problems:

  1. Concurrency - Any time we take a mutating action in a controller, there is a potential that any further changes we would make are no longer valid. If they are still valid, that's fine, because we can just immediately re-reconcile and take the next action.
  2. Testing - Our reconciliation is some of our most complicated code, so we want it all to be as testable as possible. Unit tests are the cheapest form of test, so we want our reconciliation to be unit testable.

Solution

N.B. - This is a relatively quick hack-and-slash extraction of a library we've been using in our codebase on top of KubeRS for a few years. If the approach is deemed worth pursuing, I'll clean this PR up.

We use a command pattern so that our functions can state the action they want to take, without actually taking it. This is encoded in a KubeCommand enum which contains the context and the action (verb) to perform. We strip away the strong typing as part of the command so that we don't have to encode the types of mutation we want to perform in the type system.
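For illustration only - the PR's actual definition may differ - here is a minimal sketch of the shape such a type-erased command could take. KubeCommand is the PR's name; KubeVerb and the exact fields are assumptions:

```rust
use kube::api::{DeleteParams, PatchParams, PostParams};
use kube::core::{ApiResource, DynamicObject};

/// The verb to perform. Typed resources are erased into DynamicObject /
/// serde_json::Value, so the enum needs no type parameter per kind.
pub enum KubeVerb {
    Create(PostParams, Box<DynamicObject>),
    Patch(PatchParams, serde_json::Value),
    Delete(DeleteParams),
}

/// The context: which kind/namespace/name the verb targets.
pub struct KubeCommand {
    pub resource: ApiResource,
    pub namespace: Option<String>,
    pub name: String,
    pub verb: KubeVerb,
}
```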

This means that we can run unit tests against the code at various levels and just assert that the command (the "mutation") we get back is the one that's expected.
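Continuing that sketch (the decision function and all names below are invented for illustration), a reconcile step becomes a pure function whose returned command can be asserted on directly:

```rust
use k8s_openapi::api::apps::v1::Deployment;
use kube::api::PatchParams;
use kube::core::ApiResource;

/// Hypothetical pure decision step: state the mutation we *would*
/// perform, without performing it.
fn next_command(desired: i32, observed: i32) -> Option<KubeCommand> {
    if desired == observed {
        return None; // nothing to do this round
    }
    Some(KubeCommand {
        resource: ApiResource::erase::<Deployment>(&()),
        namespace: Some("default".into()),
        name: "my-deployment".into(),
        verb: KubeVerb::Patch(
            PatchParams::apply("my-controller"),
            serde_json::json!({ "spec": { "replicas": desired } }),
        ),
    })
}

#[test]
fn emits_patch_when_replicas_diverge() {
    let cmd = next_command(3, 1).expect("a mutation should be proposed");
    assert_eq!(cmd.name, "my-deployment");
    assert!(matches!(cmd.verb, KubeVerb::Patch(..)));
}
```

No mock services and no transport layer: the test only inspects a returned value.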

The other best practice we've adopted within our project is to only ever perform a single action in any given reconciliation. That practice is codified here in the run_via_dispatch method (which possibly needs a better name) on the Controller. It also plays into the improved testability, because it's easy to just assert on a returned value.

Finally, I've included a .to_store method that we use a lot in our testing, which lets us turn a vector of resources into a Store, allowing us to easily set up test fixtures for our reconciles.
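Sketched against the reflector API as it stood around the time of this PR (Event::Restarted was renamed in later kube-runtime releases), such a helper might look roughly like:

```rust
use std::hash::Hash;

use kube::runtime::reflector::{store::Writer, Store};
use kube::runtime::watcher::Event;
use kube::Resource;

/// Rough sketch of the PR's test util: seed a reflector Store from a Vec,
/// as if the watcher had just relisted those objects.
fn to_store<K>(items: Vec<K>) -> Store<K>
where
    K: Resource + Clone + 'static,
    K::DynamicType: Default + Eq + Hash + Clone,
{
    let mut writer = Writer::default();
    let store = writer.as_reader();
    writer.apply_watcher_event(&Event::Restarted(items));
    store
}
```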

…iting of synchronous reconcile methods

Also added vec![].to_store() as a test util to allow for easy creation of a store from a vec.
clux (Member) commented Jun 27, 2023

Hey, this is really interesting. Thanks for this.

I'm going to reword this a bit in my own words just to ensure we are on the same page. Feel free to correct me if this is a misrepresentation. I'll talk a little about what I think of them afterwards. The two main concerns:

1. Testing

Writing unit tests with mocks (as currently recommended for this type of reconciler verification) is neither clean nor easy. Having to go through tower mock services makes it more expensive than this type of cheap unit test should be.

The command abstraction allows you to register Api commands against arbitrary objects (using dynamic objects to satisfy the typing requirements) and later see what was called. This is something you cannot do directly with Api without dropping into tower mocks - which sit at the transport level - so test cases end up with much simplified verification logic.

2. Concurrency

Having to do early exits in the reconciler after every mutating action (to ensure unfaultable idempotency) is error-prone and a repeatable process that can be abstracted away. Reducing the reconciler to a set of steps that each perform one action based on the state we are in makes each step more easily testable.

clux (Member) commented Jun 27, 2023

As for how this could fit here, it's complicated:

Testing

The difficulty of writing reconciler tests is something I definitely feel myself, and I have actually thought about solving it by allowing something like a command data vector that could optionally be stored on Client (but only used in tests, via a different test Client). That might rely on #1032 getting somewhere though. Need to restart that big PR chain.

Note that I would rather have this hidden somewhere within Client than in new top-level abstraction types, which I suspect are likely to confuse people who want to "just run one patch". Your use-case here - writing these big controllers with this particular test pattern - is pretty opinionated, and we try to avoid too much opinion in kube itself.

Dispatch/Concurrency

This is more of a question, because I'm currently still struggling a bit to understand how this can truly work from a reconciler perspective.

This feels like it's trying to recreate the state machine style reconcilers pioneered by krator - but maybe I am wrong.

Suppose you have a reconciler that in response to a certain object state needs to BOTH send an event, AND patch a CR. How would you encode that in a dispatcher world? Is this something you can do? If so, how do you track the progress of this?

I bring up events in particular because we already have event::Recorder, which takes an Api, and a command abstraction would be incomplete unless it put its tendrils into these helpers as well. It might be easier to extend within kube (e.g. more stuff in Client) than to try to wrap the existing stuff.

danrspencer (Contributor, Author) commented Jun 27, 2023

> I'm going to reword this a bit in my own words just to ensure we are on the same page. Feel free to correct me if this is a misrepresentation. I'll talk a little about what I think of them afterwards. The two main concerns:

Probably needed - it wasn't my clearest ever write-up; I wrote it in a bit of a hurry 😅.

> 1. Testing
>
> Writing unit tests with mocks (as currently recommended for this type of reconciler verification) is neither clean nor easy. Having to go through tower mock services makes it more expensive than this type of cheap unit test should be.
>
> The command abstraction allows you to register Api commands against arbitrary objects (using dynamic objects to satisfy the typing requirements) and later see what was called. This is something you cannot do directly with Api without dropping into tower mocks - which sit at the transport level - so test cases end up with much simplified verification logic.

Yep, that's roughly our aim. We tend to have a few different levels of unit tests (depending on the complexity of the specific controller) and then jump straight to integration tests against Tilt; we don't do any mocking at all.

> Note that I would rather have this hidden somewhere within Client than in new top-level abstraction types, which I suspect are likely to confuse people who want to "just run one patch". Your use-case here - writing these big controllers with this particular test pattern - is pretty opinionated, and we try to avoid too much opinion in kube itself.

I think the options in the current PR are relatively open-ended. There's nothing to stop people using the existing api functions to "just run one patch"; alternatively, they can have smaller specific functions return one or more KubeCommands which are later dispatched via api.dispatch_command in the reconciler; or it can be fully adopted via the run_via_dispatch function, which executes the single action.
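To make that middle option concrete, reusing the KubeCommand / next_command sketches above: only the dispatch_command name comes from the PR, and its signature (plus the MyApp/Context/Error types) are stand-ins invented here.

```rust
use std::{sync::Arc, time::Duration};

use kube::runtime::controller::Action;

/// Stand-ins so the sketch is self-contained.
struct MyApp { desired: i32, observed: i32 }
struct Context;
#[derive(Debug)]
struct Error;

impl Context {
    /// Stand-in for the PR's api.dispatch_command: executes one KubeCommand.
    async fn dispatch_command(&self, _cmd: KubeCommand) -> Result<(), Error> {
        unimplemented!("the PR routes this through Api")
    }
}

async fn reconcile(app: Arc<MyApp>, ctx: Arc<Context>) -> Result<Action, Error> {
    if let Some(cmd) = next_command(app.desired, app.observed) {
        // Dispatch exactly one mutation, then requeue immediately so the
        // next pass re-assesses the (now changed) state of the world.
        ctx.dispatch_command(cmd).await?;
        return Ok(Action::requeue(Duration::ZERO));
    }
    Ok(Action::requeue(Duration::from_secs(300))) // settled; check back later
}
```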

> 2. Concurrency
>
> Having to do early exits in the reconciler after every mutating action (to ensure unfaultable idempotency) is error-prone and a repeatable process that can be abstracted away. Reducing the reconciler to a set of steps that each perform one action based on the state we are in makes each step more easily testable.

The way we tend to write our reconcilers (sketched after the list) is:

  1. Figure out the state of the world, in the form of an enum that represents the states we care about
  2. Given that enum, convert directly from it into a KubeCommand
  3. Dispatch the command
  4. Repeat
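A sketch of that loop, reusing next_command from the earlier sketch; every state name and create_configmap_command are invented for illustration:

```rust
/// Step 1: the states of the world we care about.
enum WorldState {
    MissingConfigMap,
    ReplicasDiverged { desired: i32, observed: i32 },
    Settled,
}

/// Assess: pure and synchronous, so trivially unit-testable.
fn assess(configmap_exists: bool, desired: i32, observed: i32) -> WorldState {
    if !configmap_exists {
        WorldState::MissingConfigMap
    } else if desired != observed {
        WorldState::ReplicasDiverged { desired, observed }
    } else {
        WorldState::Settled
    }
}

/// Step 2: each state converts to at most one command; dispatching it
/// (step 3) and requeueing (step 4) happen in the reconciler.
fn command_for(state: WorldState) -> Option<KubeCommand> {
    match state {
        // create_configmap_command is another hypothetical helper
        WorldState::MissingConfigMap => Some(create_configmap_command()),
        WorldState::ReplicasDiverged { desired, observed } => next_command(desired, observed),
        WorldState::Settled => None,
    }
}
```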

I was discussing this with a team member today who's previously worked in robotics and likened it to an "assess then act" approach that's often used in that field.

It also has the advantage of being cancel-safe, so you can be a bit less careful when tearing down a service.

> Suppose you have a reconciler that in response to a certain object state needs to BOTH send an event, AND patch a CR. How would you encode that in a dispatcher world? Is this something you can do? If so, how do you track the progress of this?

I must admit I'm unfamiliar with krator, and events are something we don't currently use (although we probably should). The way we handle things that feel like they should happen at the same time is generally by updating the status on something to keep track of where we are in our operations. I suspect we'd handle the event scenario in the same way, but that may be my naivety showing through.

One of our goals was that our service should be able to survive a forced restart at any point in its operations and come back without missing a beat. Because we can always figure out exactly what the next step is by looking at the state of the cluster, we've achieved that.

clux (Member) commented Jun 27, 2023

That makes sense.

> The way we handle things that feel like they should happen at the same time is generally by updating the status on something to keep track of where we are in our operations.

If you need to send an event + patch, then yeah, you could patch first and include in that same patch something that indicates you also need to send an event in the next round (provided you are sending an event about that object). But that probably won't work for objects you don't fully own (like a Deployment, if you are writing a custom scaler).
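As an invented illustration of that "patch first, remember the event" idea - pendingEvent is a made-up status field that the next reconcile pass would consume and clear:

```rust
use kube::api::{Api, Patch, PatchParams};
use serde_json::json;

/// Record, in the same status patch, that an event still needs sending;
/// a restart between the patch and the event then loses nothing.
async fn patch_with_event_marker<K>(api: &Api<K>, name: &str) -> kube::Result<K>
where
    K: Clone + serde::de::DeserializeOwned + std::fmt::Debug,
{
    let status = json!({
        "status": {
            "replicas": 3,
            // Hypothetical marker, consumed by the next pass, which
            // emits the corresponding event and clears the field.
            "pendingEvent": "ScaledUp",
        }
    });
    api.patch_status(name, &PatchParams::default(), &Patch::Merge(&status)).await
}
```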

It is possible that we shouldn't be including events as an api command like this, because they are usually diagnostics fired without verifying their success (they matter less than the actual command). But it does screw with the abstraction.

> I'm unfamiliar with krator

Here is a reference for how krator tried to do things: https://github.com/krator-rs/krator/blob/main/krator/examples/moose.rs#L215. It's a heavy state-machine model, now abandoned. It would be interesting to hear a post-mortem from the creator(s).

> One of our goals was that our service should be able to survive a forced restart at any point in its operations and come back without missing a beat.

Yeah, that is the right mindset and goal. It's generally what I try to go for as well, with one thing at a time handled in the reconciler. However, this type of pattern usually starts with an if-else to catch "am I in this case, or the other one", and can be turned into enums with some kind of state identifier fn afterwards. I think it's important for kube to keep that simplicity as a default - there are already enough difficult concepts here. But at the very least, I do see value in a modern pattern for this laid out somewhere.

danrspencer (Contributor, Author)

So ... I'm a little unsure where we are with this PR now 😅. If you think it's worth getting into the main library, I'm happy to knock it into shape and address any specific feedback you think is necessary to make it idiomatic with KubeRS.

clux (Member) commented Jun 28, 2023

I think the command pattern as it stands (in particular with its runtime integration) is probably too opinionated and niche for kube at this stage. I personally need more justification that it fits our abstractions: because it builds on top of Api, it looks like it will clash with other runtime abstractions.

If you want this pattern to be more universally available now (or just to demonstrate it better), then I would start by experimenting on it in a separate crate.

The testing problem I do think is worth prioritising, but it needs to come after the v2 client (hence the current reluctance), and I personally think the primary goal of that needs to be ease of testing rather than a command pattern. That's the more important problem for us to focus on in kube, because it affects a larger set of users out of the box. Note that if we can optionally observe what calls happen inside the client (across multiple apis), then building a command pattern would also be easier.

danrspencer (Contributor, Author)

Ok, thanks for the feedback. I might look at moving it to a standalone repository for now. 👍

Would you be open to a smaller PR that adds PartialEq to a few of the KubeRS structs, as I have in this PR? Currently we have to jump through a couple of hoops to compare our commands because these structs don't derive it.

clux (Member) commented Jun 28, 2023

> Would you be open to a smaller PR that adds PartialEq to a few of the KubeRS structs, as I have in this PR? Currently we have to jump through a couple of hoops to compare our commands because these structs don't derive it.

Ah, yeah, those look absolutely fine to me!

danrspencer closed this Jul 3, 2023