Invoker concurrency quota #548
Conversation
…oal of this commit is to make testing the state machine logic more manageable. Also make sure we unreserve the quota when we abort an invocation task.
…plify the structure. Move quota to a separate file.
Removed Codec from Invoker (this was unused)
force-pushed from 4ca6b4f to a0df62a
Thanks for creating this PR @slinkydeveloper. The changes make sense and look good to me. I had some minor comments that we can resolve before merging this PR.
self.quota.unreserve_slot();
self.status_store.on_end(&partition, &service_invocation_id);
It seems that we unreserve whenever we end an invocation. Currently, we reserve a slot and only later call InvocationStatusStore.on_start. Maybe there is some symmetry here and we could combine them so that future changes don't run the risk of forgetting to call one of them (e.g. whenever we start an invocation we reserve a slot, and whenever we end an invocation we unreserve it).
As a side effect, if we always start when reserving a slot, then InvocationStatusStore.on_failure wouldn't be called without on_start having been called first.
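A minimal sketch of the symmetry being suggested, using stub types in place of the real invoker components; reserve_slot, unreserve_slot, on_start, and on_end are the names from the diff and discussion above, while start_invocation and end_invocation are hypothetical wrappers invented for this sketch.

```rust
// Stub types standing in for the real invoker components (sketch only).
struct Quota {
    available: usize,
}

impl Quota {
    /// Try to take a slot; returns `false` when the concurrency limit is hit.
    fn reserve_slot(&mut self) -> bool {
        if self.available == 0 {
            return false;
        }
        self.available -= 1;
        true
    }

    fn unreserve_slot(&mut self) {
        self.available += 1;
    }
}

struct StatusStore;

impl StatusStore {
    fn on_start(&mut self, _partition: u64, _sid: &str) { /* mark in_flight = true */ }
    fn on_end(&mut self, _partition: u64, _sid: &str) { /* drop the status entry */ }
}

struct Invoker {
    quota: Quota,
    status_store: StatusStore,
}

impl Invoker {
    // Pairing the quota call with the status-store call in one place means
    // future changes cannot forget one of the two.
    fn start_invocation(&mut self, partition: u64, sid: &str) -> bool {
        if !self.quota.reserve_slot() {
            return false;
        }
        self.status_store.on_start(partition, sid);
        true
    }

    fn end_invocation(&mut self, partition: u64, sid: &str) {
        self.quota.unreserve_slot();
        self.status_store.on_end(partition, sid);
    }
}
```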
I think it is not easy to refactor as you propose:
- We reserve a slot when we get the InvokeCommand, and we unreserve it whenever this invoker will "forget" about this sid, meaning it removes the sid from the InvocationStateMachines.
- InvocationStatusStore.on_start signals when we actually start the invocation task, setting in_flight to true and the last executed time to now. In particular, when coming from a retry, we need to "flip" the status from in_flight = false to in_flight = true and reset the last executed time to now. Meaning that after reserving the quota once, we may invoke InvocationStatusStore.on_start several times in case of retries (see the sketch below).

The only case where it may make sense to unify is on_end and unreserve_slot, as they're both triggered when the invoker is "removing" the given sid from its internal state. But TBH I prefer to keep them separate, as it makes more explicit what's going on. We can revisit this later if we want to.
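To make the retry behaviour described above concrete, here is a minimal sketch of the per-invocation status; in_flight and the last executed time come from the comment, while the struct itself is illustrative and not the actual InvocationStatusStore internals.

```rust
use std::time::SystemTime;

// Sketch of the per-invocation status: `on_start` may run several times for
// the same sid (once per retry attempt), flipping `in_flight` back to true and
// resetting the last start time, while the quota slot was reserved only once,
// when the InvokeCommand arrived.
#[derive(Default)]
struct InvocationStatus {
    in_flight: bool,
    last_start: Option<SystemTime>,
}

impl InvocationStatus {
    fn on_start(&mut self) {
        self.in_flight = true;
        self.last_start = Some(SystemTime::now());
    }

    fn on_failure(&mut self) {
        // A retry timer is now pending; the invocation is no longer in flight,
        // but the quota slot stays reserved until the sid is removed.
        self.in_flight = false;
    }
}
```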
Alright. I guess to make it work the status handling needs to change slightly: we create the InvocationStateMachine when allocating the quota and release the quota when releasing the state machine. That would bind the InvocationStateMachine to the lifecycle of its slot/quota reservation. Nothing we have to do right now.
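A minimal sketch of what binding the reservation to the state machine's lifecycle could look like, assuming a hypothetical RAII-style SlotReservation guard (not an existing type in the codebase) that returns the slot when dropped:

```rust
use std::sync::{Arc, Mutex};

struct Quota {
    available: Mutex<usize>,
}

struct SlotReservation {
    quota: Arc<Quota>,
}

impl SlotReservation {
    /// Try to take a slot; `None` when the concurrency limit is reached.
    fn try_new(quota: &Arc<Quota>) -> Option<SlotReservation> {
        let mut available = quota.available.lock().unwrap();
        if *available == 0 {
            return None;
        }
        *available -= 1;
        Some(SlotReservation { quota: Arc::clone(quota) })
    }
}

impl Drop for SlotReservation {
    fn drop(&mut self) {
        // The slot is returned automatically when the owner is dropped.
        *self.quota.available.lock().unwrap() += 1;
    }
}

// The state machine owns its reservation, so removing it from the invoker's
// map of InvocationStateMachines releases the slot without an explicit call.
struct InvocationStateMachine {
    _slot: SlotReservation,
    // ... invocation-specific state ...
}
```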
This could potentially resolve another asymmetry, which is that if we fail in start_invocation_task, then we need to create an InvocationStateMachine just to call handle_error_event. Moving the concern of creating this state machine out of start_invocation_task could simplify the control flow a tad bit.
The problem is that to create the state machine I need the endpoint metadata first.
Maybe this could become part of the state variants that have a resolved endpoint. I could see that the endpoint metadata might change across restarts if there is a newer endpoint version and no committed journal entries yet.
I could see that the endpoint metadata might change across restarts if there is a newer endpoint version and no committed journal entries yet.
I think we should keep this "relaxed" and continue to do what we do today: we don't "fix" the endpoint after resolving it, but re-resolve on every restart, to allow the user to "fix it" even if there are committed journal entries. This is especially useful if we allow some "configuration options" of the endpoint to be mutable. A case for this could be, for example, that you have made some changes to the gateway between restate and the service endpoint, perhaps requiring a new header, but you have not yet configured that header in the restate "endpoint". By adding the header to the endpoint configuration, the runtime will be able to access the service endpoint again.
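Purely for illustration, a sketch of why re-resolving on every restart lets such a change take effect; all type and field names here are made up for the example and are not the actual Restate types.

```rust
use std::collections::HashMap;

// Illustrative only: the invocation task looks up the endpoint metadata again
// on every (re)start instead of "fixing" it at first resolution, so a header
// added to the endpoint configuration after a failure is picked up on the
// next attempt.
#[derive(Clone)]
struct EndpointMetadata {
    address: String,
    additional_headers: HashMap<String, String>,
}

struct EndpointRegistry {
    endpoints: HashMap<String, EndpointMetadata>,
}

impl EndpointRegistry {
    // Called on every restart of the invocation task, returning the current
    // (possibly updated) metadata rather than a snapshot taken at first start.
    fn resolve(&self, service: &str) -> Option<EndpointMetadata> {
        self.endpoints.get(service).cloned()
    }
}
```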
force-pushed from bb8d8f3 to 32b6f58
force-pushed from 32b6f58 to a8222b5
LGTM. +1 for merging.
Fix #540, introducing a configurable concurrency limit in the invoker. In order to be able to unit test this feature, I had to make quite a few changes to allow mocking. Hopefully this will be useful in the future anyway, to test the invoker state machine transitions in an isolated manner.