Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Waiting for handlers to finish in all exit cases + abort and compensation in a message handler #144

Merged
merged 20 commits into from
Jan 28, 2025

Conversation

dandavison
Copy link
Contributor

@dandavison dandavison commented Oct 2, 2024

Two new samples. From the READMEs:

# Waiting for message handlers

This workflow demonstrates how to wait for signal and update handlers to
finish in the following circumstances:

- Before a successful return
- On failure
- On cancellation
# Waiting for message handlers, and performing compensation and cleanup in message handlers

This sample demonstrates how to do the following:

1. Ensure that all update/signal handlers are finished before a successful
    workflow return, and on workflow cancellation and failure.
2. Perform compensation/cleanup in an update handler when the workflow is
    cancelled or fails.

@dandavison dandavison force-pushed the message-handler-waiting-compensation-cleanup branch from c566b62 to 9355bc2 Compare October 2, 2024 17:34
@dandavison dandavison force-pushed the message-handler-waiting-compensation-cleanup branch 7 times, most recently from af64650 to 5867812 Compare October 7, 2024 18:23
Copy link
Contributor

@drewhoskins-temporal drewhoskins-temporal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels to me like there's two main things going on here.

  • cleaning up the workflow's handlers
  • aborting updates and compensating for them.

Both seem valid. The first feels mainstream, the second power user (though we may start to get feedback telling us otherwise).

I wonder if they should be separate samples, or at least more clearly delineate this one so people can just pick out what they need. (I don't want to scare people off of #1 because of the code for #2.)

I can imagine pretty achievable framework improvements we could do for both scenarios. Given that we've heard a lot more feedback about the first scenario, that'd be my guess as to where to invest for now.

Comment on lines 97 to 100
# 👉 "Race" the workflow_exit future against the handler's own application
# logic. Always use `workflow.wait` instead of `asyncio.wait` in
# Workflow code: asyncio's version is non-deterministic.
await workflow.wait( # type: ignore
[update_task, self.workflow_exit], return_when=asyncio.FIRST_EXCEPTION
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, and looks like a pain, but I'm not sure how often people want to cancel in-process update handlers in response to a workflow exit as opposed to simply waiting for the handlers to complete.
My guess is that most people are happy to wait because such cases are not particularly latency sensitive.
It may be okay that this is more of a power user struggle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree with this. The point isn't the latency -- it's that if the workflow is exiting due to a failure/cancellation then the update may need to perform a rollback / cancellation. So they can't "simply wait for the handler to complete" because the handler has to learn whether it should proceed to the end of its happy path, or perform rollback / compensation.

) from cast(BaseException, self.workflow_exit.exception())
# 👉 Catch BaseException since asyncio.CancelledError does not inherit
# from Exception.
except BaseException as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought deeply, but I'm wondering if using a child workflow would work better for use cases in which there are compensations. Would it be more straightforward? Would it work better, e.g. by letting the workflow exit and letting the child workflow finish up the compensations?
Obviously that's extra latency which some might not be able to afford. And it's a lot of code, though it might be easier than this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea and could be worth another sample, at least to investigate and compare how well it works. This sample contains useful techniques for those not using child workflows, so I don't think this work need block merge here.

) from e
raise

async def my_update_compensation(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be cool if folks could register cancellation handlers with an annotation similar to validators.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea! Make a note of it?

raise AssertionError("unreachable")


def is_workflow_exit_exception(e: BaseException) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 , This feels like a concept worth highlighting (and maybe adding a helper?) that I think lots of users would benefit from. Good to take every opportunity to teach users about which exceptions fail the workflow and the fact that some don't fail it (and that this is a good thing), and then show how that intersects with cleanup code like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think we all agreed on that. temporalio/features#589

@dandavison dandavison force-pushed the message-handler-waiting-compensation-cleanup branch from d72c2de to 270830d Compare January 23, 2025 22:58
@dandavison
Copy link
Contributor Author

I wonder if they should be separate samples

Yes, I made them into two separate samples:

# Waiting for message handlers

This workflow demonstrates how to wait for signal and update handlers to
finish in the following circumstances:

- Before a successful return
- On failure
- On cancellation
# Waiting for message handlers, and performing compensation and cleanup in message handlers

This sample demonstrates how to do the following:

1. Ensure that all update/signal handlers are finished before a successful
    workflow return, and on workflow cancellation and failure.
2. Perform compensation/cleanup in an update handler when the workflow is
    cancelled or fails.

This should be ready to merge now.

@dandavison dandavison force-pushed the message-handler-waiting-compensation-cleanup branch from dce6d7b to 1ca9796 Compare January 28, 2025 13:18
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing blocking

Comment on lines +59 to +63
for exit_type in [
WorkflowExitType.SUCCESS,
WorkflowExitType.FAILURE,
WorkflowExitType.CANCELLATION,
]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit complicated for a simple waiting sample IMO, but nothing wrong with it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this sample is partly to teach users about what errors they'll see under different sorts of workflow exit. See the README:

workflow exit type: SUCCESS
    🟢 caller received update result
    🟢 caller received workflow result


workflow exit type: FAILURE
    🟢 caller received update result
    🔴 caught exception while waiting for workflow result: Workflow execution failed: deliberately failing workflow


workflow exit type: CANCELLATION
    🟢 caller received update result

)


def is_workflow_exit_exception(e: BaseException) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not accurate for all users to copy because users can customize what is a workflow exit exception. If we feel this is a utility users need, we should consider exposing on the SDK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it says that in the comment:

    # 👉 If you have set additional failure_exception_types you should also
    # check for these here.

and we have temporalio/features#589 for the utility.

Comment on lines +56 to +61
# 👉 Catch BaseException since asyncio.CancelledError does not inherit
# from Exception.
except BaseException as e:
if is_workflow_exit_exception(e):
await workflow.wait_condition(workflow.all_handlers_finished)
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity reasons, I wonder if this sample should allow a workflow failure to not wait for a handler. That might be what some users want anyways (to fail the workflow now instead of waiting to fail).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that showing them how to wait for the handler when the workflow fails is useful since it points them towards how to implement graceful cleanup of a handler execution.

@dandavison dandavison merged commit d9658d5 into main Jan 28, 2025
10 checks passed
@dandavison dandavison deleted the message-handler-waiting-compensation-cleanup branch January 28, 2025 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants