This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Update replicator to use a timeout and to automatically stop watching when a final phase is seen. #57

Merged
desertaxle merged 3 commits into main from issue-55
May 4, 2023

@bunchesofdonald (Contributor) commented May 4, 2023

Digging into issue #55, I found that the hanging thread was caused by kubernetes.watch.Watch not obeying the stop call. This happens because the stop flag is only evaluated after watch.stream yields an event, and the stream stops yielding once the job we're watching is complete, so a stop requested after the last event is never noticed. This PR makes two adjustments to the way we handle the watcher:

  1. It passes a timeout of pod_watch_timeout_seconds to watch.stream, so if any request the watch makes times out, the stream ends and our thread exits.
  2. In the event-processing loop, watch.stop is called explicitly whenever a 'final' phase is seen.
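A minimal sketch of the two adjustments, not the code from this PR. It assumes the watcher object exposes stream() and stop() the way kubernetes.watch.Watch does, and that the list function accepts timeout_seconds the way CoreV1Api.list_namespaced_pod does. The names watch_until_final, FINAL_PHASES and FakeWatch are hypothetical, and the set of final phases is an assumption, since the description above does not list them.

```python
from types import SimpleNamespace

# Assumption: the pod phases treated as "final" (not spelled out in the PR text).
FINAL_PHASES = {"Succeeded", "Failed"}


def watch_until_final(watch, list_pods, namespace, pod_watch_timeout_seconds):
    """Collect pod phases until a final phase is seen or the stream ends."""
    phases = []
    for event in watch.stream(
        list_pods,
        namespace=namespace,
        # Adjustment 1: a request that times out ends the stream.
        timeout_seconds=pod_watch_timeout_seconds,
    ):
        phase = event["object"].status.phase
        phases.append(phase)
        if phase in FINAL_PHASES:
            # Adjustment 2: stop explicitly. The flag is checked when the
            # loop asks the stream for its next event.
            watch.stop()
    return phases


class FakeWatch:
    """Hypothetical stand-in for a watch so the sketch runs without a cluster."""

    def __init__(self, phases):
        self._phases = phases
        self._stop = False

    def stop(self):
        self._stop = True

    def stream(self, func, **kwargs):
        for phase in self._phases:
            pod = SimpleNamespace(status=SimpleNamespace(phase=phase))
            yield {"type": "MODIFIED", "object": pod}
            if self._stop:  # checked only after the consumer resumes the stream
                return


fake = FakeWatch(["Pending", "Running", "Succeeded", "Running"])
observed = watch_until_final(
    fake, list_pods=None, namespace="default", pod_watch_timeout_seconds=60
)
print(observed)
```

Against a real cluster, the same function would presumably be called with a kubernetes.watch.Watch() instance and CoreV1Api().list_namespaced_pod in place of the fake; the fake only mimics the "stop is checked after a yield" behaviour.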

These two changes should ensure that the thread exits at most pod_watch_timeout_seconds seconds after the last event is seen, and in most cases the thread should exit immediately after a flow run is complete.
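The timing claim can be illustrated with a toy model that uses only the standard library and no Kubernetes client. ToyWatch, consume and run_watcher are hypothetical names; the model only assumes a stream that blocks while waiting for the next event and checks its stop flag after each yield, as described above.

```python
import queue
import threading
import time


class ToyWatch:
    """Toy watch: the stop flag is checked only after an event is yielded."""

    def __init__(self):
        self._stop = False

    def stop(self):
        self._stop = True

    def stream(self, events, timeout=None):
        while True:
            try:
                # With timeout=None this blocks until an event arrives.
                event = events.get(timeout=timeout)
            except queue.Empty:
                return  # the "request" timed out, so the stream ends
            yield event
            if self._stop:
                return


def consume(watch, events, timeout):
    for _ in watch.stream(events, timeout=timeout):
        pass


def run_watcher(timeout):
    """Send one last event, call stop(), and report whether the thread exited."""
    watch, events = ToyWatch(), queue.Queue()
    thread = threading.Thread(
        target=consume, args=(watch, events, timeout), daemon=True
    )
    thread.start()
    events.put("job complete")
    time.sleep(0.1)  # let the thread consume the event and block again
    watch.stop()     # too late: no further event will trigger the check
    thread.join(timeout=0.5)
    return not thread.is_alive()


if __name__ == "__main__":
    print("exited without timeout:", run_watcher(None))
    print("exited with 0.2s timeout:", run_watcher(0.2))
```

Without a timeout the watcher thread should still be blocked when the join gives up, which mirrors the hang from issue #55; with a timeout it should exit shortly after the last event even though stop() arrived too late to be checked.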

Closes #55

Checklist

  • References any related issue by including "Closes #" or "Closes ".
    • If no issue exists and your change is not a small fix, please create an issue first.
  • Includes tests or only affects documentation.
  • Passes pre-commit checks.
    • Run pre-commit install && pre-commit run --all locally for formatting and linting.
  • Summarizes PR's changes in CHANGELOG.md

@bunchesofdonald requested a review from a team May 4, 2023 15:22
@bunchesofdonald added the bug (Something isn't working) label May 4, 2023
@desertaxle (Member) left a comment

LGTM!

@desertaxle merged commit f40d0cf into main May 4, 2023
@desertaxle deleted the issue-55 branch May 4, 2023 16:14
Labels: bug (Something isn't working)
Linked issue: Kubernetes worker hangs at the end of flow run execution
2 participants