client: tell apart "no such workflow" from "workflow stopped" #4798

Open
oliver-sanders opened this issue Apr 4, 2022 · 0 comments
Labels
could be better Not exactly a bug, but not ideal.

Currently, the network client(s) raise WorkflowStopped when they fail to load the contact file. However, the same failure can also mean that there is no such workflow to load the contact file for.

This can be somewhat confusing, e.g.

Unfortunately, it is hard to tell the two possibilities apart in all cases:

  • Workflow might exist on the scheduler host but not on the remote.
  • Workflow might exist on the remote but not on the scheduler host.
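
For the simple case where the client can see the run directory on its own filesystem, the distinction could be sketched as below. This is only an illustration, not the existing client code: `WorkflowNotFound`, `load_contact_file`, the `.service/contact` location and the `KEY=VALUE` layout are all assumptions made for the sketch. As the two cases above show, a local check like this cannot settle the question across hosts.

```python
from pathlib import Path


class WorkflowStopped(Exception):
    """The workflow is installed but no contact file is present."""


class WorkflowNotFound(Exception):
    """Hypothetical error: no run directory exists for this workflow."""


def load_contact_file(run_dir: Path) -> dict:
    """Return the contact file as a dict, or raise a specific error.

    Assumes the contact file lives at <run_dir>/.service/contact and
    holds KEY=VALUE lines; both are assumptions for this sketch.
    """
    if not run_dir.is_dir():
        # Nothing installed under this name on this host.
        raise WorkflowNotFound(f"no such workflow: {run_dir.name}")
    contact = run_dir / ".service" / "contact"
    try:
        text = contact.read_text()
    except FileNotFoundError:
        # Run directory exists, but no scheduler has left a contact file.
        raise WorkflowStopped(f"workflow stopped: {run_dir.name}") from None
    return dict(
        line.split("=", 1) for line in text.splitlines() if "=" in line
    )
```

A caller could then report "no such workflow" and "workflow stopped" separately by catching the two exceptions.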

We keep remotes in sync through the contact file. However, because it is not needed there, we do not sync the contact file to "polling" remote hosts. We would need to extend this signalling mechanism to polling hosts to differentiate these cases. We used to have a "contact 2" file; I don't think it was used for this purpose, but it would work for this case.

Note that this signalling mechanism is not perfect and can fail:

  • The scheduler gets killed.
  • Remote tidy fails due to network issues.

On the scheduler hosts we can perform an SSH/process listing to check if the workflow is still alive and tidy up the contact file if not. We can't pull off these tricks from polling remote hosts (and probably shouldn't try from TCP/SSH+TCP remote hosts, although it's likely we do).
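
A minimal sketch of the liveness check and tidy-up described above, restricted to a process on the local host (the SSH step is left out). The function names and the `PID=<int>` contact-file line are hypothetical, chosen for illustration. `os.kill(pid, 0)` is the standard POSIX way to test whether a process exists without sending it a signal.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def tidy_stale_contact(contact: Path) -> bool:
    """Remove the contact file if its scheduler process is gone.

    Returns True if the file was removed. Assumes a "PID=<int>" line,
    which is a hypothetical format for this sketch.
    """
    fields = dict(
        line.split("=", 1)
        for line in contact.read_text().splitlines()
        if "=" in line
    )
    if pid_is_alive(int(fields["PID"])):
        return False
    contact.unlink()
    return True
```

A PID check alone can be fooled if the PID has been reused by an unrelated process, so a real check would likely also compare the command line from the process listing.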

See also: #4776 (comment)

Pull requests welcome!

@oliver-sanders oliver-sanders added the could be better Not exactly a bug, but not ideal. label Apr 4, 2022
@oliver-sanders oliver-sanders added this to the cylc-8.1.0 milestone Apr 4, 2022
@oliver-sanders oliver-sanders modified the milestones: cylc-8.1.0, cylc-8.2.0 Oct 18, 2022
@oliver-sanders oliver-sanders modified the milestones: cylc-8.2.0, cylc-8.x Jun 29, 2023