trigger: generalisation of triggering approaches #4686

Closed
oliver-sanders opened this issue Feb 14, 2022 · 70 comments · Fixed by #4739

@oliver-sanders
Member

oliver-sanders commented Feb 14, 2022

Related Issues:

If agreed this issue should supersede:

After a long chat with @dpmatthews (who proposed yet another triggering approach 😁) I think we can generalise the trigger problem into two dimensions:

  • Continue (yes/no).
    • After I trigger the task will the flow continue from that point immediately.
    • Or does it only continue if/when a flow front catches up with it.
    • I.E. Should the triggered tasks spawn children on completion or after "merge".
  • Overrun (yes/no).
    • Should the "merge" [1] condition be based on the pool or the DB?
    • I.E. Should triggered tasks overrun previous runs of tasks?
    • I.E. Should the following flow overrun the triggered tasks?

Note: From the internal implementation these two dimensions may appear flip-sides of the same coin since they both boil down to the flow_nums, however, considering them from a user standpoint I think it's fair to prise them apart.

Note: Purposefully using new terminology to avoid conflation with existing terms, we may want to workshop "continue" and "overrun" a touch.

[1]: The quoted "merge" above relates to the interaction between two tasks with different flow_nums in general and not to the more specific concept of "flow merging" in the pool exclusively.

Combining these we get four spaces:

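|            | Continue                                               | Don't Continue                                        |
| ---------- | ------------------------------------------------------ | ----------------------------------------------------- |
| Overrun    | (1) Reflow (as currently implemented)                  | (3) No Flow (current default trigger behaviour)       |
| No Overrun | (2) Continue (@dpmatthews new proposed implementation) | (4) No Flow (@oliver-sanders proposed implementation) |
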
  • The bad news is it looks like we have use cases for all four.
  • Dave & I think the no-overrun cases are more important than the overrun ones.
  • The good news is that they can coexist and the mechanism for supporting all four is currently implemented, it's mostly an interface problem.
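
As a rough illustration of the matrix above, the two dimensions can be modelled as a pair of booleans that key the four spaces. This is only a sketch: the `TriggerMode` enum, its member names and `classify` are hypothetical labels for this discussion, not anything that exists in Cylc.

```python
from enum import Enum


class TriggerMode(Enum):
    """The four trigger spaces, keyed by (continue, overrun)."""

    REFLOW = (True, True)            # (1) implemented
    CONTINUE = (True, False)         # (2) proposed by @dpmatthews
    NO_FLOW_OVERRUN = (False, True)  # (3) current default trigger behaviour
    NO_FLOW_WAIT = (False, False)    # (4) proposed by @oliver-sanders

    @property
    def continues(self) -> bool:
        return self.value[0]

    @property
    def overruns(self) -> bool:
        return self.value[1]


def classify(continues: bool, overruns: bool) -> TriggerMode:
    """Look up the space for a given pair of answers."""
    return TriggerMode((continues, overruns))
```

For example, `classify(False, False)` gives the no-continue, no-overrun space (4).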

Going through the four spaces in detail:

1) Reflow (implemented)

Equivalent to cylc trigger --flow=<new-flow-number>.

Continue: Yes
Overrun: Yes

  • Tasks are triggered with a new flow number.
  • The reflow can overrun previous flows.
  • The reflow will merge if it collides with another flow in the pool (and only in the pool i.e. overrun).

The use case is for re-running over tasks which have been previously run e.g. change configuration and re-run a sub-graph.

2) Continue (proposed)

Equivalent to cylc trigger --flow=<all-flow-numbers>,<new-flow-number>.

Continue: Yes
Overrun: No

  • A new trigger approach proposed by @dpmatthews.
  • Tasks are triggered with all existing flow numbers plus a new flow number (which we added purely so the new flow can still be targeted by CLI tools).
  • Because this flow contains all existing flow numbers it will not be overrun by any of the flows which exist at the time of the trigger.
  • This is intended for the sort of use cases we would expect --flow=1 to be used for, but has been generalised to be reflow compatible.

This approach feels quite "natural". The use cases are setting off another bit of the same flow where you don't want tasks to be overrun.

3) No Flow (implemented)

Equivalent to cylc trigger --flow -1.

I am using a negative flow number rather than None to distinguish the two no-flow approaches.
Internally we can still maintain the same no-flow logic as present but would need to change the marker.

Continue: No
Overrun: Yes

Useful for running one-off tasks that you do not want to impact the workflow in any way (i.e. cylc submit type uses).

4) No Flow (proposed)

Equivalent to cylc trigger --flow -2.

I am using a negative flow number rather than None to distinguish the two no-flow approaches.
Internally we can still maintain the same no-flow logic as present.

Continue: No
Overrun: No

Use case is for manually intervening in graph execution by ignoring dependencies or runahead limit and skipping ahead to a task which you want to be considered a part of the approaching flow front.
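
To make the `--flow` equivalents listed for the four spaces concrete, here is a small sketch of the flow numbers each trigger type would attach to the triggered task. The function name and the string mode names are made up for illustration, the "next unused number" rule is an assumption, and the `-1` / `-2` markers are the placeholder values used in this write-up rather than existing Cylc values.

```python
from typing import Set

NO_FLOW = -1       # placeholder marker used above for case (3)
NO_FLOW_WAIT = -2  # placeholder marker used above for case (4)


def triggered_flow_nums(mode: str, existing: Set[int]) -> Set[int]:
    """Return the flow numbers a triggered task would carry (illustrative only)."""
    # Assumption: the new flow number is simply the next unused integer.
    new = max(existing, default=0) + 1
    if mode == "reflow":
        return {new}
    if mode == "continue":
        # All flows that exist at trigger time, plus a new number for CLI targeting.
        return set(existing) | {new}
    if mode == "no-flow":
        return {NO_FLOW}
    if mode == "no-flow-wait":
        return {NO_FLOW_WAIT}
    raise ValueError(f"unknown trigger mode: {mode}")
```

With flows 1 and 2 active, `triggered_flow_nums("continue", {1, 2})` yields `{1, 2, 3}`, which is the `--flow=<all-flow-numbers>,<new-flow-number>` form described for case (2).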

Interface

The internals to handle the four cases are already in-place, flow_nums, DB lookups etc, so it mostly boils down to an interface / documentation issue.

I think all four methods could be exposed via a single --flow argument, however, it is sensible to provide defaults for the different behaviours. I think it would be good to document the --flow equivalents as they may help users to understand their function.

Note that --reflow currently determines the new flow number server-side rather than client-side, which is sensible.

1) Enable behaviours explicitly

If we are happy with the continue/overrun model (after workshopping the terms) we could expose it directly something like:

# 1) reflow
cylc trigger --continue --overrun

# 2) continue
cylc trigger --continue

# 3) no-flow (implemented)
cylc trigger --overrun

# 4) no-flow (proposed)
cylc trigger

This is quite nice as you have to explicitly opt in to each behaviour separately reducing the scope for unintended results and accidents.
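
A minimal sketch of how two opt-in flags like these could be parsed, assuming a stand-alone `argparse` parser rather than Cylc's real option handling. The flag names are the placeholder terms from this proposal, and `build_parser`, `mode_from_args` and the `trigger-sketch` program name are hypothetical.

```python
import argparse

# (continue, overrun) -> the trigger space it selects
MODES = {
    (True, True): "reflow",
    (True, False): "continue",
    (False, True): "no-flow (implemented)",
    (False, False): "no-flow (proposed)",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="trigger-sketch")
    # "continue" is a Python keyword, so store the flag under another name.
    parser.add_argument("--continue", dest="continue_", action="store_true")
    parser.add_argument("--overrun", action="store_true")
    return parser


def mode_from_args(argv: list) -> str:
    """Return the trigger space selected by the given command-line arguments."""
    args = build_parser().parse_args(argv)
    return MODES[(args.continue_, args.overrun)]
```

Both flags default to off, so an empty argument list selects the no-continue, no-overrun space, which matches the opt-in behaviour described here.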

2) Single --flow argument

If we don't like the continue/overrun model we could move the presets into the flow argument, something like:

# 1) reflow
cylc trigger --flow=new

# 2) continue
cylc trigger --flow=any

# 3) no-flow (implemented)
cylc trigger --flow=none

# 4) no-flow (proposed)
cylc trigger --flow=next

It's less behaviour driven so we would need to explain each option separately.

3) Separate flag for each approach

An alternative to (2) would be to come up with three/four different flags:

# 1) reflow
cylc trigger --reflow

# 2) continue
cylc trigger --flow

# 3) no-flow (implemented)
cylc trigger --rerun

# 4) no-flow (proposed)
cylc trigger  # --run

Default

I think no-continue & no-overrun is the safest, sanest default because:

  • The minimum set of behaviours is the simplest.
  • The "Continue" cases have a dramatic impact on the workflow execution and are hard to revoke.
  • The "Re-run" cases are quite advanced and require additional knowledge to operate.

But I'm biased. I think the default is less important than the clear separation of behaviours.

@hjoliver
Member

hjoliver commented Feb 14, 2022

Nice write-up @oliver-sanders 👍

I generally agree on the matrix of cases we need to support, in terms of the functional capability.

I'm not so sure on the conceptual description, bearing in mind that as you said, the terminology needs work-shopping.

Continue (yes/no).

  • After I trigger the task will the flow continue from that point immediately.
  • Or does it only continue if/when a flow front catches up with it.
  • I.E. Should the triggered tasks spawn children on completion or after "merge".

I find this use of "continue" a bit brain-twisting. continue=no means continue if/when a flow catches up? So Case 3 is classified as overrun=yes, continue=no which has to be understood like this: don't continue immediately; but do continue if/when a flow catches up and merges; BUT that won't happen because the triggered task can be overrun by any following flow?

Overrun (yes/no).

  • Should the "merge" [1] condition be based on the pool or the DB?
  • I.E. Should triggered tasks overrun previous runs of tasks?

The "no" case of this is meaningless isn't it? If you trigger a task anywhere behind any flow, including the original flow which typically starts at the start of the graph, but it can't overrun previous runs, then the triggered task itself can't even run, no? Or do we need to specify which flow(s) it can't overrun?

  • I.E. Should the following flow overrun the triggered tasks?

I think it's intuitive to say "my newly-triggered flow can overrun previously-run flows". I'm not so sure that "my newly-triggered task/flow cannot be overrun by upcoming flows" is the best way to describe that concept though. Better to think of it as "my newly triggered flow is part of the upcoming flow(s)" (i.e. I am "pre-running" bits of the upcoming flow(s)).

@hjoliver
Member

hjoliver commented Feb 15, 2022

In other words, I think we should describe what happens in terms of the task or flow being triggered:

I am triggering this task,

  • as a one-off run that will not affect the workflow
  • or to start a new flow that is independent of other flows (and can therefore overrun them); can be used to re-run tasks
  • or to pre-run a task as part of an upcoming flow (which will merge with the triggered task and then flow onward from it)
  • or to pre-run a flow as part of an upcoming flow (which will merge with it and not overrun it)

(with the n=0 task pool merge proviso in all cases)

@hjoliver
Member

hjoliver commented Feb 15, 2022

A new trigger approach proposed by @dpmatthews.

I like the idea of adding a new flow number to triggered tasks (that aren't actually starting a new flow) so that they can be targeted with the CLI.

I'm not so sure about the "add all existing flow numbers" bit. How is that better than just merging with the first flow that comes along, whatever its number? Do you want to avoid merging with other flows triggered after the first triggering event?

This is intended for the sort of use cases we would expect --flow=1 to be used for, but has been generalised to be reflow compatible.

Not sure I follow this. How is --flow=N not reflow-compatible?

it will not be overrun by any of the flows which exist at the time of the trigger.

This seems to suggest that the triggered task will merge with every subsequent flow that catches up with it? (or at least, every flow that was present when the task was triggered).

I would think we want to support pre-running a task for a particular flow (or perhaps for the first one that catches it), but the first merge neutralises it so that further flows can overrun it.

E.g. consider pre-running a task for the current main flow; then much later start an entirely new flow to re-run that part of the graph - it would be surprising if some hidden long-ago triggered task was treated as already-ran and merged with the new flow? I don't think we can tell the new flow not to merge with the old task, because that would preclude later decisions to pre-run tasks in the new flow.

[Update: your CLI examples seem to suggest you don't mean every subsequent flow should merge with the triggered task 👍 ]

Still considering the CLI options you've laid out ... at first glance I prefer something like 2.

I don't mind the default being as you describe, for triggering tasks in front of a flow. (Just not keen on the description in terms of "continue" and "overrun", for reasons above).

@oliver-sanders
Member Author

Overrun (yes/no).

The "no" case of this is meaningless isn't it?

Just not keen on the description in terms of "continue" and "overrun", for reasons above

I think the continue/overrun model is valid, I don't think the "overrun" part has been communicated.

I'm describing behaviour from the user's perspective not the implementation perspective.

Here's an example to show how the overrun=yes/no cases differ and hopefully clarify the 'continue' bit.

a => b => c => d => e => f

Let's start with task "a" running in the original flow, then trigger task "d" using each of the four methods.

  • flow:1
    • a(running)
  • flow:<triggered-flow>
    • d(running)

The following examples show what the workflow would look like after it has finished running.

1) Reflow

cylc trigger --flow=2

The two flows never meet in the pool so never merge.

  • flow:1
    • a,b,c,d,e,f
  • flow:2
    • d,e,f
  • Does the flow "continue" from task "d" immediately after it completes: yes
  • Does the task "d" get "overrun": yes

2) Continue

cylc trigger --flow=1,2

The two flows never meet in the pool, however, they do meet in the DB. Because the triggered tasks belong to flow=1 they will not be overrun.

  • flow:1
    • a,b,c
  • flow:1,2
    • d,e,f
  • Does the flow "continue" from task "d" immediately after it completes: yes
  • Does the task "d" get "overrun": no

3) No-flow (implemented)

cylc trigger --flow=-1

The triggered task leaves the pool. As it is not considered to belong to any flow it will be overrun.

  • flow:1
    • a,b,c,d,e,f
  • flow:-1
    • d
  • Does the flow "continue" from task "d" immediately after it completes: no
  • Does the task "d" get "overrun": yes

4) No-flow (proposed)

cylc trigger --flow=-2

The triggered task leaves the pool. As it is considered to belong to any flow it will not be overrun.

  • flow:1
    • a,b,c,d,e,f
  • Does the flow "continue" from task "d" immediately after it completes: no
  • Does the task "d" get "overrun": no
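
The four end states above can be reproduced with a toy model. The sketch below does not simulate the scheduler; it just encodes the rules described in this comment for the linear graph `a => b => c => d => e => f`, with flow 1 starting at "a" and task "d" triggered. The function name, the string flow labels and the `"none"` label for the no-flow run are all hypothetical.

```python
CHAIN = ["a", "b", "c", "d", "e", "f"]


def final_state(continue_now: bool, overrun: bool, target: str = "d") -> dict:
    """Return {flow label: tasks recorded under that label} after the run."""
    i = CHAIN.index(target)
    # A continuing trigger flows on to the end; otherwise only the target runs.
    triggered = CHAIN[i:] if continue_now else [target]
    if overrun:
        # Flow 1 does not recognise the triggered tasks and re-runs them.
        label = "2" if continue_now else "none"
        return {"1": list(CHAIN), label: triggered}
    if continue_now:
        # The triggered tasks also carry flow 1, so flow 1 stops on reaching them.
        return {"1": CHAIN[:i], "1,2": triggered}
    # The pre-run task waits; flow 1 adopts it on arrival and flows on from it.
    return {"1": list(CHAIN)}
```

In the two overrun cases "d" appears under two labels (it runs twice), whereas in the two no-overrun cases it appears once.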

@oliver-sanders
Member Author

oliver-sanders commented Feb 15, 2022

A new trigger approach proposed by @dpmatthews.

I'm not so sure about the "add all existing flow numbers" bit. How is that better than just merging with the first flow that comes along, whatever its number? Do you want to avoid merging with other flows triggered after the first triggering event?

Approaches (2) and (4) are both trying to solve the overrun=no problem but are doing it in different ways.

Approach (4) uses a special marker to say "merge me with anything", approach (2) explicitly lists all of the flows tasks should be considered to belong to.

The reason we are not using the special marker to implement (2) is because that would prohibit a reflow from being able to overrun the tasks.

This is intended for the sort of use cases we would expect --flow=1 to be used for, but has been generalised to be reflow compatible.

If there are multiple flows (e.g. 1 & 2) active at the time the trigger is performed, --flow=1 will not necessarily do the job. What if the triggered task is caught first by flow=2? It would merge if in the pool but be overrun if no longer in the pool.

I would think we want to support pre-running a task for a particular flow (or perhaps for the first one that catches it), but the first merge neutralises it so that further flows can overrun it.

This is the use case for trigger type (4) [continue:no, overrun:no].

The task will merge with the next flow that catches it, subsequent flows can overrun it.

E.g. consider pre-running a task for the current main flow; then much later start an entirely new flow to re-run that part of the graph

[Update: your CLI examples seem to suggest you don't mean every subsequent flow should merge with the triggered task 👍 ]

Because we explicitly listed all of the flows the triggered task belongs to when we issued the trigger, any new flow numbers that are subsequently added (by reflow) will be able to overrun the task.
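
A tiny sketch of that last point for approach (2), assuming the overrun check is a plain membership test on the explicitly listed flow numbers. `is_overrun` is a hypothetical helper, not a Cylc function.

```python
from typing import Set


def is_overrun(task_flows: Set[int], approaching_flow: int) -> bool:
    """True if the approaching flow re-runs the task rather than treating it as done."""
    # The task only "belongs" to the flows that were listed when it was triggered.
    return approaching_flow not in task_flows


# Flows 1 and 2 existed at trigger time; 3 is the new number added by the trigger.
triggered_task_flows = {1, 2, 3}
```

Here `is_overrun(triggered_task_flows, 1)` is false (flow 1 treats the task as already run), while a later reflow numbered 4 gives `is_overrun(triggered_task_flows, 4)` as true.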

Still considering the CLI options you've laid out ... at first glance I prefer something like 2.

If we can come up with good --flow descriptors this could work quite nicely.

@wxtim
Member

wxtim commented Feb 15, 2022

I'm going to want animated GIFs if I have to explain this to users.

@dpmatthews
Contributor

Still considering the CLI options you've laid out ... at first glance I prefer something like 2.

If we can come up with good --flow descriptors this could work quite nicely.

Some suggestions ...

cylc trigger --flow=new

Seems fine - this is a new flow which is separate from any existing flow (unless it merges).

cylc trigger --flow=any

How about --flow=current - this is part of the current flow(s) (hence no overrun).

cylc trigger --flow=none

Seems fine - this isn't part of any flow.

cylc trigger --flow=next

How about --flow=wait - wait for another flow to merge before continuing.

@hjoliver
Member

hjoliver commented Feb 18, 2022

Coming back to this ...

I understand the "continue" and "overrun" terminology, but IMO it's more intuitive to think of it like this:

The triggered flow can:

  • continue immediately (your continue=yes)
    • if the triggered flow is a new flow (it has a new flow number) it will not get merged with any upcoming flow, because it does not belong to those flows (your continue=yes; overrun=yes)
    • or if you say it belongs to an upcoming flow [1], that flow will merge with it on catch-up (your continue=yes; overrun=no)
  • or it can continue later, when a flow catches and merges with it (your continue=no; overrun=no)
    • this can only happen if you say it belongs to an upcoming flow [1]
  • or it can not continue at all (your continue=no; overrun=yes)

[1] by "say it belongs to an upcoming flow" I mean any one of:

  • name a particular upcoming flow that it belongs to
  • or name several upcoming flows that it can belong to (the first one to catch it wins)
  • or say it can merge with any upcoming flow (the first one to catch it wins)

The main point of confusion of course is that continue=no means - by what I think is the normal semantics of the word! - "continue later" (after a merge). (I suppose you might say the triggered task does not continue, but the flow that merges with it does, but IMO they are really one and the same).

Finally, "overrun" makes more sense, but it's kind of intuitively obvious in the following sense:

  • if a flow runs into a triggered task that "belongs to it" (in the sense above) then it will obviously not overrun it, because by definition that task has already run in that flow
  • otherwise it obviously will overrun it, because flows do not interact unless they are, in essence, "the same flow"

@dpmatthews
Contributor

@hjoliver It sounds like we're all happy with the 4 options - the issue is how best to describe them?
I'm not clear what you're suggesting.
What do you think of my suggestion? I'd avoided using the words "continue" and "overrun".

I'm still reasonably happy with --flow=new, --flow=none and --flow=wait as described above.

How about --flow=current - this is part of the current flow(s) (hence no overrun).

This seems the most difficult to describe. Other suggestions:
--flow=join, --flow=combine, --flow=old, --flow=inherit
These are all just different ways of saying become part of the existing flow(s).

@oliver-sanders
Member Author

The main point of confusion of course is that continue=no means - by what I think is the normal semantics of the word!

As stated in the OP I am not proposing the words "continue" and "overrun" and am fully aware of the issue with "continue".

Note: Purposefully using new terminology to avoid conflation with existing terms, we may want to workshop "continue" and "overrun" a touch.

If we are happy with the continue/overrun model (after workshopping the terms) we could expose it directly something like:

However, I am suggesting that this model for understanding the different methods might make for an intuitive interface given sensible substitutions for these words as it would help users to understand the ways in which the different trigger methods are similar and the ways in which they are different.

@hjoliver
Member

hjoliver commented Feb 21, 2022

As stated in the OP I am not proposing the words "continue" and "overrun" and am fully aware of the issue with "continue".

Well your case 1 above does suggest those words as CLI options; and you did describe the whole thing in those terms, which I initially struggled to understand because of the "continue" issue, and which I assumed (maybe wrongly) might be how you'd want to describe it to users too.

(Maybe I should have moved on from this discussion already, but I came back to it after a few days and it didn't seem fully resolved).

I'm not clear what you're suggesting.

Sorry, I wasn't suggesting anything there, just explaining why I don't think the conceptual description in terms of continue/overrun above is very helpful, compared to simply describing the triggered task belonging to an upcoming flow or not (and if it does belong, when the flow encounters it, it knows the task has already been run and so won't run it again).

What do you think of my suggestion? I'd avoided using the words "continue" and "overrun".

That's the next step, which I meant to follow up on the other day but ran out of time ...

@hjoliver
Member

hjoliver commented Feb 22, 2022

OK, on to the real point ... CLI

I like the --flow=blah approach.

It seems pretty intuitive, and it remains compatible with giving actual flow numbers if that's ever needed. E.g. trigger a task in front of multiple flows, but only one of them should merge with it (I'm not sure what the use case is for that, but I'm also not sure what the use case is for having any one of several upcoming flows merge with it ... the reality is, I don't really see people using multiple upcoming flows in close proximity like this until we can support entirely independent flows).

How about a small variation on the --flow= suggestions above: if you trigger a task, expect it to continue (flow on) as normal unless --flow=none (one-off task) or --wait (wait for catch and merge before continuing).

a) trigger a one-off task that does not affect other flows:

$ cylc trigger --flow=none

b) trigger a new flow, that is not part of any existing flow:

$ cylc trigger --flow=new

c) pre-run parts of the graph ahead of an existing flow or flows:

# trigger and continue as normal
$ cylc trigger --flow=any  # any current flow can catch up and merge; others can overrun
$ cylc trigger --flow=1,2  # flows 1 or 2 can catch up and merge; others can overrun

# trigger as above but wait for merge before continuing
$ cylc trigger --wait --flow=any
$ cylc trigger --wait --flow=1,2

I like this because waiting is singled out as unusual behaviour. For a spawn-on-demand scheduler, the default should be to spawn downstream children as normal (i.e. as outputs are generated).

Also, it makes it clear that continue-now vs continue-later are essentially the same thing as far as the flows are concerned, just with a delay in the wait case.

Finally, maybe --flow= should be a required option, to force users to think about what they want, because getting it wrong could be costly.
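
A minimal sketch of how a `--flow` value of this shape could be parsed and checked, using only the keywords suggested in this comment. `parse_flow` and `validate` are hypothetical helpers, and the rule that `--wait` cannot be combined with `none` or `new` is an assumption drawn from the examples above, not an agreed behaviour.

```python
from typing import FrozenSet, Union

KEYWORDS = ("none", "new", "any")


def parse_flow(value: str) -> Union[str, FrozenSet[int]]:
    """Parse a --flow value: a keyword, or comma-separated flow numbers."""
    if value in KEYWORDS:
        return value
    try:
        return frozenset(int(part) for part in value.split(","))
    except ValueError:
        raise ValueError(f"invalid --flow value: {value!r}") from None


def validate(flow: Union[str, FrozenSet[int]], wait: bool) -> None:
    """Reject combinations that have no flow to wait for (assumed rule)."""
    if wait and flow in ("none", "new"):
        raise ValueError("--wait needs --flow=any or explicit flow numbers")
```

So `parse_flow("1,2")` gives `frozenset({1, 2})`, and `validate(parse_flow("new"), wait=True)` raises under the assumed rule.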

@wxtim
Member

wxtim commented Feb 22, 2022

Can I confirm that my sketch matches the four Options @oliver-sanders ?

[Image: IMG_20220222_073945.jpg]

In which case I like the terms,

  • "Island" - triggered task(s) is not part of workflow
  • "Bridge" - triggered task(s) will form a bridge between their upstream and downstream dependencies.

And suggest that limiting the length of the bridge / size of island to 1 is just a special case - i.e.

  1. Long Bridge
  2. Long Island
  3. Length 1 Island
  4. Length 1 Bridge

The odd aspect of this "conceit" for describing reflow is that you don't know how long a bridge will be when you start building it.

When it comes to documentation I'm guessing I won't be the only person who needs pictures to understand this.

@wxtim
Member

wxtim commented Feb 22, 2022

Alt visualization:

[Image: IMG_20220222_080429.jpg]

@dpmatthews
Contributor

Finally, maybe --flow= should be a required option, to force users to think about what they want

We have to remember that this discussion only applies to tasks not currently in the pool (*). If you, for example, trigger a failed task that should just run that task as part of the current flow and shouldn't require you to specify any additional options (and, if you do specify any options they should be ignored).

(*) Not sure what terminology to use. This certainly applies to a task with incomplete outputs. Also to a) a queued task, b) a task waiting for a clock trigger or an xtrigger, c) what else? We need to make sure this is clearly documented. We also need to describe what happens if you trigger a submitted or running task (the trigger is ignored).

because getting it wrong could be costly

I think this is why @oliver-sanders wants the default to be the least dangerous option (a one-off task?) - you shouldn't trigger a new flow without choosing the required behaviour.

@hjoliver
Member

@wxtim -

When it comes to documentation I'm guessing I won't be the only person who needs pictures to understand this.

Yes we'll definitely need diagrams!

I'm not keen on the "bridge" and "island" terminology for users (although I see what you're getting at).

@hjoliver
Member

hjoliver commented Feb 22, 2022

We have to remember that this discussion only applies to tasks not currently in the pool (*).

Yes, I was taking that as a given at this point.

I think this is why @oliver-sanders wants the default to be the least dangerous option (a one-[OFF] task?) - you shouldn't trigger a new flow without choosing the required behaviour.

Yes, but my latest suggestion is even safer than having a "safest default": there is no default, and you can't trigger a new flow without using --flow=new. I'm not entirely averse to having the safest option as a default, but even that is arguably dangerous.

For example, users who blithely do a cylc trigger without considering what that means might not find out for quite some time that the upcoming flow ignores their triggered task.

@hjoliver
Member

hjoliver commented Feb 22, 2022

If you, for example, trigger a failed task that should just run that task as part of the current flow and shouldn't require you to specify any additional options (and, if you do specify any options they should be ignored).

We can document that as If a task already belongs to a flow, triggering it will just run it sooner as part of its own flow.

Then we just have to make the "already belongs to a flow" bit obvious in the UI.

As for "shouldn't have to specify any options" (if already in the pool): that requires some thought (for the CLI at least).

@oliver-sanders
Member Author

Can I confirm that my sketch matches the four Options

Not sure about the first one, however, your "Alt visualizations" are spot on 💯 (and exactly how I would want to present this to users). We should be able to do this in ASCII for the CLI --help.

Island & Bridge

Kinda get what you're going for there but Cylc "islands" are highly unstable:

  • Whilst a reflow (1) is an "island" in a sense, it will "merge" with another approaching island forming a sand-bar.
  • Whilst a no-flow (3) can create an "island", if you try to create this island too close to shore you get a peninsula instead.

Perhaps "sub-marine sea mounts"?

Well your case 1 above does suggest those words as CLI options; and you did describe the whole thing in those terms

Note: Purposefully using new terminology to avoid conflation with existing terms, we may want to workshop "continue" and "overrun" a touch.

I picked these awkward terms because otherwise we would have gone back around the terminology loop of the previous week again, please consider them as abstract concepts for which we can choose new labels.

Irrespective of how the CLI ends up, explaining these four trigger spaces will ultimately involve explaining the behaviours which form the matrix (since the matrix defines the fundamental differences) so we can do with hashing out options.

How about a small variation on the --flow= suggestions ... or --wait (wait for catch and merge before continuing).

Yes so we should define the interface from a description of its behaviour from the user's perspective (your --wait being the opposite of --continue).

If we add another option to reflect the "overrun" dimension, say --join then we are back to CLI option (1).

  • --wait being the opposite of --continue.
  • --join being the opposite of --rerun.

(I had considered this opposite arrangement, however, it makes reflow the default which is definitely not a good idea!)

CLI options (1) & (2) are both viable, however, I don't see a good reason to mix them.

  • (1) does a better job of conveying the four spaces, making it easier for users to understand the ways in which the spaces are different and the ways in which they are similar.
  • (2 & 3) flatten the matrix (better if we don't agree on the model) reducing each to a one word description.

Yes, but my latest suggestion is even safer than having a "safest default": there is no default

I don't think this is a good idea, especially from the UI perspective where it would be confusing as heck. The best option is to come up with a consistent behaviour that can apply across the whole range.

I think that's option (4) (i.e. continue=no, rerun=no) (i.e. --wait --join with the syntax above).

The full range of options should only need to be known to the very small proportion of users who would actually want to use them (i.e. keep the complexity away from the general case).

@oliver-sanders
Member Author

CLI option (2) single --flow argument.

# 1) reflow
cylc trigger --flow=new

# 2) continue
cylc trigger --flow=<TBC>

# 3) no-flow (implemented)
cylc trigger --flow=none

# 4) no-flow (proposed)
cylc trigger --flow=wait

CLI option (4) wait/join

Use the opposites of continue/rerun.

wait
Wait for catch and merge before continuing.

join
Tasks are considered to "belong" to other flows.

There are two options for the defaults:

a) Yes (wait and join default to "yes" so must be disabled manually)
b) No (wait and join default to "no" so must be enabled manually)

# 1) reflow
a) cylc trigger --wait=no --join=no  # or --no-wait --no-join
b) cylc trigger

# 2) continue
a) cylc trigger --wait=no  # or --no-wait
b) cylc trigger --join

# 3) no flow (implemented)
a) cylc trigger --join=no  # or --no-join
b) cylc trigger --wait

# 4) no-flow (proposed)
a) cylc trigger
b) cylc trigger --wait --join 
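
As a cross-check of the listing above, the wait/join pair can be mapped back onto the four numbered spaces by treating `wait` as "not continue" and `join` as "not overrun". The function name is hypothetical and this is only a sketch of the mapping, not of any real option handling.

```python
def space_from_wait_join(wait: bool, join: bool) -> int:
    """Map the wait/join pair onto spaces 1-4."""
    continues, overruns = not wait, not join
    return {
        (True, True): 1,    # reflow
        (True, False): 2,   # continue
        (False, True): 3,   # no-flow (implemented)
        (False, False): 4,  # no-flow (proposed)
    }[(continues, overruns)]
```

Under defaults (b), where both are "no" unless enabled, a bare trigger is `space_from_wait_join(False, False)`, i.e. space 1 (reflow); under defaults (a) a bare trigger is `space_from_wait_join(True, True)`, i.e. space 4.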

@hjoliver
Member

@wxtim - "alt visualization", very nice 👍

@hjoliver
Member

hjoliver commented Feb 22, 2022

@oliver-sanders -

I picked these awkward terms because otherwise we would have gone back around the terminology loop of the previous week again, please consider them as abstract concepts for which we can choose new labels.

Yes I do understand that now, it's just that I (naturally IMO!) initially assumed you meant we might need to workshop better names (in our context) for the concept "continue" interpreted in the normal sense of the word. But nevermind, we are now on the same page on that one 😁

Yes, but my latest suggestion is even safer than having a "safest default": there is no default

I don't think this is a good idea, especially from the UI perspective where it would be confusing as heck. The best option is to come up with a consistent behaviour that can apply across the whole range.

I agree that having no default is not ideal, especially for the (non-CLI) UI. But I'm not convinced yet that any of the options constitute a safe default. Triggering tasks with or without reflow and with or without merging is more complex and more consequential than most (any?) other action. Taking cylc stop for example, the default makes perfect sense because you can change your mind at any point after issuing the command if you decide you really wanted a quick shutdown. You can't necessarily bail out like that if you've just triggered the wrong kind of action in terms of tasks or flows running.

The safest default is to trigger a one-off task, because that doesn't cause reflow and doesn't mess with existing flows. BUT that probably isn't what users will want most of the time.

@hjoliver
Member

hjoliver commented Feb 22, 2022

CLI options (1) & (2) are both viable, however, I don't see a good reason to mix them.

I have a different spin on what you're calling mixing the options, and the "behaviour-driven" nature of option 1 (overrun/continue) and the four-way matrix.

I think a description in terms of the higher level "flow" concept explains those behaviours and in fact unifies two of them. And a good conceptual framework should always be easier to understand than a bunch of separate behaviours.


  1. A flow is a single run of the workflow propagating through the graph
    and Cylc 8 can support multiple flows

  2. Flows do not interact (unless they clash in n=0)
    so: flows overrun other flows, and they can be overrun by other flows

  3. When you manually trigger a task (outside of n=0) you can:

    1. run it as a one-off task, independent of any flow
      so: no continue, can be overrun
    2. or start a new flow
      so ("flows do not interact"): continue, can overrun, can be overrun
    3. or pre-run part of an existing flow or flows
      and you can choose to continue now or on catch-up, but either way it is the same flow
      so: existing flows will not overrun these tasks, because they already ran in this flow

The bits in bold are all that users need to understand this, and they are pretty clean and intuitive concepts.

So from a flow-centric perspective, under 3., there are only three fundamental options, not four as in your table; with a choice of continue-now or continue-on-catchup in the pre-run case.

A flow-centric perspective should not be problematic for users only interested in running one flow. They may still need (i) and (iii) and so they need to understand the difference between triggering a future task "as part of my flow" or "independent of my flow".

So this is obviously pretty close to your CLI option (2) single --flow argument, but there's one problem:

# 4) no-flow (proposed)
cylc trigger --flow=wait

We need to be able to combine "wait" with different flow options: --flow=any or --flow=1,2 etc. Perhaps most of the time any can be assumed (or could be the default), but disallowing the flow-specific option would be a bold call.

@oliver-sanders
Member Author

Yes, but my latest suggestion is even safer than having a "safest default": there is no default

I don't think this is a good idea, especially from the UI perspective where it would be confusing as heck. The best option is to come up with a consistent behaviour that can apply across the whole range.

I agree that having no default is not ideal, especially for the (non-CLI) UI. But I'm not convinced yet that any of the options constitute a safe default

I think no-flow option (4) is fine.

Triggering a task is expected to have consequences, same for n=1 same for n=21, same for Cylc 7, same for Cylc 8. This consequence is obvious and easily understood: if you tell something to run, it will run, and the fact that it ran will persist.

@hjoliver
Member

hjoliver commented Mar 9, 2022

Relating this back to the OP

That's helpful, thanks.

I'm still not sure if your continued use of the original terminology is suggesting we stick with that to describe and document the behaviour though? [UPDATE: 🤣 I just saw this below the image: These four trigger names are not a part of the proposal and should probably not make it past this issue 🤣]

In case you are, I think the terms reflow, continue and no-flow are too easily misinterpreted and need reworking.

  1. reflow: --flow=new
    So far, I've mostly been using "reflow" to refer to the general capability to flow onward from a manually triggered task.
    We/I probably should not use it that way, AND we should not say that a particular flow is or is not "a reflow" either. Instead:

    • Cylc supports multiple flows in the same graph
    • You can start new flows by manually triggering tasks
    • "reflow" is what happens if and when a flow traverses parts of the graph that already ran in a previous flow
    • A new flow is not exclusively "a reflow" or not. E.g. it might start as a reflow but then move out ahead into clean graph; and a flow might be "a reflow" with respect to one flow, but "a preflow" with respect to another.
  2. continue: [--flow=all --no-wait]
    (As previously discussed) "Continue" here means "continue immediately", but the implied opposite (to "not continue") could reasonably be expected to be (3) (--flow=none), not (4) which is better explained as "continue later" (on merge).

  3. no-flow: --flow=none
    Fine, a one-off task that does not start a flow

  4. no-flow: [--flow=all] --wait
    (As previously discussed) This should be described as triggering part of an existing flow early and continuing to flow on later (after catch-up and merge), and "no-flow" does not sound like a good name for that.

@hjoliver
Member

hjoliver commented Mar 9, 2022

Clarification of the three flow "words":

Agreed. And n=0 flow numbers should do for --flow=all

See what you make of the three examples above, if agreed, happy to turn them into test cases.

Great way to illustrate what happens, in combination with @wxtim's diagrams.

One minor quibble on your description of case (3) ... related to "the missing --wait argument":

The triggered task runs, however, due to the --wait constraint, the graph does not run on from there.

I would say, the graph does not run on from there because --flow=none (it's a one-off task, not a flow).

And (sorry 😬 !) possibly a more fundamental issue with your proposed n<0 behaviour (which I now see is different to what I had thought you meant before)

Firstly, as an aside, the n=0 window is well defined, but n>0 and n<0 are not really, because the n=0 window is not tied to a contiguous section of the graph. Even with only a single active flow (e.g. after a trigger-no-wait operation) there can be triggerable tasks that have n=0 tasks in both directions in the graph from them.

Secondly, this (what you've labelled as) n<0 behaviour privileges the very first flow over all others, because the result (c.f. n>0) is determined by whether or not the triggered task ever ran before (in any flow). I don't like that much. Seems to me we should be able to make stuff happen somewhere in the graph without caring whether or not a flow went through it some time in the past.

  1. Continue
    The task "a" will get re-run by the trigger, however, the graph will not run on from there.

    flow:1
    a#1 (the naturally triggered run)
    a#2 (the manually triggered run)
    b
    c
    d

So if "a" had never run before, the default trigger (2 Continue) will cause "a" to flow on as part of flow 1.

But if "a" had ever run before, the default trigger (which is supposed to "continue") will behave exactly like --flow=none except that the flow label is different (1 vs none).

Instead, I think we should have consistent behaviour in terms of does the triggered task/flow merge with upcoming flows and continue or not regardless of where we trigger in the graph (still merging with existing flows by default, btw).

NOTE this is why my "to be implemented" post above was short and did not distinguish between "n<0" and "n>0" - according to my view of how this should work, that's not necessary. I had not gone down to the specifics of flow numbers there, but here's what I was thinking...

@hjoliver
Member

hjoliver commented Mar 9, 2022

This case is straightforward (one-off task, no flow-on, no merge with existing flows):

cylc trigger --flow=none

And this case is straightforward (new flow, no merge with existing flows):

cylc trigger --flow=new  

But we differ here:

cylc trigger  # with or without --wait

The triggered flow should get:

  • existing n=0 flow numbers (so any upcoming flows will merge with it)
  • but not that of a previous flow (if there are any) that already ran it
  • and a new flow number (in case there are no existing flows that have not used the task already)

That's it, and the default-trigger behaviour and result is the same before and after any flow.


A few comments:

If we have flow=1 and default-trigger a task in front of it:

  • flow=1,2 will continue immediately
  • flow=1 will merge when it catches up (in the DB)

If we have flow=1 and default-trigger a task behind it:

  • flow=2 will continue immediately
  • (if there were any flows behind it, it would have those flow numbers too and they would merge on catch-up ... but can't do that if only flow=1 exists at trigger time of course)

Then wherever we default-trigger a task (outside of n=0 of course):

  • it will flow on (unless we say not to)
  • any existing flows that catch up will merge with it
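As a rough illustration of the flow-number rule proposed in this comment (active n=0 flow numbers, minus any flow that already ran the task, plus a new flow number), here is a minimal Python sketch. The function name and its three parameters are hypothetical and are not Cylc internals; it only mirrors the bullet points and the two single-flow cases described above.

```python
def default_trigger_flow_nums(active_flows, flows_that_ran_task, next_flow_num):
    """Flow numbers for a default trigger under the rule proposed above.

    All parameters are hypothetical inputs, not Cylc internals:
      active_flows        -- flow numbers currently present in n=0
      flows_that_ran_task -- flow numbers in which the task already ran
      next_flow_num       -- the next unused flow number
    """
    # Keep active flows that have not used this task yet, so that they
    # merge with the triggered task when they catch up.
    flow_nums = set(active_flows) - set(flows_that_ran_task)
    # Add a new flow number so the triggered task can flow on at once.
    # (The comment notes this is only strictly needed when no active
    # flow is left after the exclusion.)
    flow_nums.add(next_flow_num)
    return flow_nums


# Triggering in front of flow 1: flow 1 has not reached the task yet.
print(default_trigger_flow_nums({1}, set(), 2))  # {1, 2}
# Triggering behind flow 1: flow 1 already ran the task.
print(default_trigger_flow_nums({1}, {1}, 2))    # {2}
```

Under this sketch the same function is used in front of and behind a flow; only the excluded set differs, which is the consistency argument being made here.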

Other pros:

  • we don't have to attempt to distinguish n > or < 0 (which may be impossible)
  • and/or we don't have to treat flow=1 differently from other flows
  • this preserves the integrity of flows and makes the historical record easier to understand: if a task already ran (and completed its expected outputs!) in flow=3, then that's flow=3 done for that task. You can trigger the task again, but it will belong to another flow (or to no flow at all).
  • I think this is better for "re-running" tasks or sub-graphs behind a flow. In general, if you trigger a task it will flow on by default; so if you trigger a task to re-run it you should expect that to flow on too, by default. And certainly that will be a common use case, no reason to expect that re-running a single task will be more likely wanted than re-running several dependent tasks.

(Note that adding a new flow number isn't strictly necessary except when there are no existing flows that have not run the task already ... in which case the user is definitely doing a re-run, and there is no following flow to merge with).

@oliver-sanders
Member Author

oliver-sanders commented Mar 9, 2022

@hjoliver it is not clear what you are proposing, please could you fill out the above examples with your desired behaviour and highlight where they differ.

You seem to be suggesting the rules for what flow numbers are provided by --flow=all differ depending on whether the task has run before or not in contradiction with:

Agreed. And n=0 flow numbers should do for --flow=all

@hjoliver
Member

hjoliver commented Mar 9, 2022

You seem to be suggesting the rules for what flow numbers are provided by --flow=all differ depending on whether the task
has run before or not in contradiction with:

Agreed. And n=0 flow numbers should do for --flow=all

Not really, I'm saying current active flows (i.e. those in n=0) should be sufficient, c.f. all flows recorded in the DB.

With the small caveat (which is probably what caused the confusion here, sorry) that we should exclude flow numbers of flows that have already passed through the triggered task. That is what allows the default trigger to re-run a sub-graph (say) behind a flow (because the triggered task will not take the flow number of the flow that we are re-running, even if that flow number still exists in n=0).

please could you fill out the above examples with your desired behaviour and highlight where they differ.

OK, I'll try to do that now, since we desperately need to lay this one to rest. I wonder if this is gonna end up the longest single issue page on the project :-)

@hjoliver
Member

hjoliver commented Mar 9, 2022

suggesting the rules for what flow numbers are provided by --flow=all differ depending on whether the task
has run before or not

Also, I'd say the rules are exactly the same in both cases, it's just that in the never-ran-before case there is no previous flow number to exclude.

@oliver-sanders
Member Author

oliver-sanders commented Mar 9, 2022

we should exclude flow numbers of flows that have already passed through the triggered task

So if there is only one flow in the workflow the task will not run at all.

If there are multiple flows in the workflow the "continue" trigger will result in a reflow irrespective of whether the other flow(s) are ahead or behind of the original?

Examples would be great.

@hjoliver
Member

hjoliver commented Mar 9, 2022

So if there is only one flow in the workflow the task will not run at all.

No, see this comment:

and a new flow number (in case there are no existing flows that have not used the task already)

@oliver-sanders
Member Author

Ok, so this effectively changes the default to reflow for historical tasks.

I would much prefer for reflows to require users to opt-in in all cases because the consequences of reflow on users' data are quite dangerous and reflow (and multiple flows in general) are way beyond what we can expect of the working knowledge of the vast majority of users.

@hjoliver
Member

hjoliver commented Mar 9, 2022

If there are multiple flows in the workflow the "continue" trigger will result in a reflow irrespective of whether the other flow(s) are ahead or behind of the original?

(See my terminology comments above on what exactly "reflow" means)

So I think "the continue trigger" should, by definition, "continue", which means a flow should carry on from the triggered task.

The main thing, which we agreed on, is that by default that continuing flow should not get overrun by any existing flows (and I'm not arguing with that).

@hjoliver
Member

hjoliver commented Mar 9, 2022

Ok, so this effectively changes the default to reflow for historical tasks.

Meh, sort of. My way is simpler from a consistency perspective (same behaviour on triggering a task, whether or not it ever ran before), and I think what matters and is easier to understand is whether the triggered task flows on or not. The fact that flowing on after triggering an n>0 task is not technically a "reflow" will be lost on most users. It will look like a new flow to them (now we have the original flow, and this new one from where I triggered a task) ... the fact that it happens to have the right flow numbers so that the original flow won't overrun it on catch-up, or that it is "not a reflow" because those tasks never ran before, is secondary.

@hjoliver
Member

hjoliver commented Mar 9, 2022

And my other related point is that if you are triggering a past task to re-run it, you are just as likely to want it to flow on (the regenerate some products use case), as opposed to running a single task.

The re-run a single task case seems to me to be best expressed by the non-default --flow=none option. For two reasons: 1) you want to trigger a single task, not a flow; and 2) my "flow integrity" argument above: a flow is a self-perpetuating run through the graph, and the previous flow already passed by ... so why should the re-triggered task have the same flow number?

@hjoliver
Member

hjoliver commented Mar 9, 2022

I would much prefer for reflows to require users to opt-in in all cases because the consequences of reflow on users' data are quite dangerous and reflow (and multiple flows in general) are way beyond what we can expect of the working knowledge of the vast majority of users.

I don't disagree that "reflow is dangerous" in the sense that it re-runs tasks and that will probably overwrite existing data. However:

  1. re-running a single task with no flow-on does that too; if you re-run anything you have to be aware of that consequence
  2. the graph shows what is supposed to happen downstream of any task, so it should not be very surprising if that happens unless you tell it not to. It is not so uncommon for Cylc 7 users to expect it to happen and then to struggle to understand how to make it happen via the nightmare of cylc inserting multiple waiting tasks in the right order.
  3. I don't think we should significantly complicate the conceptual flow model by going to lengths to avoid reflow

At least I think we probably both understand where the other is coming from now.

Because I was focused more on consistent triggering behaviour, when you agreed to go back to the no-wait default I thought that applied equally to future and past tasks. i.e. no-wait in front of flow=1 means "flow on now" (with all current flow numbers that could catch up and merge); and no-wait behind flow=1 means exactly the same thing.

Both generate a new flow front. The fact that one case involves re-running past tasks should be blindingly obvious to users because they deliberately triggered a task that already ran.

@hjoliver
Member

hjoliver commented Mar 9, 2022

If you're not coming around to my perspective (which again, makes for simpler, consistent triggering behaviour and does not treat flow=1 as magic [SPECIAL]) then I suppose one way out of this bind is to revert to "wait" as a default. I'd rather not do that because a) it artificially constrains the workflow; and b) if it behaves as you want for re-running tasks, it makes the "wait" concept harder to understand (easy: wait for existing flows to catch up before continuing; weird: if only flow=1 exists and we trigger behind it, what are we "waiting" for??)

@hjoliver
Member

hjoliver commented Mar 9, 2022

Example 1 (n>0)

(SAME RESULT in all cases)

Example 2 (n<0)

1) Reflow

SAME RESULT (A new flow is started which overruns the previous flow.)

2) Continue

DIFFERENT RESULT: same as 1) Reflow

The task "a" will get re-run by the trigger, and the graph WILL run on from there (that's what "continue" and "no wait" mean)

3) No Flow (implemented)

SAME RESULT

4) No Flow (proposed)

DIFFERENT RESULT: still same as type (2), but now that is the same as reflow rather than no-flow

@oliver-sanders
Member Author

If you're not coming around to my perspective (which again, makes for simpler, consistent triggering behaviour and does not treat flow=1 as magic)

Disagree on "simpler", "consistent" and "magic" 😁.

You're not winning me over I'm afraid. I see your points, but I don't agree with them. Since the start I've maintained that defaulting to reflow is dangerous and that all reflow functionality (and all its complex consequences e.g. no-flow) should be opt-in.

You are proposing that --flow=all can actually mean all flows OR all flows and a new one minus an existing one OR just a new flow, which isn't especially consistent.

If I understand correctly what you are proposing does not add any new functionality, it just changes the default. If so my interpretation covers all bases, but if you want a reflow you must manually say so.

@hjoliver
Member

hjoliver commented Mar 9, 2022

You are proposing that --flow=all can actually mean, all flows OR all flows and a new one minus an existing one

That's kind of a misrepresentation because it ignores the definition of flow. A flow is a self-consistent self-perpetuating run through the graph. If a flow has passed by a task, retriggering it should be considered a new flow (or a one-off no-flow), because by definition that task has already run in that flow. You are saying, give the task the same flow number it had before but run it anyway, even though it has already run in that flow.

OR just a new flow, which isn't especially consistent.

My consistency is at the conceptual level. When you trigger a task, any task, does it flow on or not. This supposed inconsistency is down at the level of flow numbers which is really an implementation detail that we use to make the required behaviours work.

@hjoliver
Member

hjoliver commented Mar 9, 2022

If I understand correctly what you are proposing does not add any new functionality, it just changes the default. If so my interpretation covers all bases, but if you want a reflow you must manually say so.

That's right, but we are coming from two different flow models (in a sense). By my conceptual model (which I'm claiming is simpler) your default is different behind the first flow than it is in front of it. (And it doesn't even seem to make sense with respect to the names that you gave the options: behind flow=1 the "continue" / no-wait default does not actually continue anything.)

@oliver-sanders
Member Author

I don't think we are going to get anywhere with this, suggest another call.

@oliver-sanders
Member Author

(otherwise it's going to be another ten pages of reply, quote and response)

@hjoliver
Member

hjoliver commented Mar 9, 2022

Yep, can do 👍

@hjoliver
Member

hjoliver commented Mar 9, 2022

OK, meeting done. Result: I concede defeat. 💥 Reasons, for the record:

  • I took "historical task" above to mean "ran previously in any flow", which makes flow=1 special (i.e. once flow=1 passes by, a task instantly becomes "historical" and stays that way) ... BUT we will only consider active flows at trigger time
  • I will give up on the absolute sanctity of a "flow" as (once triggered) a self-perpetuating run through the graph, because re-triggering a task with the old flow number does seem to provide the cleanest solution for re-running to hit a different branch.
  • And having given up on that, the pre-run vs re-run behaviour (in terms of flowing on or not after triggering) at least makes sense in terms of the assigned old flow number
  • And last but not entirely least:
    • this whole discussion is only about what should be the default behaviour
    • and this is a reasonably democratic project 😁

Also, on terminology:

  • We agreed not to describe a flow as "a reflow" (or "not a reflow") for the reasons given above. The term may still be OK in this limited sense: if/when a flow overruns another flow it could be said to be "reflowing" that particular region of the graph.
  • We need a good way to describe the two different flow merge concepts: 1) flows with a common flow number will not overrun the same graph nodes; and 2) any flows will merge if they "collide" in n=0 ... (maybe "collision" is a good term for that).
  • I wasn't keen on --wait (or --no-wait) as an option name or concept because when re-triggering behind a flow you would almost never want to wait for an upcoming flow to merge and then continue. However:
    • that objection no longer applies if we re-use the old flow number (which prevents reflow without requiring --wait)
    • and even if it's not relevant most of the time, "wait for merge" does at least describe what will happen (by default) if an existing flow does catch up to a triggered task
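The two merge concepts named in the terminology notes above can be told apart with a small Python sketch. Both function names are hypothetical, and representing a merged task by the union of the two flow-number sets is an assumption made for illustration, not something stated in this thread.

```python
def can_overrun(incoming_flows, flows_already_run):
    """Concept 1: flows sharing a flow number do not overrun each other.

    An incoming flow may re-run a task only if none of its flow numbers
    is among those the task has already run with.
    """
    return not (set(incoming_flows) & set(flows_already_run))


def collide(flows_a, flows_b):
    """Concept 2: two instances of the same task that meet in n=0 merge.

    Assumption: the merged task carries both sets of flow numbers.
    """
    return set(flows_a) | set(flows_b)


# Flow 1 catches up with a task that already ran as part of flows 1 and 2.
print(can_overrun({1}, {1, 2}))  # False: flow 1 does not re-run it
# An unrelated flow 3 reaches the same task.
print(can_overrun({3}, {1, 2}))  # True: flow 3 overruns it
# Flows 1 and 3 meet at the same task in n=0.
print(collide({1}, {3}))         # {1, 3}
```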

@hjoliver
Member

hjoliver commented Mar 10, 2022

The final result then, for implementation.

(@oliver-sanders' explicit examples above are all valid and useful, and should be made into tests, but I think we can ditch the four-way categorization at this point).

Trigger Active Flows

cylc trigger [--wait]

The triggered task runs with the set of all active (n=0) flow numbers, A

  • it will flow on if its children have not already been spawned in any member of A
    • default: immediate flow-on
    • --wait: flow on if/when members of A catch up and merge with it
    • the triggered flow will merge with any member of A that catches it, or that it catches
    • it will not merge with other flows (unless it is in the active set for a subsequent trigger event)
  • otherwise (if children already spawned in any member of A) it will not flow on
    • (in which case --wait is meaningless)

(It gets a bit gnarly to list exactly what happens when triggering ahead of all flows, behind all flows, and between flows ... but we don't need to do that here as it's all derivable from the above).
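A minimal Python sketch of the "Trigger Active Flows" rule above may help when turning it into tests. The function name, its parameters and the three result labels are hypothetical; in particular, how "children already spawned in a flow" would be looked up is left as an input here.

```python
def trigger_active_flows(active_flows, flows_with_children_spawned, wait=False):
    """Return (flow_nums, flow_on) for `cylc trigger [--wait]`.

    Hypothetical inputs:
      active_flows                -- the set A of active (n=0) flow numbers
      flows_with_children_spawned -- flows in which the task's children
                                     have already been spawned
      wait                        -- True if --wait was given
    flow_on is one of "now", "on-merge" or "never".
    """
    flow_nums = set(active_flows)
    if flow_nums & set(flows_with_children_spawned):
        # Children already spawned in a member of A: no flow-on,
        # and --wait has no effect.
        return flow_nums, "never"
    return flow_nums, ("on-merge" if wait else "now")


# Ahead of flow 1 (children not yet spawned): flows on immediately.
print(trigger_active_flows({1}, set()))             # ({1}, 'now')
# Same, but with --wait: flows on when flow 1 catches up and merges.
print(trigger_active_flows({1}, set(), wait=True))  # ({1}, 'on-merge')
# Behind flow 1 (children already spawned in flow 1): re-run only.
print(trigger_active_flows({1}, {1}))               # ({1}, 'never')
```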

Trigger Specific Flows

cylc trigger --flow=1,2 [--wait]

The triggered task runs with the specified set of flow numbers, S = {1,2}

  • (As for active flows, with A replaced by S)
  • (Niche power tool for experts, if needed)

Trigger a New Flow

cylc trigger --flow=new

The triggered task runs with a new flow number, not in the set of active flows A (or any previous flow in fact).

  • it flows on immediately
  • it will not merge with any member of A that catches it, or that it catches
  • it will not merge with other flows (unless it is in the active set for a subsequent trigger event)
  • (--wait is meaningless)

Trigger No Flow

cylc trigger --flow=none 

The triggered task runs with a "none" flow number.

  • one-off task run, no flow-on
  • it will not merge with any flow, ever
  • (--wait is meaningless)
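The four trigger variants above differ mainly in which flow numbers the triggered task receives. The following Python sketch summarises that mapping under stated assumptions: the function names are hypothetical, the value "all" stands for the default (active flows), and the "none" flow number is represented as an empty set purely for illustration.

```python
def resolve_flow_nums(flow, active_flows, next_flow_num):
    """Map a --flow value to the flow numbers of the triggered task.

    Hypothetical helper, not Cylc's implementation.
      flow          -- "all" (default), "new", "none", or e.g. "1,2"
      active_flows  -- the set A of active (n=0) flow numbers
      next_flow_num -- the next unused flow number
    """
    if flow == "all":
        return set(active_flows)
    if flow == "new":
        return {next_flow_num}
    if flow == "none":
        return set()  # assumption: "none" shown as an empty set
    # Otherwise a comma-separated list of specific flow numbers.
    return {int(num) for num in flow.split(",")}


def wait_is_meaningful(flow):
    """--wait only applies when triggering active or specific flows."""
    return flow not in ("new", "none")


print(resolve_flow_nums("all", {1, 2}, 3))   # {1, 2}
print(resolve_flow_nums("1,2", {1}, 3))      # {1, 2}
print(resolve_flow_nums("new", {1, 2}, 3))   # {3}
print(resolve_flow_nums("none", {1, 2}, 3))  # set()
print(wait_is_meaningful("new"))             # False
```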

@oliver-sanders oliver-sanders removed the question Flag this as a question for the next Cylc project meeting. label Mar 21, 2022

5 participants