Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trigger: no-flow triggered tasks should always merge whether still in the pool or not #4657

Closed
oliver-sanders opened this issue Feb 4, 2022 · 24 comments
Labels
bug? Not sure if this is a bug or not question Flag this as a question for the next Cylc project meeting. superseded
Milestone

Comments

@oliver-sanders
Copy link
Member

oliver-sanders commented Feb 4, 2022

This is a condensation of concerns raised with the cylc trigger logic on #4651 a different formulation of which can be found in #4653.

Outline:

  • When a task that has not yet run is triggered via cylc trigger [without --reflow] it is not assigned to a flow but will merge with the first flow that catches it.
  • If this task is still in the pool by the time the scheduler window catches up with it it merges into the flow that caught it.
  • If the task has exited the pool before the scheduler window catches up with it the task is forgotten and does not merge.

Is this analysis correct? Yes

Problem:

  • This creates a nasty situation where a task is either run once or twice, depending on the evolution of the scheduler during the runtime of the task.
  • This makes it hard for users to understand the consequences of the trigger before they issue it.
  • Tasks triggered ahead of the scheduler window will always be run twice, which is highly unlikely to be what the user wanted.

Do we agree that this is a problem? No

Desired Behaviour:

  • Either the triggered task should always merge, or it should never merge, it shouldn't be able to go either way.
  • Task triggering is not supposed to be a reflow feature, users should be able to use it without understanding reflow.

Do we agree that this is the desired behaviour? No

@oliver-sanders oliver-sanders added the question Flag this as a question for the next Cylc project meeting. label Feb 4, 2022
@oliver-sanders oliver-sanders added this to the cylc-8.0rc1 milestone Feb 4, 2022
@oliver-sanders oliver-sanders added the bug? Not sure if this is a bug or not label Feb 7, 2022
@hjoliver
Copy link
Member

hjoliver commented Feb 7, 2022

Is this analysis correct? Is it missing anything?

Correct!

This creates a nasty situation where a task is either run once or twice, depending on the evolution of the scheduler during the runtime of the task.

That's not nasty. It's a natural consequence of flow merging (which is a necessity until flow number is in the UID). Consider two ongoing flows, one behind the other. Every task ahead of the second flow (the one that is catching up) will either run twice or once, depending on exactly when/if catch-up occurs. A task triggered without --reflow is not a flow but it still has to reside in the task pool while active.

This makes it hard for users to understand the consequences of the trigger before they issue it.

If you trigger a task ahead of the main flow:

  • it will only run once if the main flow catches up before it finishes
    • (because Cylc can't support two instances of the same point/name active at once)
  • otherwise it will run twice (once when you trigger it, once when the main flow passes over it)

That's not hard to understand is it?

Tasks triggered ahead of the scheduler window will always be run twice,

Yes (again, unless we implement cylc trigger --flow=1)

which is highly unlikely to be what the user wanted.

My take was, if the dependencies encoded in the graph are meaningful then it seems unlikely that triggering future tasks as part of the main flow is going to be wanted much. Which is not to say never; in which case we should do #4653

But I do know it's quite common to test changes to tasks by manually triggering individual tasks in a running workflow, as an easy cheat (c.f. deploying a new instance of the whole workflow just to do that). And if doing that, it's best to trigger a future or past instance to avoid any confusion or clash with the running flow. (This is essentially the old cylc submit use case).

@hjoliver
Copy link
Member

hjoliver commented Feb 8, 2022

Regarding the title:

no-flow triggered tasks should always merge whether still in the pool or not

"Flow merge" has so far meant what happens (necessarily, until the UID has flow in it) when two instances of the same task, from different flows, want to be in the active pool at the same time.

By that definition, flow merging can't occur "whether still in the pool or not" (specifically, the or not bit).

So presumably you mean (as I've assumed above) that the triggered task should be be triggered as part of the main flow, rather than as a "no flow" task. Then, the main flow will stop at that task, whether or not it is still in the pool.

Note however, this amounts to triggering a new ongoing flow with the same flow number. It doesn't make much sense to trigger a standalone future task as part of the upcoming flow, because under spawn-on-demand spawning is local, and we don't want the flow to halt completely when it runs into the triggered task (or where it was in the graph, it if finished already).

@hjoliver
Copy link
Member

hjoliver commented Feb 8, 2022

Long story short, cylc trigger --flow= will solve your problem?

It remains to be seen if we can automatically assign the flow number in some circumstances (e.g. when there's only one flow present).

@hjoliver
Copy link
Member

hjoliver commented Feb 8, 2022

Desired Behaviour:

Either the triggered task should always merge, or it should never merge, it shouldn't be able to go either way.

"Never merge" is not possible, because of the UID problem. If a task in a flow catches up to another instance of itself in n=0 (whether that instance is flow or no-flow) we have to merge them.

"Always merge" is not possible either (going by what "merge" means as discussed above).

Task triggering is not supposed to be a reflow feature, users should be able to use it without understanding reflow.

They way I look at it, it's not:

  • Triggering without --reflow is independent of any re/flows unless the main flow catches up while the triggered task is still running, and we can't avoid that (and it's not hard to understand - do users really want two instances of the exact same task running at the same time?)

Triggering ahead with --reflow however (whether as part of the main flow, via --flow=, or not) is by definition using the reflow concept (in the sense of manually triggering a task from which the flow continues on).

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Feb 8, 2022

That's not hard to understand is it?

I think that is quite hard to understand, at least I find it confusing and surprising, hence I reported it as a bug. If the behaviour is unintuitive to someone who works on the project...

This just feels needlessly complicated to me. Users should not need to understand reflows (no-flows are related to reflows IMO) or merging conditions to use cylc trigger which should be a standalone concept.

See also retire "task pool" as a user facing concept.

I think partially satisfied prerequisites and incomplete outputs might further convolute this?

My take was, if the dependencies encoded in the graph are meaningful then it seems unlikely that triggering future tasks as part of the main flow is going to be wanted much

Perfectly reasonable, however, dependencies are not always meaningful and not all tasks have dependencies (the original reported issue and the orphan task use cases I contributed).

If a user is triggering a task with dependencies ahead of time and that task is going to stand a reasonable chance of doing something meaningful then it's reasonable to presume the dependencies aren't critical to its functioning.

"Always merge" is not possible either

Pretty sure it is possible and probably quite easy, happy to implement.

No-flow tasks sit outside of the reflow model and as such are special anomalies.

It doesn't matter whether a no-flow task is still in the pool or not we can easily implement the logic to merge it with the first flow that catches it. We are already performing a DB check (almost) every time we spawn a task, we can easily check if it's a no-flow task and apply the merge condition at a later date (which I did on my original branch).

Experimenting with task modifications the lazy way: by triggering "out of the way" tasks in your running workflow instead of deploying another instance of the whole workflow

Reasonable use case, however, I think that's something you should have to ask for, not the default behaviour.

@oliver-sanders
Copy link
Member Author

Proposal

If we can't not-merge (wouldn't make sense anyway) then the only options are to sometimes merge or to always merge. I think always merge is a simpler, safer and more intuitive default.

  • Keep the no-flow triggering approach.
    • Nothing wrong with this, makes sense, no-flow is a sensible marker.
  • Check the DB so that triggered tasks merge the same irrespective of how the graph evolves whilst the task is running.
    • Safe & simple, no need to explain flow merging conditions to users of cylc trigger.
    • By default a trigger will work the same for all future cases.
  • Maintain the current behaviour as an option.
    • E.G. cylc trigger --one-off or cylc trigger --reflow=-1 (internally set the flow num to -1 or something like that).
    • This option switches off the merge condition for these tasks effectively disabling the DB check.
    • This preserves the whole "task may or may not merge" issue, however, it's now something you must explicitly ask for.

@hjoliver
Copy link
Member

hjoliver commented Feb 9, 2022

No-flow tasks sit outside of the reflow model and as such are special anomalies.

Yes, but I think users quite frequently want to retrigger a single past task. That's pretty much all they could do in Cylc 7, after all, without an elaborate task insertion procedure to set up a "manual reflow". (Or if they got lucky and the downstream tasks were still in the task pool).

@hjoliver
Copy link
Member

hjoliver commented Feb 9, 2022

If a user is triggering a task with dependencies ahead of time and that task is going to stand a reasonable chance of doing something meaningful then it's reasonable to presume the dependencies aren't critical to its functioning.

True, that's why I said "not wanted much" (as opposed to "never" 😁 )

See also #3748. [Retire task pool as a user-facing concept]

I think partially satisfied prerequisites and incomplete outputs might further convolute this?

Yes. Essentially it's complicated by waiting tasks that have to be assigned a flow already. However, this is still beyond simple compared to the nasty Cylc 7 task pool concept. Here, the n=0 window contains active tasks, incomplete tasks, and "active waiting" tasks, the latter being simply "tasks that are ready to run but are held back for now by a limiting mechanism" (oh, and partially satisfied tasks - which also have to have a flow assigned already, as the prerequisites are flow-dependent). Those are essentially implementation independent concepts (however, if we can do even better, all good...)

@hjoliver
Copy link
Member

hjoliver commented Feb 9, 2022

OK I'm wondering if this is where our respective wires got crossed:

I've always thought of reflow as the ability to start multiple independent flows in different parts of the graph, whether past or future relative to the ongoing (or stopped) initial flow.

  • with the proviso that (until Support non-merging flows #4419 flows in UID) we can't support total independence and therefore (for now) have to merge if the same task ID gets spawned at the same time in n=0
  • and a triggered no-flow task is just a single-task subset of a flow, which doesn't need a flow number because it doesn't spawn children
  • and if you want to trigger a future task as part of an existing flow you can use trigger --flow= (not implemented yet, but the set-outputs case is). (And note, this would prevent the triggered tasks from running twice).

Whereas you seem to be suggesting flows should always merge if they cover the same part of the graph - not just if they clash in n=0 ??? Or do you mean only for a flow that overruns a triggered no-flow task (that finished before the flow got there)?? (which is after all is the topic of this issue!). If the latter, I don't have a problem with that, it's just a minor tweak to the model I've just described (and in fact it's the same as the last bullet point except that you presumably don't want to have to use --flow= by default?

Otherwise (if you're talking about ongoing flows merging into flows without actually conflicting in n=0 then this would seemingly prevent multiple flows at all, if they cover the same parts of the graph at different times, which would rule out what's likely the primary use case for reflow, i.e. regeneration of some past products.

But maybe I've misunderstood ... I'm trying to digest this while sitting in a conference!!

[UPDATE! If you do just mean flow -> no-flow task, then we're probably good to go; I may have just over-interpreted what you were saying because of my bullet point 3 above: to me, a no-flow task is a single-task flow (that doesn't need a flow number) and (so far) they get treated the same as any flow, in terms of merging behaviour.

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Feb 9, 2022

I think we are on the same page!

Providing that's right, I think we can bump this to RC2.

I've always thought of reflow as the ability to start multiple independent flows in different parts of the graph, whether past or future relative to the ongoing (or stopped) initial flow.

I agree.

The confusion may have arisen because I have likened no-flow tasks to a special case of reflow where the spawning logic is disabled.

Or do you mean only for a flow that overruns a triggered no-flow task (that finished before the flow got there)?? (which is after all is the topic of this issue!).

Yes.

If the latter, I don't have a problem with that, it's just a minor tweak to the model I've just described (and in fact it's the same as the last bullet point except that you presumably don't want to have to use --flow= by default?

Yes.

@oliver-sanders
Copy link
Member Author

Here are some examples of what I would consider to be the "desired behaviour" to make sure we're on the same page:

In this example:

a(running) => b(runahead) => c => d => e
  • If I triggered "b", I would expect "b" to run once and only once.
  • If I triggered "c", I would expect "c" to run once and only once.
  • If I triggered "d", I would expect "d" to run once and only once.
  • If I triggered "e", I would expect "e" to run once and only once.

In this example:

1/a(running) => 2/a(runahead) => 3/a => 4/a => 5/a
  • If I triggered "2/a", I would expect "2/a" to run once and only once.
  • If I triggered "3/a", I would expect "3/a" to run once and only once.
  • If I triggered "4/a", I would expect "4/a" to run once and only once.
  • If I triggered "5/a", I would expect "5/a" to run once and only once.

In this example:

1/a(running)
2/a(runahead)
3/a
4/a
5/a
  • If I triggered "2/a", I would expect "2/a" to run once and only once.
  • If I triggered "3/a", I would expect "3/a" to run once and only once.
  • If I triggered "4/a", I would expect "4/a" to run once and only once.
  • If I triggered "5/a", I would expect "5/a" to run once and only once.

@hjoliver
Copy link
Member

hjoliver commented Feb 10, 2022

OK I think I understand you now 🎉

I was worried about wider implications because I thought you meant the same for flows as for single triggered tasks (because to me, so far, a triggered no-flow task is the same as a flow, except that it doesn't flow onward from the trigger point).

So I agree that we need to support flows merging with manually triggered future tasks (in your sense of "merge", not just an n=0 merge as it has meant so far in SoD implementation an documentation).

The difference is, my intention was to support this as an option, for a specific flow, via cylc trigger --flow=, whereas you apparently think it is the desirable default behaviour without specifying which flow(s) should merge with the task.

The thing I'm not sure about is how that would work when there can be multiple flows (i.e. not just the original flow and individual tasks triggered out in front of it).

If the original flow passes by (and does not re-run) a triggered task X, then later on a second flow passes over the same location in the graph, presumably the second flow should "re-run" that task. How does the second flow know that X was only meant to belong to the first flow? Presumably you are not suggesting that any subsequent flow should not repeat-run the task?

@hjoliver
Copy link
Member

hjoliver commented Feb 10, 2022

Thinking of how this might work:

If I manually trigger a task (without the --reflow option):

  • it gets tagged in the DB with an explicit zero flow number as a no-flow task
  • the first flow that overruns the task takes possession of it, by replacing the zero with its flow number (retroactive DB update)
  • then to subsequent flows, the task will look like it always belonged to that first flow (and they will not merge with it)

Note we have to use an explicit no-flow indicator (such as a zero flow number) to handle multi-flow scenarios. The triggered task might be a future task with respect to the flow of interest (flow 2 say) but already ran in an early flow (1 say). Then, before manual triggering it will have flow number '{1}' (in the DB); at triggering at will become {0, 1}; and when flow 2 catches up to it, it will become {1, 2} (at which point it ceases to be a no-flow task for any further flows that come along).

What do you think?

@oliver-sanders
Copy link
Member Author

What do you think?

I think that's exactly what I was proposing.

(although I wasn't considering preserving an explicit no-flow number so {1} rather than {0, 1})

The thing I'm not sure about is how that would work when there can be multiple flows

The first flow to catch the triggered task takes possession of it (i.e. give it its flow number). This should be easy to implement, more exotic behaviour could be provided with the --flow option.

Example 1

before: task triggered before the flow approaches

  • flow:None
    • c(succeeded) - the triggered task
  • flow:1
    • a(running)

later: the first flow catches the task (it is not rerun in the first flow), a new flow is started

  • flow:1
    • a(succeeded)
    • b(succeeded)
    • c(succeeded) - the triggered task
    • d(ducceeded)
  • flow:2
    • a(running)

after: the subsequent flow does not interact with the task because it belongs to the first flow

  • flow:1
    • a(succeeded)
    • b(succeeded)
    • c(succeeded) - the triggered task
    • d(ducceeded)
  • flow:2
    • a(succeeded)
    • b(succeeded)
    • c(succeeded)
    • d(succeeded)

Example 2

before: task triggered after the first flow has already passed it, a new flow is started

  • flow:None
    • c(succeeded) - the triggered task
  • flow:1
    • a(succeeded)
    • b(succeeded)
    • c(succeeded)
    • d(succeeded)
  • flow:2
    • a(running)

after: a subsequent flow catches the triggered task (it is not rerun in the subsequent flow)

  • flow:1
    • a(succeeded)
    • b(succeeded)
    • c(succeeded)
    • d(succeeded)
  • flow:2
    • a(succeeded)
    • b(succeeded)
    • c(succeeded) - the triggered task
    • d(succeeded)

Example 3 - the edge case

This leaves us with one niche edge case where a task is triggered normally:

$ cylc trigger 1/c
$ sleep 5  # task has run and succeeded
  • flow:None
    • c#1(running)
  • flow:1
    • a(running)
  • flow:2

Then subsequently reflow triggered before another it has been caught:

$ cylc trigger 1/c --reflow

In which case we could either, not merge the task with the new flow allowing it to merge with the next flow that catches it:

  • flow:None
    • c#1(running)
  • flow:1
    • a(running)
  • flow:2
    • c#2(running)

Or merge it as a previous submission of the newly reflow-triggered task:

  • flow:None
  • flow:1
    • a(running)
  • flow:2
    • c#1(running)
    • c#2(running)

It's pretty niche, happy either way.

@hjoliver
Copy link
Member

hjoliver commented Feb 11, 2022

(although I wasn't considering preserving an explicit no-flow number so {1} rather than {0, 1})

Yeah on reflection we can stick with the current no-flow-number approach.

@hjoliver
Copy link
Member

hjoliver commented Feb 11, 2022

I don't quite understand your example 3. Assuming c#1 means job-submit no.1 of task c, we can't have c#1(running) and c#2(running) at the same time, as shown in two of your bullet-point lists, because they can't coexist in the scheduler task pool.

@hjoliver
Copy link
Member

hjoliver commented Feb 11, 2022

For anyone trying to follow this:

  • We're on the same page with regard to the functionality: we need to support triggering a future task as part of an upcoming flow, so that the triggered task doesn't get re-run when the flow gets to that place in the graph
  • We just need to nut out whether that can be done in general (/by default) without having to use explicit trigger --flow=

@hjoliver
Copy link
Member

hjoliver commented Feb 11, 2022

The implicit approach has one interesting complication, which shows it would be good to have trigger --flow as option as well, even for non-"exotic" cases.

Spawning "on demand" is local in the graph. i.e. each task spawns its own children, as part of its own flow, as it completes its outputs. So:

  1. cylc trigger 3/c
  • 3/c can't spawn its children as it runs because it doesn't belong to a flow yet
  • we have to retroactively spawn on its outputs once the flow catches up and "possesses" it
  1. cylc trigger --flow=2 3/c
  • 3/c can spawn its children normally, as it runs
  • the children may also be able to run (and flow on), depending, even before the original part of flow 2 catches up

Both approaches could be useful.

But I suppose it is reasonable to have 1. as the default [for pre-running a task ahead of a flow, at least].

@hjoliver
Copy link
Member

hjoliver commented Feb 11, 2022

Another complication:

I think it's fair to say that when triggering a task in front of a flow , users will most often want the following flow to not rerun the task. [Let's call this pre-running a task]

But the opposite is true for triggering a task behind a flow (which is likely to be more common in fact). Here, users are most likely wanting to re-run a task in the past flow, not to pre-run a task in a future flow. (This is what we have on master now; the triggered task has no flow number, but a following flow will not possess it).

Unfortunately, if the triggered task has ever run before we can't be sure if the user wants a pre-run or a re-run.

@hjoliver
Copy link
Member

hjoliver commented Feb 11, 2022

So it seems to me we need something like this:

  • cylc trigger --re-run ID
    • re-run a single task from previous flow; ignored by the following flow
  • cylc trigger --pre-run ID
    • pre-run a task; the following flow will merge with it and retroactively spawn on its outputs
  • cylc trigger --pre-run --flow=2 ID
    • pre-run a task as part of flow 2; children can be spawned as it runs
    • this might be advantageous even if flow 2 is the only upcoming flow (children might be able to run earlier)
  • it is an error to not specify one of --pre-run or --re-run
    • because we can't automatically infer which the user wants, in general (unless the only flow present is {1})

This is foolproof and easy to understand.

Users must know if they are pre-running a future task or re-running a past one.

Note this does not require an understanding of reflow! (unless of course the user has already triggered multiple flows, or they voluntarily user --flow=)

@oliver-sanders
Copy link
Member Author

I don't quite understand your example 3. Assuming c#1 means job-submit no.1 of task c, we can't have c#1(running) and c#2(running) at the same time

Ach, sorry, that should be c#1(succeeded) and c#2(running).

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Feb 11, 2022

Have had a chat with @dpmatthews about this today, I think there's a generalisation to be had here.

Don't have the to write up today, will do Monday.

@oliver-sanders
Copy link
Member Author

It is done: #4686

@oliver-sanders
Copy link
Member Author

Closing as superseded by #4686

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug? Not sure if this is a bug or not question Flag this as a question for the next Cylc project meeting. superseded
Projects
None yet
Development

No branches or pull requests

2 participants