trigger: no-flow triggered tasks should always merge whether still in the pool or not #4657
Comments
Correct!
That's not nasty. It's a natural consequence of flow merging (which is a necessity until flow number is in the UID). Consider two ongoing flows, one behind the other. Every task ahead of the second flow (the one that is catching up) will either run twice or once, depending on exactly when/if catch-up occurs. A task triggered without
If you trigger a task ahead of the main flow:
That's not hard to understand is it?
Yes (again, unless we implement
My take was, if the dependencies encoded in the graph are meaningful then it seems unlikely that triggering future tasks as part of the main flow is going to be wanted much. Which is not to say never; in which case we should do #4653. But I do know it's quite common to test changes to tasks by manually triggering individual tasks in a running workflow, as an easy cheat (cf. deploying a new instance of the whole workflow just to do that). And if doing that, it's best to trigger a future or past instance to avoid any confusion or clash with the running flow. (This is essentially the old |
Regarding the title:
"Flow merge" has so far meant what happens (necessarily, until the UID has flow in it) when two instances of the same task, from different flows, want to be in the active pool at the same time. By that definition, flow merging can't occur "whether still in the pool or not" (specifically, the or not bit). So presumably you mean (as I've assumed above) that the triggered task should be be triggered as part of the main flow, rather than as a "no flow" task. Then, the main flow will stop at that task, whether or not it is still in the pool. Note however, this amounts to triggering a new ongoing flow with the same flow number. It doesn't make much sense to trigger a standalone future task as part of the upcoming flow, because under spawn-on-demand spawning is local, and we don't want the flow to halt completely when it runs into the triggered task (or where it was in the graph, it if finished already). |
Long story short, it remains to be seen if we can automatically assign the flow number in some circumstances (e.g. when there's only one flow present). |
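For anyone following along, the "flow merge" definition above (two instances of the same task, from different flows, wanting to be in the active pool at once) can be sketched roughly as below. This is only an illustration, not Cylc's implementation: `TaskInstance`, `ActivePool` and `add` are hypothetical names, and the one assumption that matters is that the task ID is (cycle point, task name) with no flow number in it.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Assumption: the task UID has no flow number in it, only cycle and name.
TaskID = Tuple[str, str]


@dataclass
class TaskInstance:
    """Hypothetical task record; an empty flow_nums set means "no-flow"."""
    cycle: str
    name: str
    flow_nums: Set[int] = field(default_factory=set)

    @property
    def uid(self) -> TaskID:
        return (self.cycle, self.name)


class ActivePool:
    """Hypothetical active pool keyed by UID, so one instance per task."""

    def __init__(self) -> None:
        self._tasks: Dict[TaskID, TaskInstance] = {}

    def add(self, task: TaskInstance) -> TaskInstance:
        existing = self._tasks.get(task.uid)
        if existing is None:
            self._tasks[task.uid] = task
            return task
        # Same UID is already active: the two instances cannot coexist,
        # so the flows merge (the surviving instance carries both numbers).
        existing.flow_nums |= task.flow_nums
        return existing


pool = ActivePool()
pool.add(TaskInstance("1", "c", {1}))
merged = pool.add(TaskInstance("1", "c", {2}))
print(sorted(merged.flow_nums))  # [1, 2]
```

With the flow number left out of the key, "never merge" has nowhere to put the second instance, which is the UID problem referred to in this thread.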
"Never merge" is not possible, because of the UID problem. If a task in a flow catches up to another instance of itself in n=0 (whether that instance is flow or no-flow) we have to merge them. "Always merge" is not possible either (going by what "merge" means as discussed above).
The way I look at it, it's not:
Triggering ahead with |
I think that is quite hard to understand, at least I find it confusing and surprising, hence I reported it as a bug. If the behaviour is unintuitive to someone who works on the project... This just feels needlessly complicated to me. Users should not need to understand reflows (no-flows are related to reflows IMO) or merging conditions to use See also retire "task pool" as a user facing concept. I think partially satisfied prerequisites and incomplete outputs might further convolute this?
Perfectly reasonable, however, dependencies are not always meaningful and not all tasks have dependencies (the original reported issue and the orphan task use cases I contributed). If a user is triggering a task with dependencies ahead of time and that task is going to stand a reasonable chance of doing something meaningful then it's reasonable to presume the dependencies aren't critical to its functioning.
Pretty sure it is possible and probably quite easy, happy to implement. No-flow tasks sit outside of the reflow model and as such are special anomalies. It doesn't matter whether a no-flow task is still in the pool or not; we can easily implement the logic to merge it with the first flow that catches it. We are already performing a DB check (almost) every time we spawn a task, so we can easily check if it's a no-flow task and apply the merge condition at a later date (which I did on my original branch).
Reasonable use case, however, I think that's something you should have to ask for, not the default behaviour. |
Proposal
If we can't not-merge (wouldn't make sense anyway) then the only options are to sometimes merge or to always merge. I think always merge is a simpler, safer and more intuitive default.
|
Yes, but I think users quite frequently want to retrigger a single past task. That's pretty much all they could do in Cylc 7, after all, without an elaborate task insertion procedure to set up a "manual reflow". (Or if they got lucky and the downstream tasks were still in the task pool). |
True, that's why I said "not wanted much" (as opposed to "never" 😁 )
Yes. Essentially it's complicated by waiting tasks that have to be assigned a flow already. However, this is still beyond simple compared to the nasty Cylc 7 task pool concept. Here, the |
OK I'm wondering if this is where our respective wires got crossed: I've always thought of reflow as the ability to start multiple independent flows in different parts of the graph, whether past or future relative to the ongoing (or stopped) initial flow.
Whereas you seem to be suggesting flows should always merge if they cover the same part of the graph - not just if they clash in Otherwise (if you're talking about ongoing flows merging into flows without actually conflicting in But maybe I've misunderstood ... I'm trying to digest this while sitting in a conference!! [UPDATE! If you do just mean flow -> no-flow task, then we're probably good to go; I may have just over-interpreted what you were saying because of my bullet point 3 above: to me, a no-flow task is a single-task flow (that doesn't need a flow number) and (so far) they get treated the same as any flow, in terms of merging behaviour. |
I think we are on the same page! Providing that's right, I think we can bump this to RC2.
I agree. The confusion may have arisen because I have likened no-flow tasks to a special case of reflow where the spawning logic is disabled.
Yes.
Yes. |
Here are some examples of what I would consider to be the "desired behaviour" to make sure we're on the same page: In this example:
In this example:
In this example:
|
OK I think I understand you now 🎉 I was worried about wider implications because I thought you meant the same for flows as for single triggered tasks (because to me, so far, a triggered no-flow task is the same as a flow, except that it doesn't flow onward from the trigger point). So I agree that we need to support flows merging with manually triggered future tasks (in your sense of "merge", not just an The difference is, my intention was to support this as an option, for a specific flow, via The thing I'm not sure about is how that would work when there can be multiple flows (i.e. not just the original flow and individual tasks triggered out in front of it). If the original flow passes by (and does not re-run) a triggered task X, then later on a second flow passes over the same location in the graph, presumably the second flow should "re-run" that task. How does the second flow know that X was only meant to belong to the first flow? Presumably you are not suggesting that any subsequent flow should not repeat-run the task? |
Thinking of how this might work: If I manually trigger a task (without the
Note we have to use an explicit no-flow indicator (such as a zero flow number) to handle multi-flow scenarios. The triggered task might be a future task with respect to the flow of interest (flow 2 say) but already ran in an early flow (1 say). Then, before manual triggering it will have flow number '{1}' (in the DB); at triggering it will become What do you think? |
I think that's exactly what I was proposing. (although I wasn't considering preserving an explicit no-flow number so
The first flow to catch the triggered task takes possession of it (i.e. gives it its flow number). This should be easy to implement; more exotic behaviour could be provided with the
Example 1
before: task triggered before the flow approaches
later: the first flow catches the task (it is not rerun in the first flow), a new flow is started
after: the subsequent flow does not interact with the task because it belongs to the first flow
Example 2
before: task triggered after the first flow has already passed it, a new flow is started
after: a subsequent flow catches the triggered task (it is not rerun in the subsequent flow)
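To make Examples 1 and 2 concrete, here is a rough sketch of the "first flow to catch the task takes possession of it" rule. It reflects one reading of the examples, not the actual implementation: `RunHistory`, `trigger_no_flow` and `flow_reaches` are hypothetical names standing in for the DB check done at spawn time, and a no-flow run is modelled as an empty set of flow numbers.

```python
from typing import Dict, List, Set, Tuple

TaskID = Tuple[str, str]  # (cycle point, task name)


class RunHistory:
    """Hypothetical stand-in for the DB lookup performed when spawning."""

    def __init__(self) -> None:
        # One set of flow numbers per recorded run of each task.
        self._runs: Dict[TaskID, List[Set[int]]] = {}

    def trigger_no_flow(self, task: TaskID) -> None:
        """Record a manual trigger that belongs to no flow (empty set)."""
        self._runs.setdefault(task, []).append(set())

    def flow_reaches(self, task: TaskID, flow: int) -> bool:
        """Return True if the flow must run the task itself,
        False if it already owns (or has just adopted) a run."""
        runs = self._runs.setdefault(task, [])
        if any(flow in flows for flows in runs):
            return False
        for flows in runs:
            if not flows:
                # Unclaimed no-flow run: the first flow to arrive takes it.
                flows.add(flow)
                return False
        runs.append({flow})
        return True


c = ("1", "c")

# Example 1: triggered ahead of flow 1, then flow 2 comes along later.
ex1 = RunHistory()
ex1.trigger_no_flow(c)
print(ex1.flow_reaches(c, 1))  # False: flow 1 adopts the triggered run
print(ex1.flow_reaches(c, 2))  # True: that run belongs to flow 1

# Example 2: flow 1 has already run the task, then it is triggered.
ex2 = RunHistory()
ex2.flow_reaches(c, 1)
ex2.trigger_no_flow(c)
print(ex2.flow_reaches(c, 2))  # False: flow 2 adopts the triggered run
```

In this sketch a subsequent flow in Example 1 runs its own instance of the task; if "does not interact" was meant differently, the last branch is the part that would change.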
Example 3 - the edge case
This leaves us with one niche edge case where a task is triggered normally:
$ cylc trigger 1/c
$ sleep 5 # task has run and succeeded
Then it is subsequently reflow-triggered before another flow has caught it:
$ cylc trigger 1/c --reflow
In which case we could either not merge the task with the new flow, allowing it to merge with the next flow that catches it:
Or merge it as a previous submission of the newly reflow-triggered task:
It's pretty niche, happy either way. |
Yeah on reflection we can stick with the current no-flow-number approach. |
I don't quite understand your example 3. Assuming |
For anyone trying to follow this:
|
The implicit approach has one interesting complication, which shows it would be good to have Spawning "on demand" is local in the graph. i.e. each task spawns its own children, as part of its own flow, as it completes its outputs. So:
Both approaches could be useful. But I suppose it is reasonable to have 1. as the default [for pre-running a task ahead of a flow, at least]. |
Another complication: I think it's fair to say that when triggering a task in front of a flow, users will most often want the following flow to not rerun the task. [Let's call this pre-running a task] But the opposite is true for triggering a task behind a flow (which is likely to be more common in fact). Here, users are most likely wanting to re-run a task in the past flow, not to pre-run a task in a future flow. (This is what we have on master now; the triggered task has no flow number, but a following flow will not possess it). Unfortunately, if the triggered task has ever run before we can't be sure if the user wants a pre-run or a re-run. |
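The pre-run / re-run ambiguity described above can be written down as a small sketch. The names (`TriggerIntent`, `infer_intent`, `following_flow_reruns`) are hypothetical, and it assumes that a task which has never run must lie ahead of every flow; that assumption is mine, not something established in this thread.

```python
from enum import Enum
from typing import Optional


class TriggerIntent(Enum):
    PRE_RUN = "pre-run"  # run a future task early; the following flow skips it
    RE_RUN = "re-run"    # repeat a past task; a following flow runs it again


def infer_intent(has_run_before: bool) -> Optional[TriggerIntent]:
    """Guess the intent of a manual trigger, or return None if ambiguous."""
    if not has_run_before:
        # Assumption: never run means it is ahead of every flow.
        return TriggerIntent.PRE_RUN
    # The task ran before: it may be behind one flow and ahead of another,
    # so the scheduler cannot tell and the user would have to say.
    return None


def following_flow_reruns(intent: TriggerIntent) -> bool:
    """Whether a flow that later reaches the task should run it again."""
    return intent is TriggerIntent.RE_RUN


print(infer_intent(False))  # TriggerIntent.PRE_RUN
print(infer_intent(True))   # None
```

The `None` case is the one where an explicit choice by the user seems unavoidable.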
So it seems to me we need something like this:
This is foolproof and easy to understand. Users must know if they are pre-running a future task or re-running a past one. Note this does not require an understanding of reflow! (unless of course the user has already triggered multiple flows, or they voluntarily use |
Ach, sorry, that should be |
Have had a chat with @dpmatthews about this today, I think there's a generalisation to be had here. Don't have the time to write it up today, will do Monday. |
It is done: #4686 |
Closing as superseded by #4686 |
This is a condensation of concerns raised with the cylc trigger logic on #4651, a different formulation of which can be found in #4653.
Outline:
cylc trigger [without --reflow] it is not assigned to a flow but will merge with the first flow that catches it.
Is this analysis correct? Yes
Problem:
Do we agree that this is a problem? No
Desired Behaviour:
Do we agree that this is the desired behaviour? No