trigger: generalisation of triggering approaches #4686
Comments
Nice write-up @oliver-sanders 👍 I generally agree on the matrix of cases we need to support, in terms of the functional capability. I'm not so sure on the conceptual description, bearing in mind that as you said, the terminology needs work-shopping.
I find this use of "continue" a bit brain-twisting.
The "no" case of this is meaningless isn't it? If you trigger a task anywhere behind any flow, including the original flow which typically starts at the start of the graph, but it can't overrun previous runs, then the triggered task itself can't even run, no? Or do we need to specify which flow(s) it can't overrun?
I think it's intuitive to say "my newly-triggered flow can overrun previously-run flows". I'm not so sure that "my newly-triggered task/flow cannot be overrun by upcoming flows" is the best way to describe that concept though. Better to think of it as "my newly triggered flow is part of the upcoming flow(s)" (i.e. I am "pre-running" bits of the upcoming flow(s)). |
In other words, I think we should describe what happens in terms of the task or flow being triggered: I am triggering this task,
(with the |
I like the idea of adding a new flow number to triggered tasks (that aren't actually starting a new flow) so that they can be targeted with the CLI. I'm not so sure about the "add all existing flow numbers" bit. How is that better than just merging with the first flow that comes along, whatever its number? Do you want to avoid merging with other flows triggered after the first triggering event?
Not sure I follow this. How is
This seems to suggest that the triggered task will merge with every subsequent flow that catches up with it? (or at least, every flow that was present when the task was triggered). I would think we want to support pre-running a task for a particular flow (or perhaps for the first one that catches it), but the first merge neutralises it so that further flows can overrun it. E.g. consider pre-running a task for the current main flow; then much later start an entirely new flow to re-run that part of the graph - it would be surprising if some hidden long-ago triggered task was treated as already-ran and merged with the new flow? I don't think we can tell the new flow not to merge with the old task, because that would preclude later decisions to pre-run tasks in the new flow. [Update: your CLI examples seem to suggest you don't mean every subsequent flow should merge with the triggered task 👍 ] Still considering the CLI options you've laid out ... at first glance I prefer something like 2. I don't mind the default being as you describe, for triggering tasks in front of a flow. (Just not keen on the description in terms of "continue" and "overrun", for reasons above). |
I think the continue/overrun model is valid, I don't think the "overrun" part has been communicated. I'm describing behaviour from the user's perspective not the implementation perspective. Here's an example to show how the overrun=yes/no cases differ and hopefully clarify the 'continue' bit.
Let's start with task "a" running in the original flow, then trigger task "d" using each of the four methods.
The following examples show what the workflow would look like after it has finished running. 1) Reflow
The two flows never meet in the pool so never merge.
2) Continue
The two flows never meet in the pool, however, they do meet in the DB. Because the triggered tasks belong to flow=1 they will not be overrun.
3) No-flow (implemented)
The triggered task leaves the pool. As it is not considered to belong to any flow it will be overrun.
4) N-flow (proposed)
The triggered task leaves the pool. As it is considered to belong to any flow it will not be overrun.
|
Approaches (2) and (4) are both trying to solve the overrun=no problem but are doing it in different ways. Approach (4) uses a special marker to say "merge me with anything", approach (2) explicitly lists all of the flows tasks should be considered to belong to. The reason we are not using the special marker to implement (2) is because that would prohibit a reflow from being able to overrun the tasks.
If there are multiple flows (e.g. 1 & 2) active at the time the trigger is performed,
This is the use case for trigger type (4) [continue:no, overrun:no]. The task will merge with the next flow that catches it, subsequent flows can overrun it.
Because we explicitly listed all of the flows the triggered task belongs to when we issued the trigger, any new flow numbers that are subsequently added (by reflow) will be able to overrun the task.
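The merge/overrun distinction above can be sketched as a simple membership check. This is only an illustrative model, not Cylc's implementation: `ANY_FLOW` and `is_overrun` are hypothetical names, and the set-of-integers representation is an assumption made for the example.

```python
# Hypothetical model of flow membership; not Cylc's actual data structures.
ANY_FLOW = "any"  # stand-in for the special "merge me with anything" marker


def is_overrun(task_flows, incoming_flow):
    """Return True if `incoming_flow` would re-run a task that already ran.

    `task_flows` is either ANY_FLOW or a set of flow numbers the task is
    considered to belong to. A flow that reaches the task merges with it
    (no re-run) only if the task belongs to that flow.
    """
    if task_flows == ANY_FLOW:
        return False
    return incoming_flow not in task_flows


explicit = frozenset({1, 2})  # approach (2): flows listed at trigger time
no_flow = frozenset()         # approach (3): belongs to no flow

print(is_overrun(explicit, 1))   # flow 1 merges with the task
print(is_overrun(explicit, 3))   # a later reflow (3) can overrun it
print(is_overrun(no_flow, 1))    # a no-flow task is always overrun
print(is_overrun(ANY_FLOW, 3))   # the marker would merge with any flow
```

In this model the explicit list is what lets a flow number created later fall outside the set, whereas a bare marker would also merge with a later reflow.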
If we can come up with good |
I'm going to want animated GIFs if I have to explain this to users. |
Some suggestions ...
Seems fine - this is a new flow which is separate from any existing flow (unless it merges).
How about
Seems fine - this isn't part of any flow.
How about |
Coming back to this ... I understand the "continue" and "overrun" terminology, but IMO it's more intuitive to think of it like this: The triggered flow can:
[1] by "say it belongs to an upcoming flow" I mean any one of:
The main point of confusion of course is that Finally, "overrun" makes more sense, but it's kind of intuitively obvious in the following sense:
|
@hjoliver It sounds like we're all happy with the 4 options - the issue is how best to describe them? I'm still reasonably happy with
This seems the most difficult to describe. Other suggestions: |
As stated in the OP I am not proposing the words "continue" and "overrun" and am fully aware of the issue with "continue".
However, I am suggesting that this model for understanding the different methods might make for an intuitive interface given sensible substitutions for these words as it would help users to understand the ways in which the different trigger methods are similar and the ways in which they are different. |
Well your case 1 above does suggest those words as CLI options; and you did describe the whole thing in those terms, which I initially struggled to understand because of the "continue" issue, and which I assumed (maybe wrongly) might be how you'd want to describe it to users too. (Maybe I should have moved on from this discussion already, but I came back to it after a few days and it didn't seem fully resolved).
Sorry, I wasn't suggesting anything there, just explaining why I don't think the conceptual description in terms of continue/overrun above is very helpful, compared to simply describing the triggered task belonging to an upcoming flow or not (and if it does belong, when the flow encounters it, it knows the task has already been run and so won't run it again).
That's the next step, which I meant to follow up on the other day but ran out of time ... |
OK, on to the real point ... CLI
I like the It seems pretty intuitive, and it remains compatible with giving actual flow numbers if that's ever needed. E.g. trigger a task in front of multiple flows, but only one of them should merge with it (I'm not sure what the use case is for that, but I'm also not sure what the use case is for having any one of several upcoming flows merge with it ... the reality is, I don't really see people using multiple upcoming flows in close proximity like this until we can support entirely independent flows).
How about a small variation on the
a) trigger a one-off task that does not affect other flows:
$ cylc trigger --flow=none
b) trigger a new flow, that is not part of any existing flow:
$ cylc trigger --flow=new
c) pre-run parts of the graph ahead of an existing flow or flows:
# trigger and continue as normal
$ cylc trigger --flow=any # any current flow can catch up and merge; others can overrun
$ cylc trigger --flow=1,2 # flows 1 or 2 can catch up and merge; others can overrun
# trigger as above but wait for merge before continuing
$ cylc trigger --wait --flow=any
$ cylc trigger --wait --flow=1,2
I like this because waiting is singled out as unusual behaviour. For a spawn-on-demand scheduler, the default should be to spawn downstream children as normal (i.e. as outputs are generated). Also, it makes it clear that continue-now vs continue-later are essentially the same thing as far as the flows are concerned, just with a delay in the wait case. Finally, maybe |
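To make the proposed `--flow` values concrete, here is a rough sketch of how they might be classified. The values (`none`, `new`, `any`, or a comma-separated list of flow numbers) are the ones suggested in this comment; the function name `parse_flow` and its return shape are hypothetical and do not describe Cylc's actual option handling.

```python
def parse_flow(value):
    """Classify a proposed --flow value (illustrative only).

    Returns a (kind, numbers) pair where kind is one of
    "none", "new", "any" or "numbers".
    """
    if value in ("none", "new", "any"):
        return (value, frozenset())
    try:
        numbers = frozenset(int(part) for part in value.split(","))
    except ValueError:
        raise ValueError(f"invalid --flow value: {value!r}") from None
    return ("numbers", numbers)


print(parse_flow("none"))
print(parse_flow("1,2"))
```

Keeping `--wait` as a separate boolean, as proposed above, means it can be combined with any of these values without extra keywords.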
Can I confirm that my sketch matches the four Options @oliver-sanders ? In which case I like the terms,
And suggest that limiting the length of the bridge / size of island to 1 is just a special case - i.e.
The odd aspect of this "conceit" for describing reflow is that you don't know how long a bridge will be when you start building it. When it comes to documentation I'm guessing I won't be the only person who needs pictures to understand this. |
We have to remember that this discussion only applies to tasks not currently in the pool (*). If you, for example, trigger a failed task that should just run that task as part of the current flow and shouldn't require you to specify any additional options (and, if you do specify any options they should be ignored). (*) Not sure what terminology to use. This certainly applies to a task with incomplete outputs. Also to a) a queued task, b) a task waiting for a clock trigger or an xtrigger, c) what else? We need to make sure this is clearly documented. We also need to describe what happens if you trigger a submitted or running task (the trigger is ignored).
I think this is why @oliver-sanders wants the default to be the least dangerous option (a one-off task?) - you shouldn't trigger a new flow without choosing the required behaviour. |
@wxtim -
Yes we'll definitely need diagrams! I'm not keen on the "bridge" and "island" terminology for users (although I see what you're getting at). |
Yes, I was taking that as a given at this point.
Yes, but my latest suggestion is even safer than having a "safest default": there is no default, and you can't trigger a new flow without using For example, users who blithely do a |
We can document that as If a task already belongs to a flow, triggering it will just run it sooner as part of its own flow. Then we just have to make the "already belongs to a flow" bit obvious in the UI. As for "shouldn't have to specify any options" (if already in the pool): that requires some thought (for the CLI at least). |
Not sure about the first one, however, your "Alt visualizations" are spot on 💯 (and exactly how I would want to present this to users). We should be able to do this in ASCII for the CLI
Kinda get what you're going for there but Cylc "islands" are highly unstable:
Perhaps "sub-marine sea mounts"?
Note: Purposefully using new terminology to avoid conflation with existing terms, we may want to workshop "continue" and "overrun" a touch. I picked these awkward terms because otherwise we would have gone back around the terminology loop of the previous week again, please consider them as abstract concepts for which we can choose new labels. Irrespective of how the CLI ends up, explaining these four trigger spaces will ultimately involve explaining the behaviours which form the matrix (since the matrix defines the fundamental differences) so we can do with hashing out options.
Yes so we should define the interface from a description of its behaviour from the user's perspective (your If we add another option to reflect the "overrun" dimension, say
(I had considered this opposite arrangement, however, it makes reflow the default which is definitely not a good idea!) CLI options (1) & (2) are both viable, however, I don't see a good reason to mix them.
I don't think this is a good idea, especially from the UI perspective where it would be confusing as heck. The best option is to come up with a consistent behaviour that can apply across the whole range. I think that's option (4) (i.e. continue=no, rerun=no) (i.e. The full range of options should only need to be known to the very small proportion of users who would actually want to use them (i.e. keep the complexity away from the general case). |
CLI option (2) single
|
@wxtim - "alt visualization", very nice 👍 |
Yes I do understand that now, it's just that I (naturally IMO!) initially assumed you meant we might need to workshop better names (in our context) for the concept "continue" interpreted in the normal sense of the word. But nevermind, we are now on the same page on that one 😁
I agree that having no default is not ideal, especially for the (non-CLI) UI. But I'm not convinced yet that any of the options constitute a safe default. Triggering tasks with or without reflow and with or without merging is more complex and more consequential than most (any?) other action. Taking The safest default is to trigger a one-off task, because that doesn't cause reflow and doesn't mess with existing flows. BUT that probably isn't what users will want most of the time. |
I have a different spin on what you're calling mixing the options, and the "behaviour-driven" nature of option 1 (overrun/continue) and the four-way matrix. I think a description in terms of the higher level "flow" concept explains those behaviours and in fact unifies two of them. And a good conceptual framework should always be easier to understand than a bunch of separate behaviours.
The bits in bold are all that users need to understand this, and they are pretty clean and intuitive concepts. So from a flow-centric perspective, under 3., there are only three fundamental options, not four as in your table; with a choice of continue-now or continue-on-catchup in the pre-run case. A flow-centric perspective should not be problematic for users only interested in running one flow. They may still need (i) and (iii) and so they need to understand the difference between triggering a future task "as part of my flow" or "independent of my flow". So this is obviously pretty close to your CLI option (2) single --flow argument, but there's one problem:
We need to be able to combine "wait" with different flow options: |
I think no-flow option (4) is fine. Triggering a task is expected to have consequences, same for n=1 same for n=21, same for Cylc 7, same for Cylc 8. This consequence is obvious and easily understood, if you tell something to run, it will run, the fact it ran will persist. |
That's helpful, thanks. I'm still not sure if your continued use of the original terminology is suggesting we stick with that to describe and document the behaviour though? [UPDATE: 🤣 I just saw this below the image: These four trigger names are not a part of the proposal and should probably not make it past this issue 🤣]
|
Agreed. And
Great way to illustrate what happens, in combination with @wxtim's diagrams. One minor quibble on your description of case (3) ... related to "the missing --wait argument":
I would say, the graph does not run on from there because And (sorry 😬 !) possibly a more fundamental issue with your proposed Firstly, as an aside, the Secondly, this (what you've labelled as)
So if "a" had never run before, the default trigger (2 Continue) will cause "a" to flow on as part of flow 1. But if "a" had ever run before, the default trigger (which is supposed to "continue") will behave exactly like Instead, I think we should have consistent behaviour in terms of does the triggered task/flow merge with upcoming flows and continue or not regardless of where we trigger in the graph (still merging with existing flows by default, btw). NOTE this is why my "to be implemented" post above was short and did not distinguish between "n<0" and "n>0" - according to my view of how this should work, that's not necessary. I had not gone down to the specifics of flow numbers there, but here's what I was thinking... |
This case is straightforward (one-off task, no flow-on, no merge with existing flows):
And this case is straightforward (new flow, no merge with existing flows):
But we differ here:
The triggered flow should get:
That's it, and the default-trigger behaviour and result is the same before and after any flow. A few comments: If we have
If we have
Then wherever we default-trigger a task (outside of n=0 of course):
Other pros:
(Note that adding a new flow number isn't strictly necessary except when there are no existing flows that have not run the task already ... in which case the user is definitely doing a re-run, and there is no following flow to merge with). |
@hjoliver it is not clear what you are proposing, please could you fill out the above examples with your desired behaviour and highlight where they differ. You seem to be suggesting the rules for what flow numbers are provided by
|
Not really, I'm saying current active flows (i.e. those in With the small caveat (which is probably what caused the confusion here, sorry) that we should exclude flow numbers of flows that have already passed through the triggered task. That is what allows the default trigger to re-run a sub-graph (say) behind a flow (because the triggered task will not take the flow number of the flow that we are re-running, even if that flow number still exists in
OK, I'll try to do that now, since we desperately need to lay this one to rest. I wonder if this is gonna end up the longest single issue page on the project :-) |
Also, I'd say the rules are exactly the same in both cases, it's just that in the never-ran-before case there is no previous flow number to exclude. |
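One reading of the rule described in the comments above (take the active flows, then exclude any flow that has already passed through the triggered task) can be sketched as a set difference. The function name and arguments are hypothetical, and this only illustrates the proposal under discussion, not implemented behaviour.

```python
def default_trigger_flows(active, already_ran_in):
    """Flow numbers a default-triggered task would take under this proposal.

    active: flow numbers of the currently active flows
    already_ran_in: flow numbers the task has previously run in
    """
    return frozenset(active) - frozenset(already_ran_in)


# Triggering ahead of flow 1: the task never ran, so nothing is excluded.
print(default_trigger_flows({1}, set()))
# Triggering behind flow 1: flow 1 is excluded, leaving no flow to merge with.
print(default_trigger_flows({1}, {1}))
```

The empty result in the second case corresponds to the situation noted earlier, where there is no following flow to merge with and a new flow number would be needed.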
So if there is only one flow in the workflow the task will not run at all. If there are multiple flows in the workflow the "continue" trigger will result in a reflow irrespective of whether the other flow(s) are ahead or behind of the original? Examples would be great. |
No, see this comment:
|
Ok, so this effectively changes the default to reflow for historical tasks. I would much prefer for reflows to require users to opt-in in all cases because the consequences of reflow on users data are quite dangerous and reflow (and multiple flows in general) are way beyond what we can expect of the working knowledge of the vast majority of users. |
(See my terminology comments above on what exactly "reflow" means) So I think "the continue trigger" should, by definition, "continue", which means a flow should carry on from the triggered task. The main thing, which we agreed on, is that by default that continuing flow should not get overrun by any existing flows (and I'm not arguing with that). |
Meh, sort of. My way is simpler from a consistency perspective (same behaviour on triggering a task, whether or not it ever ran before), and I think what matters and is easier to understand is whether the triggered task flows on or not. The fact that flowing on after triggering an |
And my other related point is that if you are triggering a past task to re-run it, you are just as likely to want it to flow on (the regenerate some products use case), as opposed to running a single task. The re-run a single task case seems to me to be best expressed by non-default |
I don't disagree that "reflow is dangerous" in the sense that it re-runs tasks and that will probably overwrite existing data. However:
At least I think we probably both understand where the other is coming from now. Because I was focused more on consistent triggering behaviour, when you agreed to go back to the no-wait default I thought that applied equally to future and past tasks. i.e. no-wait in front of flow=1 means "flow on now" (with all current flow numbers that could catch up and merge); and no-wait behind flow=1 means exactly the same thing. Both generate a new flow front. The fact that one case involves re-running past tasks should be blindingly obvious to users because they deliberately triggered a task that already ran. |
If you're not coming around to my perspective (which again, makes for simpler, consistent triggering behaviour and does not treat flow=1 as |
Example 1 (
|
Disagree on "simpler", "consistent" and "magic" 😁. You're not winning me over I'm afraid. I see your points, but I don't agree with them. Since the start I've maintained that defaulting to reflow is dangerous and that all reflow functionality (and all its complex consequences e.g. no-flow) should be opt-in. You are proposing that If I understand correctly what you are proposing does not add any new functionality, it just changes the default. If so my interpretation covers all bases, but if you want a reflow you must manually say so. |
That's kind of a misrepresentation because it ignores the definition of flow. A flow is a self-consistent self-perpetuating run through the graph. If a flow has passed by a task, retriggering it should be considered a new flow (or a one-off no-flow), because by definition that task has already run in that flow. You are saying, give the task the same flow number it had before but run it anyway, even though it has already run in that flow.
My consistency is at the conceptual level. When you trigger a task, any task, does it flow on or not. This supposed inconsistency is down at the level of flow numbers which is really an implementation detail that we use to make the required behaviours work. |
That's right, but we are coming from two different flow models (in a sense). By my conceptual model (which I'm claiming is simpler) your default is different behind the first flow than it is in front of it. (And it doesn't even seem to make sense with respect to the names that you gave the options: behind flow=1 the "continue" / no-wait default does not actually continue anything.) |
I don't think we are going to get anywhere with this, suggest another call. |
(otherwise it's going to be another ten pages of reply, quote and response) |
Yep, can do 👍 |
OK, meeting done. Result: I concede defeat. 💥 Reasons, for the record:
Also, on terminology:
|
The final result then, for implementation. (@oliver-sanders' explicit examples above are all valid and useful, and should be made into tests, but I think we can ditch the four-way categorization at this point). Trigger Active Flows
The triggered task runs with the set of all active (
(It gets a bit gnarly to list exactly what happens when triggering ahead of all flows, behind all flows, and between flows ... but we don't need to do that here as it's all derivable from the above). Trigger Specific Flows
The triggered task runs with the specified set of flow numbers,
Trigger a New Flow
The triggered task runs with a new flow number, not in the set of active flows A (or any previous flow in fact).
Trigger No Flow
The triggered task runs with a "none" flow number.
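The four trigger modes listed above could be summarised roughly as follows. This is a sketch only: the mode names, the function name, the use of an empty set for the "none" flow, and picking the next unused integer as the new flow number are all assumptions made for illustration.

```python
def flow_nums_for_trigger(mode, active, used, specified=()):
    """Pick flow numbers for a triggered task (illustrative only).

    mode: "active", "specific", "new" or "none"
    active: flow numbers of the currently active flows
    used: every flow number used so far in this run
    specified: flow numbers given by the user (mode "specific" only)
    """
    if mode == "active":
        return frozenset(active)
    if mode == "specific":
        return frozenset(specified)
    if mode == "new":
        # Assumption: the next unused integer serves as the new flow number.
        return frozenset({max(used, default=0) + 1})
    if mode == "none":
        # Assumption: "none" is modelled as membership of no flow.
        return frozenset()
    raise ValueError(f"unknown trigger mode: {mode!r}")


print(flow_nums_for_trigger("active", active={1, 2}, used={1, 2}))
print(flow_nums_for_trigger("new", active={1, 2}, used={1, 2}))
```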
|
After a long chat with @dpmatthews (who proposed yet another triggering approach 😁) I think we can generalise the trigger problem into two dimensions:
Combining these we get four spaces:
Going through the four spaces in detail:
1) Reflow (implemented)
Equivalent to cylc trigger --flow=<new-flow-number>.
Continue: Yes
Overrun: Yes
The use case is for re-running over tasks which have been previously run e.g. change configuration and re-run a sub-graph.
2) Continue (proposed)
Equivalent to cylc trigger --flow=<all-flow-numbers>,<new-flow-number>.
Continue: Yes
Overrun: No
--flow=1 to be used for, but has been generalised to be reflow compatible.
This approach feels quite "natural". The use cases are setting off another bit of the same flow where you don't want tasks to be overrun.
3) No Flow (implemented)
Equivalent to cylc trigger --flow -1.
Continue: No
Overrun: Yes
Useful for running one-off tasks that you do not want to impact the workflow in any way (i.e. cylc submit type uses).
4) No Flow (proposed)
Equivalent to cylc trigger --flow -2.
Continue: No
Overrun: No
Use case is for manually intervening in graph execution by ignoring dependencies or runahead limit and skipping ahead to a task which you want to be considered a part of the approaching flow front.
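The two dimensions and the four resulting spaces can be written down as a small lookup table. The labels are the working names used in this issue rather than final terminology, and the `Behaviour` type and `classify` helper are hypothetical, included only to show how the matrix fits together.

```python
from typing import NamedTuple


class Behaviour(NamedTuple):
    continues: bool  # does the graph flow on from the triggered task?
    overrun: bool    # can an approaching flow re-run the triggered task?


# Working labels from this issue, not proposed option names.
TRIGGER_SPACES = {
    "reflow": Behaviour(continues=True, overrun=True),
    "continue": Behaviour(continues=True, overrun=False),
    "no-flow": Behaviour(continues=False, overrun=True),
    "n-flow": Behaviour(continues=False, overrun=False),
}


def classify(continues, overrun):
    """Return the working label for a (continue, overrun) combination."""
    wanted = Behaviour(continues, overrun)
    for name, behaviour in TRIGGER_SPACES.items():
        if behaviour == wanted:
            return name
    raise ValueError(f"no trigger space for {wanted}")


print(classify(continues=False, overrun=False))
```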
Interface
The internals to handle the four cases are already in-place, flow_nums, DB lookups etc, so it mostly boils down to an interface / documentation issue.
I think all four methods could be exposed via a single --flow argument, however, it is sensible to provide defaults for the different behaviours. I think it would be good to document the --flow equivalents as they may help users to understand their function.
1) Enable behaviours explicitly
If we are happy with the continue/overrun model (after workshopping the terms) we could expose it directly something like:
This is quite nice as you have to explicitly opt in to each behaviour separately reducing the scope for unintended results and accidents.
2) Single --flow argument
If we don't like the continue/overrun model we could move the presets into the flow argument something like:
It's less behaviour driven so we would need to explain each option separately.
3) Separate flag for each approach
An alternative to (2) would be to come up with three/four different flags:
Default
I think no-continue & no-overrun is the safest, sanest default because:
But I'm biased. I think the default is less important than the clear separation of behaviours.