
task_pool: fix spawning when runahead tasks are triggered #4621

Closed
wants to merge 3 commits

Conversation

@oliver-sanders (Member) commented Jan 27, 2022

Draft: candidate fix for #4619; not yet sure what the consequences of this change are.

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Appropriate tests are included (unit and/or functional).
  • Appropriate change log entry included.
  • No documentation update required.

@oliver-sanders oliver-sanders added the bug Something is wrong :( label Jan 27, 2022
@oliver-sanders oliver-sanders added this to the cylc-8.0rc1 milestone Jan 27, 2022
@oliver-sanders oliver-sanders self-assigned this Jan 27, 2022
@hjoliver (Member) commented Feb 2, 2022

Superseded by #4640

@oliver-sanders (Member, Author) commented Feb 2, 2022

Reopened - #4640 (review)

@oliver-sanders oliver-sanders marked this pull request as ready for review February 2, 2022 16:00
@MetRonnie (Member) commented Feb 2, 2022

I'm afraid this doesn't work:

[scheduling]
    cycling mode = integer
    runahead limit = P1
    [[graph]]
        P1 = foo
[runtime]
    [[foo]]
        script = sleep 20

$ cylc play workflow
# 1/foo spawns and starts running
# 2/foo spawns in runahead
$ cylc trigger workflow//2/foo
# When 1/foo and 2/foo finish, the workflow shuts down, even though no final cycle point (FCP) is set

Nor does it work for your modified case: #4640 (comment)

@oliver-sanders (Member, Author) commented Feb 2, 2022

Hmm, strange things going on here.

Here's the simplest working test:

[scheduling]
    initial cycle point = 1
    cycling mode = integer
    runahead limit = P1
    [[graph]]
        P1 = x

[runtime]
    [[x]]
        script = """
            cylc__job__wait_cylc_message_started
            if ((CYLC_TASK_CYCLE_POINT == 1)); then
                cylc trigger "${CYLC_WORKFLOW_ID}//3/x"
            fi
            sleep 1
        """

On this branch the example works provided the trigger point is >3, but it breaks if it is <3, dammit.

@oliver-sanders oliver-sanders marked this pull request as draft February 2, 2022 16:58
@oliver-sanders (Member, Author) commented Feb 2, 2022

@MetRonnie have a try now, I think that should do it but I'm not sure it's a route we can go down...

@MetRonnie (Member) commented Feb 2, 2022

Haven't had a go with your most recent change yet, but it seems to me a combination of this and #4640 works for both cases?

Diff:
diff --git a/cylc/flow/task_pool.py b/cylc/flow/task_pool.py
index 6bed8fff5..b8622c28e 100644
--- a/cylc/flow/task_pool.py
+++ b/cylc/flow/task_pool.py
@@ -538,7 +538,9 @@ class TaskPool:
             self.workflow_db_mgr.pri_dao.select_tasks_to_hold()
         )
 
-    def spawn_successor(self, itask: TaskProxy) -> Optional[TaskProxy]:
+    def spawn_successor_if_parentless(
+        self, itask: TaskProxy
+    ) -> Optional[TaskProxy]:
         """Spawn next-cycle instance of itask if parentless.
 
         This includes:
@@ -597,16 +599,13 @@ class TaskPool:
             # implicit prev-instance parent
             return
 
-        if not itask.flow_nums:
-            # No reflow
-            return
-
-        if not runahead_limit_point:
-            return
-
-        # Autospawn successor of itask if parentless.
-        n_task = self.spawn_successor(itask)
-        if n_task and n_task.point <= runahead_limit_point:
+        # Autospawn successor of itask if parentless and reflowing.
+        n_task = self.spawn_successor_if_parentless(itask)
+        if (
+            n_task and
+            runahead_limit_point and
+            n_task.point <= runahead_limit_point
+        ):
             self.release_runahead_task(n_task, runahead_limit_point)
 
     def remove(self, itask, reason=""):

However, it looks like this means there is no such thing as a non-reflow trigger?

@MetRonnie (Member) commented Feb 2, 2022

Had a go with 24b7d2c and it works for both cases.

Also, it seems to be a better solution than the diff I posted above. Say you do

$ cylc play workflow
# 1/foo spawns and starts running; 2/foo spawns in runahead
$ cylc trigger workflow//5/foo
# 5/foo spawns and starts running; 6/foo spawns in runahead
# 1/foo finishes; 2/foo starts running and 3/foo spawns in runahead
# 5/foo finishes; *6/foo is still waiting*
...
# 3/foo finishes; 4/foo starts running; 5/foo spawns (again) in runahead
...
# 5/foo finishes; 6/foo starts running...

whereas with my diff, 6/foo would start running the first time 5/foo finished, leading to 2 ongoing flows.

@hjoliver (Member) commented Feb 3, 2022

> On this branch the example works provided the trigger point is >3, but it breaks if it is <3, dammit.

They're different problems: a) triggering a runahead-limited task that already exists - #4640; b) triggering a new task that causes a flow-merge problem - #4644.

@hjoliver (Member) commented Feb 3, 2022

I had a go with 24b7d2c and it doesn't work. Maybe you tested with a) triggering the runahead task; and b) triggering a future task that finishes before the main flow catches up? I'd expect it to work in those cases, but not for triggering the very next task right after the runahead task.

@hjoliver (Member) commented Feb 3, 2022

Re-closing!

@hjoliver hjoliver closed this Feb 3, 2022
@MetRonnie (Member) commented Feb 3, 2022

> for triggering the very next task right after the runahead task.

Works for me?

(screen recording: demo2)

Unless it is supposed to be the case that 3/foo spawns and runs again after 2/foo finishes?

@oliver-sanders (Member, Author) commented Feb 3, 2022

Works for me too but whatever, so long as we understand the problem.

@MetRonnie (Member)
Ok so combining #4640 and #4645 results in this:

(screen recording: demo3)

The difference is that instead of 4/foo spawning in runahead when 3/foo is triggered, 4/foo spawns in runahead when 2/foo starts running (i.e. when the original flow catches up, I presume)

@oliver-sanders (Member, Author)
Re-re-opening...

@oliver-sanders oliver-sanders reopened this Feb 3, 2022
@oliver-sanders oliver-sanders marked this pull request as ready for review February 3, 2022 16:50
@oliver-sanders (Member, Author)
(rebased and removed the old test)

@MetRonnie (Member)
Tested, and it avoids running the triggered task (4/x) twice as in #4651 (comment). But when using cylc trigger --reflow, should it run 4/x twice? At the moment it doesn't; --reflow doesn't seem to make any difference.

@oliver-sanders (Member, Author)
That DB check will need to be a bit more advanced: it will have to fetch all results for the cycle/task, go through the flow numbers (JSON-loading them), then test those against the previous instance's flow numbers.

@MetRonnie (Member)
Hmm, can't imagine that will be good for performance

@hjoliver (Member) commented Feb 3, 2022

> Ok so combining #4640 and #4645 results in this:
> ...
> The difference is that instead of 4/foo spawning in runahead when 3/foo is triggered, 4/foo spawns in runahead when 2/foo starts running (i.e. when the original flow catches up, I presume)

That's correct. When 3/foo is force-triggered it is not supposed to reflow so it does not spawn 4/foo. But when the main flow catches up, the merged flow (now represented by 3/foo) IS supposed to continue so we retroactively spawn the next tasks to make that happen.

So, can someone explain why this PR has been re-re-opened? see #4651 (comment)
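The flow-merge behaviour described above boils down to set operations on per-task flow numbers. A minimal illustrative sketch (the function names are mine, not the real cylc.flow.task_pool API):

```python
# Illustrative only: in Cylc 8 each task instance carries a set of integer
# flow numbers; a no-reflow force trigger gives the task an empty set.


def should_spawn_successor(flow_nums: set) -> bool:
    """A task with no flow numbers (no-reflow trigger) must not spawn on."""
    return bool(flow_nums)


def merge_flows(main_flow_nums: set, triggered_nums: set) -> set:
    """When the main flow catches up with a force-triggered task, the flows
    merge: the surviving task carries the union of flow numbers, so its
    previously suppressed successors can then be spawned retroactively.
    """
    return set(main_flow_nums) | set(triggered_nums)
```

Under this model the force-triggered 3/foo starts with an empty set (spawns nothing), and only after merging with the main flow does it acquire flow numbers and spawn 4/foo.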

@hjoliver (Member) commented Feb 3, 2022

> for triggering the very next task right after the runahead task.
>
> Works for me?

OK, by "Had a go with 24b7d2c and it works for both cases" I thought you meant literally just the changes in that commit. If I do that, the scheduler shuts down prematurely.

To get the result in your screen recording above I need the line 600 change as well, as in @oliver-sanders' current first commit on this branch: 8bf1d49

However, that is still the wrong result. When you triggered 3/foo you did not ask for reflow, so it should not spawn 4/foo (even if 4/foo has no flow numbers) at that point.

So while this branch does work for the two reported cases, it works for the wrong reasons, and that could come back to bite us in the future. Plus this branch fails the functional test on #4645, which extends the scenario to downstream dependencies of the triggered task.

Re-re-re-closing 🤣

@hjoliver hjoliver closed this Feb 3, 2022
Labels: bug (Something is wrong :(), superseded

Successfully merging this pull request may close these issues:
  • Triggering runahead tasks causes workflow to finish early

3 participants