Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flow-nums data-store fix #6115

Merged
merged 3 commits into from
Jul 9, 2024
Merged

Conversation

dwsutherland
Copy link
Member

@dwsutherland dwsutherland commented May 30, 2024

closes #6114

Using example from above produces:
image

The solution was to create a delta from the new task proxy (instead of trying to overwrite the existing).

existing tests should cover code (?)

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • No changelog entry needed as not yet user-facing
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@dwsutherland dwsutherland added the bug Something is wrong :( label May 30, 2024
@dwsutherland dwsutherland added this to the 8.3.0 milestone May 30, 2024
@dwsutherland dwsutherland requested a review from MetRonnie May 30, 2024 10:58
@dwsutherland dwsutherland self-assigned this May 30, 2024
@MetRonnie

This comment was marked as resolved.

@MetRonnie MetRonnie modified the milestones: 8.3.0, 8.3.1 May 30, 2024
@MetRonnie

This comment was marked as resolved.

@dwsutherland

This comment was marked as resolved.

@oliver-sanders
Copy link
Member

It seems to be marking a succeeded task as failed: https://github.com/cylc/cylc-flow/actions/runs/9301382754/job/25601035801?pr=6115#step:17:3293

Reporting on the task status from the wrong flow?

@dwsutherland
Copy link
Member Author

dwsutherland commented May 30, 2024

It seems to be marking a succeeded task as failed: https://github.com/cylc/cylc-flow/actions/runs/9301382754/job/25601035801?pr=6115#step:17:3293

Reporting on the task status from the wrong flow?

I think a delta is being overwritten by another delta... maybe

@dwsutherland
Copy link
Member Author

dwsutherland commented May 30, 2024

Ah, I think I know what's happening..

[scheduler]
   allow implicit tasks = True
   [[events]]
      inactivity timeout = PT1M
      abort on inactivity timeout = True
[scheduling]
   [[graph]]
      R1 = """
         a => b => c? => d
         c:fail? => kick-c
      """
[runtime]
   # First run: c stops the scheduler then fails.
   # On restart, kick-c retriggers c to run 'cylc show'.
   [[kick-c]]
      script = cylc trigger "$CYLC_WORKFLOW_ID//1/c"
   [[c]]
      script = """
         if ((CYLC_TASK_SUBMIT_NUMBER == 1)); then
            cylc stop --now --max-polls=10 --interval=1 $CYLC_WORKFLOW_ID
            false
         else
            # Allow time for c submission => running
            sleep 2
            cylc show "$CYLC_WORKFLOW_ID//1/b" >> $CYLC_WORKFLOW_RUN_DIR/show-b.txt
            cylc show "$CYLC_WORKFLOW_ID//1/c" >> $CYLC_WORKFLOW_RUN_DIR/show-c.txt
            cylc show "$CYLC_WORKFLOW_ID//1/d" >> $CYLC_WORKFLOW_RUN_DIR/show-d.txt
         fi
      """

When you start the workflow up again, the states and prerequisites of tasks are loaded from the DB .. including ✓ 1/b succeeded in 1/c (as in the test).

However, when you trigger a task it's prerequisites aren't loaded.. So this change is correctly setting the prerequisites to the newly triggered 1/c (⨯ 1/b succeeded)... This is how Cylc works at the moment..

So I can either:

  1. change the test to reflect the state of the new 1/c
  2. fake it to keep the old 1/c's prerequisites..
  3. have forced-triggers/triggers actually set the prerequisites "correctly" for the task pool object (however that can be done).

Ideally maybe option 3 (?)

However, we shouldn't do option 2, as I can imagine cases where the new object does have correct prerequisites (i.e. from a reflow and/or after reload )..

So I'm going to change the test..

@MetRonnie

This comment was marked as resolved.

@dwsutherland

This comment was marked as resolved.

@MetRonnie MetRonnie changed the base branch from master to 8.3.x June 19, 2024 12:33
@oliver-sanders
Copy link
Member

So I can either:

I would go for the minimal change to fix the issue and kick wider Cylc implementation stuff to an issue.

@MetRonnie are you happy with the amended test?

@oliver-sanders
Copy link
Member

Tried this out, seems to do the job.

Wrote an integration test which fails before and passes after this change: dwsutherland#20

Discovered an unrelated bug in the process: #6174

@MetRonnie
Copy link
Member

(Figured out that the tests were passing locally for me because the comm command can't tell the difference between and when LANG=en_US.UTF-8 for some bizarre reason)

Comment on lines 2548 to 2555
self._process_internal_task_proxy(itask, tp_existing)
statics = {'id'}
for field, _ in tp_existing.ListFields():
old_value = getattr(tp_delta, field.name, None)
if old_value and field.name not in statics:
tp_delta.ClearField(field.name)
tp_delta.MergeFrom(tp_existing)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to add a comment explaining what this is for

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this is doing anything, actually

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have in the general delta application..
Some protobuf fields are additive .. i.e. if the two messages contain lists, then the list would extend on MergeFrom, this can be a problem if we don't want duplicates in the list (i.e. we just want to replace one list with the new).. So we Clear them first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this couldn't simply be

-tp_existing = PbTaskProxy(id=tp_id)
-
-self._process_internal_task_proxy(itask, tp_existing)
+self._process_internal_task_proxy(itask, tp_delta)
-statics = {'id'}
-for field, _ in tp_existing.ListFields():
-    old_value = getattr(tp_delta, field.name, None)
-    if old_value and field.name not in statics:
-        tp_delta.ClearField(field.name)
-tp_delta.MergeFrom(tp_existing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me see if _process_internal_task_proxy wipes out the fields it's writing to .. If so, then yes, otherwise I already explained:

Some protobuf fields are additive .. i.e. if the two messages contain lists, then the list would extend on MergeFrom, this can be a problem if we don't want duplicates in the list (i.e. we just want to replace one list with the new).. So we Clear them first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added that block in the last commit.. However, it may have been a precautionary move.

I've reviewed the code, and the protobuf fields .. The one field that would have been effected is already cleared in _process_internal_task_proxy, the other mappings and fields should be ok. So, I've removed that block.

Let's see how my review and removal goes..

@oliver-sanders
Copy link
Member

oliver-sanders commented Jun 28, 2024

(Figured out that the tests were passing locally for me because the comm command can't tell the difference between ✓ and ⨯ when LANG=en_US.UTF-8 for some bizarre reason)

Sounds like it would make sense to set LANG=C for the purpose of this comparison?

LANG=C contains_ok ...

(sry, hit wrong button)

@MetRonnie
Copy link
Member

MetRonnie commented Jun 28, 2024

Sounds like it would make sense to set LANG=C for the purpose of this comparison?

#6183

@oliver-sanders
Copy link
Member

(that fix is now in, rebase this PR to pick it up)

@oliver-sanders
Copy link
Member

(@dwsutherland, one unresolved review comment above)

@oliver-sanders oliver-sanders modified the milestones: 8.3.1, 8.3.2 Jul 3, 2024
@dwsutherland
Copy link
Member Author

(@dwsutherland, one unresolved review comment above)

Replied.. hopefully this convinces you @MetRonnie

@oliver-sanders oliver-sanders requested a review from wxtim July 8, 2024 11:51
Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@oliver-sanders
Copy link
Member

(Note, no changelog entry required, this does not impact users at present)

Copy link
Member

@wxtim wxtim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

@MetRonnie MetRonnie merged commit 29d26cf into cylc:8.3.x Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :( small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data store: flow numbers reported do not update to match reality
4 participants