graphql: wishlist #3223

Closed
2 of 4 tasks
oliver-sanders opened this issue Jul 16, 2019 · 31 comments
@oliver-sanders
Member

oliver-sanders commented Jul 16, 2019

Data which we would like available in the GraphQL schema:

  • progress (do it client side)
    • Job progress as a percent or decimal, to complement the dt field.
  • isHeld - hold_swap => is_held #3230
    • When a task is held, its previous state is stored; when it is released, that state is restored.
    • For GraphQL it would be better to leave the task state unchanged but add a field to show whether the task is held.
    • Simple to implement: change the current "swap" logic (see graphql: wishlist #3223 (comment)).
  • isRetry superseded by re-implement the task retry state using xtriggers #3423
    • Currently retrying is a strange state which a task may pass through very quickly.
    • For GraphQL it would be better to leave the task state unchanged but add a field to show whether the task is attempting a retry.
    • Note: retry relates to Cylc's execution retry delays or submission retry delays, not to user intervention.
    • This one might require more discussion.
  • ...
  • status / status_msg - separate status and status message #3267
    • Separate the suite status and status message.

Note: these fields might not have a direct mapping onto data which is currently available to Cylc Flow internally. They might be awkward or not really possible at the moment.
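For illustration, a minimal sketch of how a couple of these fields might appear in a graphene-style schema; the type and field names here are assumptions, not the actual cylc-flow schema:

import graphene


class Task(graphene.ObjectType):
    """Illustrative task node carrying the requested flags (names assumed)."""
    status = graphene.String(description='Task status, left unchanged by hold.')
    is_held = graphene.Boolean(description='True while the task is held.')
    is_retry = graphene.Boolean(description='True while the task is waiting on a retry delay.')


class Suite(graphene.ObjectType):
    """Illustrative suite node with the status / status message split."""
    status = graphene.String(description='Overall suite state.')
    status_msg = graphene.String(description='Human-readable status message.')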

Pull requests welcome!
This is an Open Source project - please consider contributing code yourself
(please read CONTRIBUTING.md before starting any work though).

@oliver-sanders oliver-sanders added this to the cylc-8.0a2 milestone Jul 16, 2019
@matthewrmshin
Contributor

In the past, we have avoided making too much change to the task state internal representation, mainly due to compatibility issues with the GUI representation. Now that the old GUI is gone, we should be in a much better position to work on this...

For held, the current internal representation is basically (status: str, hold_swap: str) so it can look like ("held", "waiting") (which turns back to ("waiting", None) on release). It would make more sense to change it to (status: str, is_held: bool) - so we can get rid of the complex status swap logic.
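For illustration only, a minimal sketch of that simplified representation (attribute names assumed, not the actual cylc-flow code):

from dataclasses import dataclass


@dataclass
class TaskState:
    """Status is never swapped out; holding just toggles a flag."""
    status: str           # e.g. 'waiting', 'running', 'succeeded'
    is_held: bool = False

    def hold(self):
        self.is_held = True

    def release(self):
        self.is_held = False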

You are right about retry and submission retry needing more discussion. To me, they are basically ("waiting", is_held=True) status - the task is held by the next (submission) retry delay - and will be automatically released on completing the delay.

I also can't remember if retry and submission retry can be used as task outputs or not.

@oliver-sanders
Member Author

oliver-sanders commented Jul 16, 2019

To me, they are basically ("waiting", is_held=True)

Presumably that's ("waiting", is_[sub_]retry=True)

I also can't remember if retry and submission retry can be used as task outputs or not.

They can't as far as I'm aware so we are safe there.

@matthewrmshin
Contributor

No, I did mean ("waiting", is_held=True) - the task is being held by a retry delay. The alternate view is simply a ("waiting", None) status - but now it has a new prerequisite in the form of a retry delay.

(I am sure there are many ways to look at this problem. 😄)

@oliver-sanders
Member Author

No, I did mean ("waiting", is_held=True)

I kinda get what you mean, but a held state won't make sense to the user; it might make some sense as an xtrigger though.

This kinda comes down to data representation / UI, so I'll leak some cylc/cylc-ui stuff here. How do we represent retries? Here are five options off the top of my head; feel free to suggest others, I'm happy to mock them up:

[retry mockup image]

  • #1 - custom icon for each retry state.
    • + clear separation of retry and task state.
    • - more icons => more confusion.
  • #2 - discrete retry symbol
    • + clearer separation of retry and task state
    • - may be tricky to graphically represent a held retrying task (which is, of course, possible)
  • #3 - discrete held symbol
    • + one less state to worry about
    • + communicates what cylc is actually doing
    • - the user didn't actually hold the task and will be confused as to why it is held
    • - held retrying tasks...
  • #4 - do nothing
    • + simple!
    • - confusing!
  • #5 - clock-face counting up to next retry time?
    • + gives the user access to information that is otherwise hard to find
    • - information not available to the GUI yet
    • - non-intuitive UI

@matthewrmshin
Contributor

The (submission) retry state is only applicable while the task is waiting for the clock. Once submitted, the multiple job icons should make it obvious that the task has been retried or re-triggered. Perhaps the job icons should display whether it is an automatic retry or a manual re-trigger? E.g. nothing for automatic retry and an M in the job icon for a manual re-trigger?

@hjoliver
Member

E.g. nothing for automatic retry and an M in the job icon for a manual re-trigger?

A little ✋ badge for manual?

Note we also discussed in Exeter modifying edge style in the graph view (I think??), to indicate manual intervention (e.g. task was manually triggered despite prerequisites not being satisfied).

@oliver-sanders
Member Author

Perhaps the job icons should display whether it is an automatic retry or a manual re-trigger?

I think this would be good. Something else we have been asked for is displaying the retry number, e.g.:

1/∞  # infinite potential retries e.g. PT5M
3/4  # finite retries e.g. PT5M, 3*PT10M
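A rough sketch of how such a label could be built, purely illustrative (the try number and a notion of maximum tries are assumed inputs, not existing cylc-flow fields):

def retry_label(try_num, max_tries=None):
    """Return a display label like '3/4', or '1/∞' when retries are unbounded."""
    total = '∞' if max_tries is None else max_tries
    return f'{try_num}/{total}'

# e.g. retry_label(3, 4) -> '3/4', retry_label(1) -> '1/∞'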

@TomekTrzeciak
Contributor

It would also be good to tell apart normally succeeded tasks from manually succeeded ones. This is helpful for troubleshooting operational suites in the heat of failures, where actions were taken by operators and the support team is called in after the fact. More generally, a clear, visual indication at a glance of where user interaction happened in the suite (manual task trigger, succeed, insertion/deletion, etc.) would be quite useful.

@sadielbartholomew
Collaborator

sadielbartholomew commented Jul 19, 2019

@dwsutherland asked for thoughts in a comment that I will cross-post here as a question for those following this issue (it doesn't strictly relate to this issue, but it seemed a suitable enough one on cylc-flow to re-raise it in, for those who know more than I do about the plans for the task/job data side to comment on):

In the job pool (the store of job data elements), the creation of an element happens just before job submission, so I added the "ready" state to them.

I guess this relates to TASK_STATUS_READY in the following (correct me if I am wrong, David, thanks)?

JOB_STATUSES_ALL = [
    TASK_STATUS_READY,
    TASK_STATUS_SUBMITTED,
    TASK_STATUS_SUBMIT_FAILED,
    TASK_STATUS_SUBMIT_RETRYING,
    TASK_STATUS_RUNNING,
    TASK_STATUS_SUCCEEDED,
    TASK_STATUS_FAILED,
]


class JobPool(object):
    """Pool of protobuf job messages."""

@dwsutherland
Member

@sadielbartholomew - Correct. Jobs are usually submitted soon after creation, but there is a gap between job file creation (which is where/when I create the data element alongside) and submission, so ready made sense.

@hjoliver
Member

hjoliver commented Jul 24, 2019

Not sure I understand the "ready" state discussion above. The "ready" state means "ready to run" ... i.e. prerequisites satisfied and queued to the subprocess pool for job submission. If the subprocess pool is small and/or you have a bunch of long-running processes executing in it (e.g. slow event handlers) then tasks can stay in the "ready" state for a while. The moment of job file creation doesn't really have task state implications.

@matthewrmshin
Contributor

(The ready state was called the submitting state in the distant past.)

@dwsutherland
Member

Not sure I understand the "ready" state discussion above. The "ready" state means "ready to run" ... i.e. prerequisites satisfied and queued to the subprocess pool for job submission. If the subprocess pool is small and/or you have a bunch of long-running processes executing in it (e.g. slow event handlers) then tasks can stay in the "ready" state for a while. The moment of job file creation doesn't really have task state implications.

Jobs have states too... Job file creation has job state implications: "ready to submit".

@matthewrmshin
Contributor

And we can even complicate matters by adding the (future) trigger-edit workflow to the mix:

  • Put task on hold.
  • Write job file.
  • Return job file to client.
  • (Client edits job file content.)
  • Client uploads edited job file.
  • Verify uploaded job file.
  • Release task.
  • Submit job.

(What's the status at the various stages?)

@dwsutherland
Member

At the moment, the job data element is:

  • Created with the ready state on job file write.
  • Deleted on backout (entire job element).
  • State changed by the same mechanism that changes the task state (for active states).

So job states are a subset of task states, although ready means something slightly different I suppose.
(not saying this is how it should be of course)

@hjoliver
Member

hjoliver commented Jul 25, 2019

Jobs have states too... Job file creation has job state implications "ready to submit"..

Hmmm. Not necessarily. I would have thought that a job does not exist until the moment it is submitted (and job file creation is something that the task does before that).

@hjoliver
Member

hjoliver commented Jul 25, 2019

I'm really talking about task and job states that users need to be aware of. Which doesn't necessarily mean we don't need job-related stuff in the back end beyond those states. But I don't think we should refer to those as "job states" ... in the interest of avoiding confusion.

@oliver-sanders
Member Author

Options for dealing with the "retry" state.

  • An attribute of the TaskState called is_retry (similar to is_held).
  • Attempt to meld the retry state into the is_held logic.
  • Before a retry, place a wallclock xtrigger dependency on the task (which will appear in the graph).

@hjoliver
Member

hjoliver commented Aug 15, 2019

I like the wallclock xtrigger idea. In that case, if a task fails and has a retry delay lined up, can we just do this:

  • add the appropriate wallclock xtrigger
  • return the task to the "waiting" state

So there's really no need for a special retry attribute or use of the "held" state (the trouble with held is that it would need to be a self-releasing hold, which is weird).

We could use a special variant of the wallclock xtrigger, that takes an absolute time instead of a cycle point offset, then we could easily tell the difference (for display purposes) between a normal clock trigger and a retry one.
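For illustration, a sketch of what such an absolute-time variant might look like as an xtrigger function; the name, signature and use of a unix timestamp are assumptions, not an existing cylc trigger (xtrigger functions return a (satisfied, results) tuple):

import time


def wall_clock_absolute(trigger_time):
    """Hypothetical retry xtrigger: satisfied once the wallclock passes trigger_time.

    trigger_time is an absolute unix timestamp, e.g. the failure time plus the
    next retry delay, set when the task is returned to the waiting state.
    """
    satisfied = time.time() >= trigger_time
    return satisfied, {}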

@hjoliver
Member

@matthewrmshin -

(The ready state was called the submitting state in the distant past.)

Ha, I'm suggesting going back to that cylc/cylc-admin#47

@hjoliver
Member

On "retry" again: if we just use waiting state plus clock trigger, the new job status icons will show definitively that the task is going to retry (you'll see the previous failed job, but the task state is waiting, not failed). Nice 👍

@dwsutherland
Member

dwsutherland commented Aug 15, 2019

@matthewrmshin -

(The ready state was called the submitting state in the distant past.)

Ha, I'm suggesting going back to that cylc/cylc-admin#47

What if the task state is ready or queued? Wouldn't you think it's misleading to have a job state of submitted? To me, submitted implies the handing over of a script/job to the batch system.

@hjoliver
Member

hjoliver commented Aug 15, 2019

@dwsutherland - submitting, not submitted

The idea is that once a task's prerequisites are satisfied, we go through the process of submitting it (which may take some time), after which it is indeed submitted (to the batch system).

@hjoliver
Member

hjoliver commented Aug 15, 2019

(I think the original change of terminology from "submitting" to "ready" was because, technically, we are only submitting the job when running the qsub process (for example), which happens at the end of the "ready" state. But that is probably just splitting hairs as far as users are concerned.)

@dwsutherland
Member

Still, do you want job state submitting while a task is queued?

@hjoliver
Member

?? I don't follow you. Submitting (aka ready) and queued are two different task states.

@hjoliver
Member

Oh, sorry, you said job state, not task state.

@hjoliver
Member

There is no job state until the task is submitted.

@dwsutherland
Member

dwsutherland commented Aug 15, 2019

There is no job state until the task is submitted.

So you think the respective data element, created beforehand, should have an empty state field?

@hjoliver
Member

hjoliver commented Aug 15, 2019

I'm just talking about the official set of task and job status names that will be exposed to users, and what they mean, exactly. Presumably you already have null job states alongside other task states like "waiting", or is your question really about when the job "data element" should be created? (If the latter, then I guess it should be created when the task achieves the "submitted" state).

@oliver-sanders
Member Author

The requested fields have either been implemented or superseded so closing this issue.

@hjoliver hjoliver modified the milestones: cylc-8.0a3, cylc-8.0b0 Feb 25, 2021