-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove named checkpointing #3891
Comments
To be fully removed - cylc#3891
On the Discourse discussion, someone has mentioned named vs automatic checkpoints:
I think I can find 3 types of automatic checkpoints:
What is to happen to those? Technically they are named checkpoints. If their functionality is to be kept intact, does that mean we can't removed named checkpoints? |
(Adding this to the 8.0a3 milestone, unless that was not the intent) |
Drop the two automatic reload checkpoints (2 and 3) as well as arbitrary checkpoints taken with Keep the restart checkpoint, since without that we couldn't restart from the latest state. This gets overwritten whenever anything changes, so there's only ever one of them. Reflow can't replace this (e.g. a proper restart knows about tasks that were running the scheduler went down). As I recall the "tatest" (restart) checkpoint appears in the checkpoint_id table, but after this it will be the only entry so there's no point in keeping the table. Further, if you look in So the checkpoint_id table, and the three blah_checkpoints tables, |
(Arrgh, lost control of keyboard! - reopening!) |
@oliver-sanders - I wonder if we should rename the "latest checkpoint" (/"latest_state"/"latest snapshot") tables consistently, to make it obvious what they are and distinguish from the ever-growing run history tables? Currently:
New?
|
Hmm, one thing I forgot: we also record a named checkpoint at every restart. sqlite> select * from checkpoint_id;
0|2020-10-28T11:42:38+13:00|latest
1|2020-10-28T11:42:19+13:00|restart
2|2020-10-28T11:42:26+13:00|restart I've never heard of anyone wanting to restart from a previous (not latest) restart state with all the broadcasts that happened to be extant at the time. So I suppose it's fine to ditch that. (However, we should warn users about broadcasts in the context of reflows: beware of any reflowed tasks that expect input via broadcasts from other pre-reflow tasks - mind you, the same is already true for warm starts. We should advise to use broadcast sparingly and to know where they are in the flow). However, suite_params_checkpoints serves as a historical record of global parameters that only change during restart or reload: sqlite> select * from suite_params;
uuid_str|fe7c3ac4-584a-498e-a273-2a9277970cd4
cylc_version|8.0a3.dev
UTC_mode|0
cycle_point_tz|+1300
sqlite> select * from suite_params_checkpoints;
1|uuid_str|fe7c3ac4-584a-498e-a273-2a9277970cd4
1|cylc_version|8.0a3.dev
1|UTC_mode|0
1|cycle_point_tz|+1300
2|uuid_str|fe7c3ac4-584a-498e-a273-2a9277970cd4
2|cylc_version|8.0a3.dev
2|UTC_mode|0
2|cycle_point_tz|+1300 Perhaps we should replace this with an ever-growing On the other hand, the DB does not track changes to everything in the flow config, so maybe it doesn't matter. |
It would be nice from a provenance perspective to maintain this information, however, I don't think it's a biggie, can tackle later on if required. |
To be fully removed - cylc#3891
Am I right in thinking only the Update: Ah wait, the initial cycle point etc could change |
Yep, also the user can specify different suite variables (e.g |
I've put up these changes: https://github.com/cylc/cylc-flow/pull/3906/files/8b1f2956de934b1d2996e3e2310c7274818c18da..0eb2b03cff3fc53822ba4293bfe4fbcbc582e9a5 So far I haven't removed the |
Yes remove them. The suite_params aren’t very useful as-is (particularly without corresponding state checkpoints) and as Oliver noted we can come back to a better provenance solution later if needed. |
What about keeping track of the number of restarts? Could we just have a table that consists of 1 cell that keeps the count? |
Do we need to keep track of that? Maybe (but not on this PR) a new “suite events” table that keeps a history of all restarts and interventions ... |
The number of the restart gets printed to the log on every restart. Can either comment that out or get rid of it |
Ok, I forgot about that. Seems to me we could get rid of it now, in lieu of a better provenance system in future. But maybe do it as a clean commit that can be backed out if others disagree! (Although I can’t think of any real use for that info at the moment) |
It's quite nice when you are trawling through the logs of some massive suite, or one where the user has repeatably restarted it in the hope that they might get a different outcome. Could record the re(start) number as a suite parameter? |
Supersedes #3864
Long story short:
So: let's get rid of it.
@MetRonnie - assigning you as you've run into this somewhat in #3863, but someone else can take it on if need be...
The text was updated successfully, but these errors were encountered: