Cached state will not exist for persisted check after a reboot #414
Comments
Discussed with @benhoyt and will propose a couple of PRs soon.
This could be a fix for the issue identified above, but I would still like to look at improving the Plan Manager so that it does not propagate a plan change before managers can really cope with it (before the state engine is ready).
Addresses #414.

Running checks have changes that get persisted by the state-engine. This means that following a reboot, changes and tasks that are not complete (not ready) will be resumed. In the case of some managers, such as checkstate, carryover changes and tasks are an unwanted side effect.

Currently the plan manager will perform the first plan load very early during startup (before the state-engine is ready, and before StartUp hooks have been called). The result is that a PlanChanged propagation will take place at plan load, and force checkstate to inspect the running checks prematurely. Checkstate discovers a running change from a previous boot context, and tries to load its data from cached state, which does not exist.

Gracefully ignore changes for which cached state does not exist, and use this to identify changes that should be aborted on the first ensure pass.

Test:

```
<reboot>
59  Hold   today at 23:29 SAST  today at 23:37 SAST  Recover exec check "internet-online"
60  Error  today at 23:37 SAST  today at 23:37 SAST  Perform exec check "internet-online"
:
66  Doing  today at 23:37 SAST  -                    Recover exec check "internet-online"
<reboot>
:
66  Hold   today at 23:37 SAST  today at 23:41 SAST  Recover exec check "internet-online"
:
69  Error  today at 23:41 SAST  today at 23:42 SAST  Perform exec check "internet-online"
:
73  Doing  today at 23:42 SAST  -                    Recover exec check "internet-online"
```

As the test demonstrates, following a reboot, the change (66) is aborted, and as a result the status is now ```Hold```. A new change is created which starts as a ```Perform```, and then fails (as expected) after a while and changes to a ```Recover```.
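To illustrate the idea of ignoring changes without cached state and aborting them on the first ensure pass, here is a minimal sketch. It is not the actual checkstate code: `changeInfo`, `cachedState`, `checkCache`, `staleChanges`, and `abortStale` are hypothetical names, and the in-memory map only stands in for whatever cache is lost across a reboot.

```go
package main

// changeInfo is a hypothetical stand-in for a persisted check change.
type changeInfo struct {
	ID     string
	Status string // e.g. "Doing", "Hold", "Error"
}

// cachedState is hypothetical per-change data that lives only in memory,
// so it does not survive a reboot.
type cachedState struct {
	failures int
}

// checkCache maps a change ID to its cached state (assumed layout).
type checkCache map[string]cachedState

// staleChanges returns the IDs of in-progress changes that have no cached
// state. Such changes are assumed to come from a previous boot context.
func staleChanges(changes []changeInfo, cache checkCache) []string {
	var stale []string
	for _, c := range changes {
		if c.Status != "Doing" {
			continue // already finished; nothing to resume or abort
		}
		if _, ok := cache[c.ID]; !ok {
			stale = append(stale, c.ID)
		}
	}
	return stale
}

// abortStale marks each stale change as "Hold", mirroring the status shown
// for change 66 in the test output above.
func abortStale(changes []changeInfo, stale []string) {
	isStale := make(map[string]bool, len(stale))
	for _, id := range stale {
		isStale[id] = true
	}
	for i := range changes {
		if isStale[changes[i].ID] {
			changes[i].Status = "Hold"
		}
	}
}
```

In this sketch, the first ensure pass would call `staleChanges` and then `abortStale`, after which a fresh change can be created for the check. Changes that do have cached state are left untouched.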
Just copying from my personal notes. For a more holistic fix: