Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a delay after halt and before backup. #12101

Closed
4 tasks
dwedul-figure opened this issue May 31, 2022 · 2 comments · Fixed by #12188
Closed
4 tasks

Add a delay after halt and before backup. #12101

dwedul-figure opened this issue May 31, 2022 · 2 comments · Fixed by #12188
Assignees
Labels
C:Cosmovisor Issues and PR related to Cosmovisor T:feature-request

Comments

@dwedul-figure
Copy link
Collaborator

Summary

In Cosmovisor, allow a node operator to define a delay between the node halt (for upgrade) and backup.

Problem Definition

As a node operator, I want to define a delay after the halt (for upgrade) so that all node sub-processes have time to terminate and close file handles before moving on to steps that might need to access the files (e.g. backup or restart).

Proposal

Use a DAEMON_RESTART_DELAY environment variable to provide the desired delay as a time.Duration format (e.g. 5s or 300ms), default to 0.
As soon as the node halts, have Cosmovisor sleep for that amount of time before moving on to any subsequent steps.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@dwedul-figure dwedul-figure added C:Cosmovisor Issues and PR related to Cosmovisor T:feature-request labels May 31, 2022
@0xruff
Copy link

0xruff commented Jun 1, 2022

I second that. During some upgrades via cosmovisor I had problems with it probably restarting too fast. Cosmovisor or rather the binary it runs raises an error at these occasions that the database (leveldb/cleveldb) is still locked while trying to start a new process. After a manual restart of cosmovisor the nodes run just fine.
AFAIK cosmovisor itself does not know anything about locked files or database access of the process it runs, so the proposed delay is a good minimalistic approach to tackle the problem imo.

Logs of described issue:

May 27 xx:xx:xx <host> cosmovisor[86693]: 12:02AM INF daemon shutting down in an attempt to restart module=cosmovisor
May 27 xx:xx:xx <host> cosmovisor[86693]: 12:02AM INF pre-upgrade command does not exist. continuing the upgrade. module=cosmovisor
May 27 xx:xx:xx <host> cosmovisor[86693]: 12:02AM INF upgrade detected, relaunching app=provenanced module=cosmovisor
May 27 xx:xx:xx <host> cosmovisor[86693]: 12:02AM INF running app args=["start"] module=cosmovisor path=/home/<user>/.provenanced/cosmovisor/upgrades/lava/bin/provenanced
May 27 xx:xx:xx <host> cosmovisor[2110771]: Error: could not create "cleveldb" db in /home/<user>/.provenanced/data with name "application": failed to initialize database: IO error: lock /home/<user>/.provenanced/data/application.db/LOCK: Resource temporarily unavailable
May 27 xx:xx:xx <host> cosmovisor[86693]: 12:02AM ERR  error="exit status 1" module=cosmovisor```

@ValarDragon
Copy link
Contributor

ValarDragon commented Jun 2, 2022

Oh wow, this is a great point that I haven't thought about before

I support this being added

@julienrbrt julienrbrt self-assigned this Jun 6, 2022
@julienrbrt julienrbrt moved this to 📝 Todo in Cosmos-SDK Jun 6, 2022
@julienrbrt julienrbrt moved this from 📝 Todo to 💪 In Progress in Cosmos-SDK Jun 8, 2022
@julienrbrt julienrbrt linked a pull request Jun 8, 2022 that will close this issue
19 tasks
@julienrbrt julienrbrt moved this from 💪 In Progress to 👀 Needs Review in Cosmos-SDK Jun 8, 2022
@mergify mergify bot closed this as completed in #12188 Jun 9, 2022
mergify bot pushed a commit that referenced this issue Jun 9, 2022
…12188)

## Description

Closes: #12101



---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [x] added `!` to the type prefix if API or client breaking change
- [x] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/main/CONTRIBUTING.md#pr-targeting))
- [x] provided a link to the relevant issue or specification
- [ ] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/main/docs/building-modules)
- [x] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/main/CONTRIBUTING.md#testing)
- [x] added a changelog entry to `CHANGELOG.md`
- [x] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [x] updated the relevant documentation or specification
- [x] reviewed "Files changed" and left comments if necessary
- [x] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed 
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)
Repository owner moved this from 👀 Needs Review to 👏 Done in Cosmos-SDK Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:Cosmovisor Issues and PR related to Cosmovisor T:feature-request
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants