-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition between cosmovisor and upgrade handler #8964
Comments
We are actually going to switch cosmovisor to just wait for the file instead of reading stdout. That should address this. Any update on the status of that fix @robert-zaremba ? |
We ended creating a "git-filter fork" of the cosmovisor code into our own repo to fix this issue. This seemed like the best way forward as we also wanted to see additional changes such as taking a snapshot of the data directory before performing an upgrade (per best practice) in case a rollback is required as well as some useful docker packaging. https://github.com/provenance-io/cosmovisor Let me know if there is interest in seeing any of that work come back into the SDK. |
Hey @iramiller -- we'll be merging #8590 into a v1.0 of cosmovisor soon (in conjunction with v0.43 release of the SDK). Hopefully that resolves this issue. Will you be able to see if the changes in #8590 take care of this? |
The other pieces you mention look like they could be great to have in cosmovisor. Maybe let's see what the v1.0 release looks like with the change to file based handlers, and then if there's interest in opening a PR upstream with the remainder of pieces from your fork that would still be useful we can get some folks on our team to review it. |
Well @clevinson -- #9384 directly implements the remediation as outlined above in this issue (even though these two are not linked at the moment). The other changes in #8590 are nice too as relying on logs at all seems like a poor design. |
This is solved in: #8590 (still in review and testing). |
Adding upgrade file watcher for cosmovisor. Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable: + depends on the log level output (x/upgrade uses INFO) + can be hacked by accidentally logging user user content + can be broken by using upgrade name which will break the regex pattern. closes: #7703 closes: #8523 closes: #8651 closes: #8793 closes: #8964 **Depends on**: - #9652 --- Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why. - [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting)) - [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work. - [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md). - [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing) - [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`) - [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code). - [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md` - [ ] Re-reviewed `Files changed` in the Github PR explorer - [ ] Review `Codecov Report` in the comment section below once CI passes
Adding upgrade file watcher for cosmovisor. Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable: + depends on the log level output (x/upgrade uses INFO) + can be hacked by accidentally logging user user content + can be broken by using upgrade name which will break the regex pattern. closes: cosmos#7703 closes: cosmos#8523 closes: cosmos#8651 closes: cosmos#8793 closes: cosmos#8964 **Depends on**: - cosmos#9652 --- Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why. - [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting)) - [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work. - [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md). - [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing) - [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`) - [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code). - [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md` - [ ] Re-reviewed `Files changed` in the Github PR explorer - [ ] Review `Codecov Report` in the comment section below once CI passes (cherry picked from commit 13559f9)
Adding upgrade file watcher for cosmovisor. Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable: + depends on the log level output (x/upgrade uses INFO) + can be hacked by accidentally logging user user content + can be broken by using upgrade name which will break the regex pattern. closes: cosmos#7703 closes: cosmos#8523 closes: cosmos#8651 closes: cosmos#8793 closes: cosmos#8964 **Depends on**: - cosmos#9652 --- Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why. - [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting)) - [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work. - [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md). - [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing) - [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`) - [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code). - [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md` - [ ] Re-reviewed `Files changed` in the Github PR explorer - [ ] Review `Codecov Report` in the comment section below once CI passes (cherry picked from commit 13559f9)
Adding upgrade file watcher for cosmovisor. Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable: + depends on the log level output (x/upgrade uses INFO) + can be hacked by accidentally logging user user content + can be broken by using upgrade name which will break the regex pattern. closes: cosmos#7703 closes: cosmos#8523 closes: cosmos#8651 closes: cosmos#8793 closes: cosmos#8964 **Depends on**: - cosmos#9652 --- Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why. - [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting)) - [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work. - [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md). - [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing) - [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`) - [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code). - [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md` - [ ] Re-reviewed `Files changed` in the Github PR explorer - [ ] Review `Codecov Report` in the comment section below once CI passes (cherry picked from commit 13559f9)
Adding upgrade file watcher for cosmovisor. Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable: + depends on the log level output (x/upgrade uses INFO) + can be hacked by accidentally logging user user content + can be broken by using upgrade name which will break the regex pattern. closes: cosmos#7703 closes: cosmos#8523 closes: cosmos#8651 closes: cosmos#8793 closes: cosmos#8964 **Depends on**: - cosmos#9652 --- Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why. - [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting)) - [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work. - [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md). - [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing) - [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`) - [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code). - [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md` - [ ] Re-reviewed `Files changed` in the Github PR explorer - [ ] Review `Codecov Report` in the comment section below once CI passes (cherry picked from commit 13559f9)
Summary of Bug
Under certain conditions the cosmosvisor process can terminate the blockchain executable before an upgrade plan file is flushed to disk. This action prevents a successful upgrade due to a missing upgrade info file.
Version
Tested against
0.42.2
Issue is present in
latest/master
Steps to Reproduce
Create a local instance of the network running with cosmovisor. Use a local binary for upgrade such that there is no latency from downloading a binary. The cosmosvisor will terminiate the process as soon as the log message is received but before the upgrade info file can be persisted to disk.
In the following code the upgrade required message is written to the log on line 45 while the upgrade file is dumped on 49.
cosmos-sdk/x/upgrade/abci.go
Lines 45 to 52 in a78f777
Meanwhile in the cosmovisor process the monitor will execute an unclean process termination to force an immediate exist when the message appears in the logs
cosmos-sdk/cosmovisor/process.go
Lines 118 to 124 in a78f777
Remediation
Move the log message on
abci.goL45
after thek.DumpUpgradeInfoToDisk
on line 49.For Admin Use
The text was updated successfully, but these errors were encountered: