Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition between cosmovisor and upgrade handler #8964

Closed
4 tasks
iramiller opened this issue Mar 23, 2021 · 6 comments · Fixed by #8590
Closed
4 tasks

Race condition between cosmovisor and upgrade handler #8964

iramiller opened this issue Mar 23, 2021 · 6 comments · Fixed by #8590
Assignees
Labels
C:Cosmovisor Issues and PR related to Cosmovisor T:Bug

Comments

@iramiller
Copy link
Contributor

iramiller commented Mar 23, 2021

Summary of Bug

Under certain conditions the cosmosvisor process can terminate the blockchain executable before an upgrade plan file is flushed to disk. This action prevents a successful upgrade due to a missing upgrade info file.

Version

Tested against 0.42.2
Issue is present in latest/master

Steps to Reproduce

Create a local instance of the network running with cosmovisor. Use a local binary for upgrade such that there is no latency from downloading a binary. The cosmosvisor will terminiate the process as soon as the log message is received but before the upgrade info file can be persisted to disk.

In the following code the upgrade required message is written to the log on line 45 while the upgrade file is dumped on 49.

ctx.Logger().Error(upgradeMsg)
// Write the upgrade info to disk. The UpgradeStoreLoader uses this info to perform or skip
// store migrations.
err := k.DumpUpgradeInfoToDisk(ctx.BlockHeight(), plan.Name)
if err != nil {
panic(fmt.Errorf("unable to write upgrade info to filesystem: %s", err.Error()))
}

Meanwhile in the cosmovisor process the monitor will execute an unclean process termination to force an immediate exist when the message appears in the logs

upgrade, err := WaitForUpdate(scan)
if err != nil {
res.SetError(err)
} else if upgrade != nil {
res.SetUpgrade(upgrade)
// now we need to kill the process
_ = cmd.Process.Kill()

Remediation

Move the log message on abci.goL45 after the k.DumpUpgradeInfoToDisk on line 49.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@aaronc
Copy link
Member

aaronc commented Mar 25, 2021

We are actually going to switch cosmovisor to just wait for the file instead of reading stdout. That should address this. Any update on the status of that fix @robert-zaremba ?

@aaronc aaronc added T:Bug C:Cosmovisor Issues and PR related to Cosmovisor labels Mar 25, 2021
@iramiller
Copy link
Contributor Author

We ended creating a "git-filter fork" of the cosmovisor code into our own repo to fix this issue. This seemed like the best way forward as we also wanted to see additional changes such as taking a snapshot of the data directory before performing an upgrade (per best practice) in case a rollback is required as well as some useful docker packaging.

https://github.com/provenance-io/cosmovisor

Let me know if there is interest in seeing any of that work come back into the SDK.

@clevinson
Copy link
Contributor

clevinson commented Jun 23, 2021

Hey @iramiller -- we'll be merging #8590 into a v1.0 of cosmovisor soon (in conjunction with v0.43 release of the SDK). Hopefully that resolves this issue. Will you be able to see if the changes in #8590 take care of this?

@clevinson
Copy link
Contributor

The other pieces you mention look like they could be great to have in cosmovisor. Maybe let's see what the v1.0 release looks like with the change to file based handlers, and then if there's interest in opening a PR upstream with the remainder of pieces from your fork that would still be useful we can get some folks on our team to review it.

@iramiller
Copy link
Contributor Author

Well @clevinson -- #9384 directly implements the remediation as outlined above in this issue (even though these two are not linked at the moment). The other changes in #8590 are nice too as relying on logs at all seems like a poor design.

@robert-zaremba robert-zaremba self-assigned this Jun 28, 2021
@robert-zaremba
Copy link
Collaborator

This is solved in: #8590 (still in review and testing).

@ryanchristo ryanchristo added this to the cosmovisor v1.1 milestone Jun 29, 2021
@mergify mergify bot closed this as completed in #8590 Aug 11, 2021
mergify bot pushed a commit that referenced this issue Aug 11, 2021
Adding upgrade file watcher for cosmovisor.

Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:
+ depends on the log level output (x/upgrade uses INFO)
+ can be hacked by accidentally logging user user content
+ can be broken by using upgrade name which will break the regex pattern.

closes: #7703
closes: #8523
closes: #8651
closes: #8793
closes: #8964 

**Depends on**: 
- #9652

---

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

- [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md).
- [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`)
- [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code).
- [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md`
- [ ] Re-reviewed `Files changed` in the Github PR explorer
- [ ] Review `Codecov Report` in the comment section below once CI passes
RiccardoM pushed a commit to desmos-labs/cosmos-sdk that referenced this issue Nov 2, 2021
Adding upgrade file watcher for cosmovisor.

Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:
+ depends on the log level output (x/upgrade uses INFO)
+ can be hacked by accidentally logging user user content
+ can be broken by using upgrade name which will break the regex pattern.

closes: cosmos#7703
closes: cosmos#8523
closes: cosmos#8651
closes: cosmos#8793
closes: cosmos#8964

**Depends on**:
- cosmos#9652

---

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

- [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md).
- [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`)
- [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code).
- [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md`
- [ ] Re-reviewed `Files changed` in the Github PR explorer
- [ ] Review `Codecov Report` in the comment section below once CI passes

(cherry picked from commit 13559f9)
leobragaz pushed a commit to desmos-labs/cosmos-sdk that referenced this issue Apr 5, 2022
Adding upgrade file watcher for cosmovisor.

Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:
+ depends on the log level output (x/upgrade uses INFO)
+ can be hacked by accidentally logging user user content
+ can be broken by using upgrade name which will break the regex pattern.

closes: cosmos#7703
closes: cosmos#8523
closes: cosmos#8651
closes: cosmos#8793
closes: cosmos#8964

**Depends on**:
- cosmos#9652

---

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

- [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md).
- [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`)
- [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code).
- [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md`
- [ ] Re-reviewed `Files changed` in the Github PR explorer
- [ ] Review `Codecov Report` in the comment section below once CI passes

(cherry picked from commit 13559f9)
RiccardoM pushed a commit to desmos-labs/cosmos-sdk that referenced this issue Oct 25, 2022
Adding upgrade file watcher for cosmovisor.

Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:
+ depends on the log level output (x/upgrade uses INFO)
+ can be hacked by accidentally logging user user content
+ can be broken by using upgrade name which will break the regex pattern.

closes: cosmos#7703
closes: cosmos#8523
closes: cosmos#8651
closes: cosmos#8793
closes: cosmos#8964

**Depends on**:
- cosmos#9652

---

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

- [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md).
- [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`)
- [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code).
- [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md`
- [ ] Re-reviewed `Files changed` in the Github PR explorer
- [ ] Review `Codecov Report` in the comment section below once CI passes

(cherry picked from commit 13559f9)
RiccardoM pushed a commit to desmos-labs/cosmos-sdk that referenced this issue Nov 14, 2022
Adding upgrade file watcher for cosmovisor.

Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:
+ depends on the log level output (x/upgrade uses INFO)
+ can be hacked by accidentally logging user user content
+ can be broken by using upgrade name which will break the regex pattern.

closes: cosmos#7703
closes: cosmos#8523
closes: cosmos#8651
closes: cosmos#8793
closes: cosmos#8964

**Depends on**:
- cosmos#9652

---

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

- [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md).
- [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`)
- [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code).
- [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md`
- [ ] Re-reviewed `Files changed` in the Github PR explorer
- [ ] Review `Codecov Report` in the comment section below once CI passes

(cherry picked from commit 13559f9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:Cosmovisor Issues and PR related to Cosmovisor T:Bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants