feat: file watcher for cosmovisor (cosmos#8590)
Adding upgrade file watcher for cosmovisor.

Currently the cosmovisor upgrade mechanism relies on parsing log messages. This is not reliable:
+ it depends on the log level output (x/upgrade uses INFO)
+ it can be hacked by accidentally logging user content
+ it can be broken by an upgrade name that breaks the regex pattern.
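
To illustrate the second point, here is a minimal sketch of how a log-scraping detector can be fooled. The pattern `upgradePattern` and the helper `detectUpgrade` are hypothetical and written only for this example; they are not copied from the cosmovisor source.

```go
package main

import (
	"fmt"
	"regexp"
)

// upgradePattern is a hypothetical pattern, similar in spirit to what a
// log-scraping supervisor might use. It is an assumption for illustration.
var upgradePattern = regexp.MustCompile(`UPGRADE "(.*)" NEEDED at height: (\d+)`)

// detectUpgrade returns the upgrade name found in a log line, if any.
func detectUpgrade(line string) (string, bool) {
	m := upgradePattern.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// A genuine upgrade message is detected, as intended.
	fmt.Println(detectUpgrade(`UPGRADE "v2" NEEDED at height: 100`))

	// A log line that merely echoes user-supplied text also matches,
	// so the detector cannot tell it apart from a real upgrade signal.
	fmt.Println(detectUpgrade(`tx rejected, memo=UPGRADE "evil" NEEDED at height: 1`))
}
```

Reading a dedicated file written by the upgrade module avoids this class of problem, because log content never reaches the detector.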

closes: cosmos#7703
closes: cosmos#8523
closes: cosmos#8651
closes: cosmos#8793
closes: cosmos#8964

**Depends on**:
- cosmos#9652

---

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

- [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md).
- [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`)
- [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code).
- [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md`
- [ ] Re-reviewed `Files changed` in the Github PR explorer
- [ ] Review `Codecov Report` in the comment section below once CI passes

(cherry picked from commit 13559f9)
robert-zaremba authored and RiccardoM committed Nov 14, 2022
1 parent 47f4664 commit c67c248
Showing 39 changed files with 474 additions and 342 deletions.
1 change: 1 addition & 0 deletions cosmovisor/.gitignore
@@ -0,0 +1 @@
/cosmovisor
23 changes: 21 additions & 2 deletions cosmovisor/README.md
@@ -4,6 +4,10 @@

*Note: If new versions of the application are not set up to run in-place store migrations, migrations will need to be run manually before restarting `cosmovisor` with the new binary. For this reason, we recommend applications adopt in-place store migrations.*

## Contributing

Release branches have the following format: `release/cosmovisor/vA.B.x`, where A and B are numbers (e.g. `release/cosmovisor/v0.1.x`). Releases are tagged using the following format: `cosmovisor/vA.B.C`.

## Installation

To install `cosmovisor`, run the following command:
@@ -22,6 +26,8 @@ All arguments passed to `cosmovisor` will be passed to the application binary (a
* `DAEMON_NAME` is the name of the binary itself (e.g. `gaiad`, `regend`, `simd`, etc.).
* `DAEMON_ALLOW_DOWNLOAD_BINARIES` (*optional*), if set to `true`, will enable auto-downloading of new binaries (for security reasons, this is intended for full nodes rather than validators). By default, `cosmovisor` will not auto-download new binaries.
* `DAEMON_RESTART_AFTER_UPGRADE` (*optional*), if set to `true`, will restart the subprocess with the same command-line arguments and flags (but with the new binary) after a successful upgrade. By default, `cosmovisor` stops running after an upgrade and requires the system administrator to manually restart it. Note that `cosmovisor` will not auto-restart the subprocess if there was an error.
* `DAEMON_POLL_INTERVAL` is the interval length in milliseconds for polling the upgrade plan file. Default: 300.
* `UNSAFE_SKIP_BACKUP` (defaults to `false`), if set to `false`, `cosmovisor` backs up the data before attempting the upgrade; if set to `true`, it upgrades directly without a backup. A backup is useful (and recommended) because it allows a rollback if the upgrade fails, so keeping `UNSAFE_SKIP_BACKUP=false` is advised.

## Folder Layout

@@ -35,8 +41,9 @@ All arguments passed to `cosmovisor` will be passed to the application binary (a
 │       └── $DAEMON_NAME
 └── upgrades
     └── <name>
-        └── bin
-            └── $DAEMON_NAME
+        ├── bin
+        │   └── $DAEMON_NAME
+        └── upgrade-info.json
```

The `cosmovisor/` directory includes a subdirectory for each version of the application (i.e. `genesis` or `upgrades/<name>`). Within each subdirectory is the application binary (i.e. `bin/$DAEMON_NAME`) and any additional auxiliary files associated with each binary. `current` is a symbolic link to the currently active directory (i.e. `genesis` or `upgrades/<name>`). The `name` variable in `upgrades/<name>` is the URI-encoded name of the upgrade as specified in the upgrade module plan.
@@ -66,6 +73,18 @@ In order to support downloadable binaries, a tarball for each upgrade binary wil

The `DAEMON` specific code and operations (e.g. tendermint config, the application db, syncing blocks, etc.) all work as expected. The application binaries' directives such as command-line flags and environment variables also work as expected.


### Detecting Upgrades

`cosmovisor` polls the `$DAEMON_HOME/data/upgrade-info.json` file for new upgrade instructions. The file is created by the x/upgrade module in `BeginBlocker` when an upgrade is detected and the blockchain reaches the upgrade height.
The following heuristic is applied to detect the upgrade:
+ When starting, `cosmovisor` doesn't know much about the currently running upgrade, except the binary, which is in `current/bin/`. It tries to read the `current/upgrade-info.json` file to get information about the current upgrade name.
+ If neither `cosmovisor/current/upgrade-info.json` nor `data/upgrade-info.json` exists, then `cosmovisor` will wait for the `data/upgrade-info.json` file to trigger an upgrade.
+ If `cosmovisor/current/upgrade-info.json` doesn't exist but `data/upgrade-info.json` exists, then `cosmovisor` assumes that whatever is in `data/upgrade-info.json` is a valid upgrade request. In this case `cosmovisor` immediately tries to make an upgrade according to the `name` attribute in `data/upgrade-info.json`.
+ Otherwise, `cosmovisor` waits for changes in `upgrade-info.json`. As soon as a new upgrade name is recorded in the file, `cosmovisor` will trigger the upgrade mechanism.
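
The heuristic above can be condensed into a small decision function. This is a simplified sketch, not the actual implementation: the `action` type, its labels and `decide` are hypothetical, and an empty string stands in for "file does not exist".

```go
package main

import "fmt"

// action is a hypothetical label for what the supervisor does next.
type action string

const (
	waitForFile   action = "wait for data/upgrade-info.json"
	upgradeNow    action = "trigger upgrade"
	waitForChange action = "wait for a new upgrade name"
)

// decide applies the heuristic. currentName is the name read from
// cosmovisor/current/upgrade-info.json and dataName the name read from
// data/upgrade-info.json; "" means the file is absent (an assumption
// made to keep the sketch small).
func decide(currentName, dataName string) action {
	switch {
	case currentName == "" && dataName == "":
		return waitForFile
	case currentName == "":
		// Only the data file exists: treat it as a valid upgrade request.
		return upgradeNow
	case dataName != "" && dataName != currentName:
		// A new upgrade name has been recorded.
		return upgradeNow
	default:
		return waitForChange
	}
}

func main() {
	fmt.Println(decide("", ""))
	fmt.Println(decide("", "v2"))
	fmt.Println(decide("v1", "v1"))
	fmt.Println(decide("v1", "v2"))
}
```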

When the upgrade mechanism is triggered, `cosmovisor` will start by auto-downloading a new binary (if `DAEMON_ALLOW_DOWNLOAD_BINARIES` is enabled) into `cosmovisor/upgrades/<name>/bin` (where `<name>` is the `upgrade-info.json:name` attribute). `cosmovisor` will then update the `current` symbolic link to point to the new directory and save `data/upgrade-info.json` to `cosmovisor/current/upgrade-info.json`.

## Auto-Download

Generally, `cosmovisor` requires that the system administrator place all relevant binaries on disk before the upgrade happens. However, for people who don't need such control and want an easier setup (maybe they are syncing a non-validating fullnode and want to do little maintenance), there is another option.
113 changes: 98 additions & 15 deletions cosmovisor/args.go
@@ -1,29 +1,39 @@
package cosmovisor

import (
"bufio"
"encoding/json"
"errors"
"fmt"
"io/ioutil"
"net/url"
"os"
"path/filepath"
"strconv"
"time"
)

const (
-rootName = "cosmovisor"
-genesisDir = "genesis"
-upgradesDir = "upgrades"
-currentLink = "current"
+rootName = "cosmovisor"
+genesisDir = "genesis"
+upgradesDir = "upgrades"
+currentLink = "current"
+upgradeFilename = "upgrade-info.json"
)

// must be the same as x/upgrade/types.UpgradeInfoFilename
const defaultFilename = "upgrade-info.json"

// Config is the information passed in to control the daemon
type Config struct {
Home string
Name string
AllowDownloadBinaries bool
RestartAfterUpgrade bool
LogBufferSize int
PollInterval time.Duration
UnsafeSkipBackup bool

// currently running upgrade
currentUpgrade UpgradeInfo
}

// Root returns the root directory where all info lives
@@ -44,10 +54,15 @@ func (cfg *Config) UpgradeBin(upgradeName string) string {
// UpgradeDir is the directory named upgrade
func (cfg *Config) UpgradeDir(upgradeName string) string {
safeName := url.PathEscape(upgradeName)
-return filepath.Join(cfg.Root(), upgradesDir, safeName)
+return filepath.Join(cfg.Home, rootName, upgradesDir, safeName)
}

// UpgradeInfoFile is the expected upgrade-info filename created by `x/upgrade/keeper`.
func (cfg *Config) UpgradeInfoFilePath() string {
return filepath.Join(cfg.Home, "data", defaultFilename)
}

-// Symlink to genesis
+// SymLinkToGenesis creates a symbolic link from "./current" to the genesis directory.
func (cfg *Config) SymLinkToGenesis() (string, error) {
genesis := filepath.Join(cfg.Root(), genesisDir)
link := filepath.Join(cfg.Root(), currentLink)
@@ -83,7 +98,8 @@ func (cfg *Config) CurrentBin() (string, error) {
}

// and return the binary
-return filepath.Join(dest, "bin", cfg.Name), nil
+binpath := filepath.Join(dest, "bin", cfg.Name)
+return binpath, nil
}

// GetConfigFromEnv will read the environmental variables into a config
@@ -102,21 +118,22 @@ func GetConfigFromEnv() (*Config, error) {
cfg.RestartAfterUpgrade = true
}

-logBufferSizeStr := os.Getenv("DAEMON_LOG_BUFFER_SIZE")
-if logBufferSizeStr != "" {
-logBufferSize, err := strconv.Atoi(logBufferSizeStr)
+interval := os.Getenv("DAEMON_POLL_INTERVAL")
+if interval != "" {
+i, err := strconv.ParseUint(interval, 10, 32)
if err != nil {
return nil, err
}
-cfg.LogBufferSize = logBufferSize * 1024
+cfg.PollInterval = time.Millisecond * time.Duration(i)
} else {
-cfg.LogBufferSize = bufio.MaxScanTokenSize
+cfg.PollInterval = 300 * time.Millisecond
}

cfg.UnsafeSkipBackup = os.Getenv("UNSAFE_SKIP_BACKUP") == "true"

if err := cfg.validate(); err != nil {
return nil, err
}

return cfg, nil
}

@@ -148,3 +165,69 @@ func (cfg *Config) validate() error {

return nil
}

// SetCurrentUpgrade sets the named upgrade to be the current link, returns error if this binary doesn't exist
func (cfg *Config) SetCurrentUpgrade(u UpgradeInfo) error {
// ensure named upgrade exists
bin := cfg.UpgradeBin(u.Name)

if err := EnsureBinary(bin); err != nil {
return err
}

// set a symbolic link
link := filepath.Join(cfg.Root(), currentLink)
safeName := url.PathEscape(u.Name)
upgrade := filepath.Join(cfg.Root(), upgradesDir, safeName)

// remove link if it exists
if _, err := os.Stat(link); err == nil {
os.Remove(link)
}

// point to the new directory
if err := os.Symlink(upgrade, link); err != nil {
return fmt.Errorf("creating current symlink: %w", err)
}

cfg.currentUpgrade = u
f, err := os.Create(filepath.Join(upgrade, upgradeFilename))
if err != nil {
return err
}
bz, err := json.Marshal(u)
if err != nil {
return err
}
if _, err := f.Write(bz); err != nil {
return err
}
return f.Close()
}

func (cfg *Config) UpgradeInfo() UpgradeInfo {
if cfg.currentUpgrade.Name != "" {
return cfg.currentUpgrade
}

filename := filepath.Join(cfg.Root(), currentLink, upgradeFilename)
_, err := os.Lstat(filename)
var u UpgradeInfo
var bz []byte
if err != nil { // no current directory
goto returnError
}
if bz, err = ioutil.ReadFile(filename); err != nil {
goto returnError
}
if err = json.Unmarshal(bz, &u); err != nil {
goto returnError
}
cfg.currentUpgrade = u
return cfg.currentUpgrade

returnError:
fmt.Println("[cosmovisor], error reading", filename, err)
cfg.currentUpgrade.Name = "_"
return cfg.currentUpgrade
}
34 changes: 34 additions & 0 deletions cosmovisor/buffer_test.go
@@ -0,0 +1,34 @@
package cosmovisor_test

import (
"bytes"
"sync"
)

// buffer is a thread safe bytes buffer
type buffer struct {
b bytes.Buffer
m sync.Mutex
}

func NewBuffer() *buffer {
return &buffer{}
}

func (b *buffer) Write(bz []byte) (int, error) {
b.m.Lock()
defer b.m.Unlock()
return b.b.Write(bz)
}

func (b *buffer) String() string {
b.m.Lock()
defer b.m.Unlock()
return b.b.String()
}

func (b *buffer) Reset() {
b.m.Lock()
defer b.m.Unlock()
b.b.Reset()
}
14 changes: 11 additions & 3 deletions cosmovisor/cmd/cosmovisor/main.go
@@ -9,7 +9,7 @@ import (

func main() {
if err := Run(os.Args[1:]); err != nil {
-fmt.Fprintf(os.Stderr, "%+v\n", err)
+fmt.Fprintf(os.Stderr, "[cosmovisor] %+v\n", err)
os.Exit(1)
}
}
@@ -20,11 +20,19 @@ func Run(args []string) error {
if err != nil {
return err
}
launcher, err := cosmovisor.NewLauncher(cfg)
if err != nil {
return err
}

-doUpgrade, err := cosmovisor.LaunchProcess(cfg, args, os.Stdout, os.Stderr)
+doUpgrade, err := launcher.Run(args, os.Stdout, os.Stderr)
// if RestartAfterUpgrade, we launch after a successful upgrade (only condition LaunchProcess returns nil)
for cfg.RestartAfterUpgrade && err == nil && doUpgrade {
-doUpgrade, err = cosmovisor.LaunchProcess(cfg, args, os.Stdout, os.Stderr)
+fmt.Println("[cosmovisor] upgrade detected, relaunching the app ", cfg.Name)
+doUpgrade, err = launcher.Run(args, os.Stdout, os.Stderr)
}
if doUpgrade && err == nil {
fmt.Println("[cosmovisor] upgrade detected, DAEMON_RESTART_AFTER_UPGRADE is off. Verify new upgrade and start cosmovisor again.")
}
return err
}
6 changes: 3 additions & 3 deletions cosmovisor/go.mod
@@ -1,9 +1,9 @@
module github.com/cosmos/cosmos-sdk/cosmovisor

-go 1.14
+go 1.15

require (
github.com/hashicorp/go-getter v1.4.1
-github.com/otiai10/copy v1.2.0
-github.com/stretchr/testify v1.6.1
+github.com/otiai10/copy v1.4.2
+github.com/stretchr/testify v1.7.0
)
12 changes: 6 additions & 6 deletions cosmovisor/go.sum
@@ -61,20 +61,20 @@ github.com/mitchellh/go-homedir v1.0.0 h1:vKb8ShqSby24Yrqr/yDYkuFz8d0WUjys40rvnG
github.com/mitchellh/go-homedir v1.0.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
github.com/mitchellh/go-testing-interface v1.0.0 h1:fzU/JVNcaqHQEcVFAKeR41fkiLdIPrefOvVG1VZ96U0=
github.com/mitchellh/go-testing-interface v1.0.0/go.mod h1:kRemZodwjscx+RGhAo8eIhFbs2+BFgRtFPeD/KE+zxI=
-github.com/otiai10/copy v1.2.0 h1:HvG945u96iNadPoG2/Ja2+AUJeW5YuFQMixq9yirC+k=
-github.com/otiai10/copy v1.2.0/go.mod h1:rrF5dJ5F0t/EWSYODDu4j9/vEeYHMkc8jt0zJChqQWw=
+github.com/otiai10/copy v1.4.2 h1:RTiz2sol3eoXPLF4o+YWqEybwfUa/Q2Nkc4ZIUs3fwI=
+github.com/otiai10/copy v1.4.2/go.mod h1:XWfuS3CrI0R6IE0FbgHsEazaXO8G0LpMp9o8tos0x4E=
github.com/otiai10/curr v0.0.0-20150429015615-9b4961190c95/go.mod h1:9qAhocn7zKJG+0mI8eUu6xqkFDYS2kb2saOteoSB3cE=
github.com/otiai10/curr v1.0.0 h1:TJIWdbX0B+kpNagQrjgq8bCMrbhiuX73M2XwgtDMoOI=
github.com/otiai10/curr v1.0.0/go.mod h1:LskTG5wDwr8Rs+nNQ+1LlxRjAtTZZjtJW4rMXl6j4vs=
github.com/otiai10/mint v1.3.0/go.mod h1:F5AjcsTsWUqX+Na9fpHb52P8pcRX2CI6A3ctIT91xUo=
-github.com/otiai10/mint v1.3.1 h1:BCmzIS3n71sGfHB5NMNDB3lHYPz8fWSkCAErHed//qc=
-github.com/otiai10/mint v1.3.1/go.mod h1:/yxELlJQ0ufhjUwhshSj+wFjZ78CnZ48/1wtmBH1OTc=
+github.com/otiai10/mint v1.3.2 h1:VYWnrP5fXmz1MXvjuUvcBrXSjGE6xjON+axB/UrpO3E=
+github.com/otiai10/mint v1.3.2/go.mod h1:/yxELlJQ0ufhjUwhshSj+wFjZ78CnZ48/1wtmBH1OTc=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
-github.com/stretchr/testify v1.6.1 h1:hDPOHmpOpP40lSULcqw7IrRb/u7w6RpDC9399XyoNd0=
-github.com/stretchr/testify v1.6.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
+github.com/stretchr/testify v1.7.0 h1:nwc3DEeHmmLAfoZucVR881uASk0Mfjw8xYJ99tb5CcY=
+github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/ulikunitz/xz v0.5.5 h1:pFrO0lVpTBXLpYw+pnLj6TbvHuyjXMfjGeCwSqCVwok=
github.com/ulikunitz/xz v0.5.5/go.mod h1:2bypXElzHzzJZwzH67Y6wb67pO62Rzfn7BSiF4ABRW8=
go.opencensus.io v0.21.0/go.mod h1:mSImk1erAIZhrmZN+AvHh14ztQfjbGwt4TtuofqLduU=