feat: file watcher for cosmovisor #8590

Merged
merged 64 commits into from
Aug 11, 2021
cb6a5b9
Adding upgrade file watcher for cosmovisor
robert-zaremba Feb 15, 2021
9580900
add fsnotify file watcher
robert-zaremba Feb 15, 2021
895ee7b
use upgrade info file instead of parsing logs
robert-zaremba Feb 16, 2021
091d54a
add file parsing tests
robert-zaremba Feb 16, 2021
04fa006
update comment
robert-zaremba Feb 16, 2021
4d3434f
Merge branch 'master' into robert/cosmvisor-file-watch
robert-zaremba Jun 21, 2021
0145cdc
Update x/upgrade for writing Plan to the disk
robert-zaremba Jun 21, 2021
0c20af4
refactore cosmwisor scanner and launcher
robert-zaremba Jun 21, 2021
e8b5c6b
update ENV params
robert-zaremba Jun 21, 2021
b46399a
update tests
robert-zaremba Jun 21, 2021
ae4280a
attach stderr and stdout
robert-zaremba Jun 22, 2021
81a437a
fix update binary path
robert-zaremba Jun 22, 2021
2d780ab
update tests
robert-zaremba Jun 22, 2021
aed19cb
update tests
robert-zaremba Jun 22, 2021
508741f
tests update
robert-zaremba Jun 24, 2021
fcc8ed1
debugging
robert-zaremba Jun 24, 2021
76974af
add logs
robert-zaremba Jul 2, 2021
b3c3732
docs(gov): improve gov module spec
robert-zaremba Jul 12, 2021
8d7d2fe
adding currentUpgrade name check
robert-zaremba Jul 12, 2021
abbcf16
adding one more log message
robert-zaremba Jul 13, 2021
ca65678
update log
robert-zaremba Jul 13, 2021
8828498
compatibility with go 1.15
robert-zaremba Jul 13, 2021
f8353c1
add more debug logs
robert-zaremba Jul 13, 2021
62428e9
fix filename
robert-zaremba Jul 13, 2021
24d2bac
describe update detaction mechanism in README
robert-zaremba Jul 13, 2021
8051146
adding cosmovisor binary to gitignore
robert-zaremba Jul 13, 2021
f97530f
Merge branch 'master' into robert/cosmvisor-file-watch
robert-zaremba Jul 13, 2021
420245e
update tests
robert-zaremba Jul 14, 2021
f417c7d
update download tests
robert-zaremba Jul 14, 2021
c2ea0d3
rename test files
robert-zaremba Jul 14, 2021
6884d04
update checksum
robert-zaremba Jul 14, 2021
67fbf64
udpate test chain files
robert-zaremba Jul 14, 2021
5704397
update chain files
robert-zaremba Jul 14, 2021
53e3a95
update checksums
robert-zaremba Jul 14, 2021
7b9e039
remove debug logs
robert-zaremba Jul 14, 2021
327675a
remove debug info
robert-zaremba Jul 14, 2021
fb071a6
review changes
robert-zaremba Jul 14, 2021
7b6641f
update log text
robert-zaremba Jul 14, 2021
0cb7150
Merge remote-tracking branch 'origin/master' into robert/cosmvisor-fi…
robert-zaremba Jul 14, 2021
5bac319
add missing defer to mutex.Unlock in the test buffer
robert-zaremba Jul 16, 2021
7e2be0c
wip - data race
robert-zaremba Jul 16, 2021
c63fa6c
solve linter issue
robert-zaremba Jul 17, 2021
1dd4dfc
cleaning tests
robert-zaremba Jul 17, 2021
91fa1c6
bring back clearing upgrade plan
robert-zaremba Jul 17, 2021
bc4b9fb
Merge branch 'master' into robert/cosmvisor-file-watch
robert-zaremba Jul 17, 2021
341269d
add changelog entry
robert-zaremba Jul 17, 2021
8ad8f94
SetCurrentUpgrade: save UpgradeInfo object as JSON rather than only name
robert-zaremba Jul 17, 2021
8fb1ebf
Apply suggestions from code review
robert-zaremba Jul 28, 2021
edd34ca
review changes
robert-zaremba Jul 28, 2021
8d4c34c
remove UpgradeInfoFilename config
robert-zaremba Jul 28, 2021
acf9c6a
Merge branch 'master' into robert/cosmvisor-file-watch
robert-zaremba Jul 28, 2021
6281866
documentation update
robert-zaremba Jul 28, 2021
efecbed
add validation to the scanner
robert-zaremba Jul 28, 2021
f657b5b
add more upgrade-info parser tests
robert-zaremba Jul 28, 2021
48d7a38
adding one more test
robert-zaremba Jul 28, 2021
58dbbd3
logs update
robert-zaremba Jul 29, 2021
5e2d186
Merge branch 'master' into robert/cosmvisor-file-watch
robert-zaremba Jul 29, 2021
2bcc571
Merge branch 'master' into robert/cosmvisor-file-watch
robert-zaremba Aug 6, 2021
e3d378c
improve comment
robert-zaremba Aug 6, 2021
624d25f
Update cosmovisor/process.go
robert-zaremba Aug 10, 2021
4e101ad
Apply suggestions from code review
robert-zaremba Aug 10, 2021
ed5c0cd
Merge branch 'master' into robert/cosmvisor-file-watch
robert-zaremba Aug 10, 2021
c632332
Merge branch 'master' into robert/cosmvisor-file-watch
robert-zaremba Aug 11, 2021
25a2378
update cosmovisor readme
robert-zaremba Aug 11, 2021
1 change: 1 addition & 0 deletions cosmovisor/.gitignore
@@ -0,0 +1 @@
/cosmovisor
18 changes: 16 additions & 2 deletions cosmovisor/README.md
@@ -26,6 +26,7 @@ All arguments passed to `cosmovisor` will be passed to the application binary (a
* `DAEMON_NAME` is the name of the binary itself (e.g. `gaiad`, `regend`, `simd`, etc.).
* `DAEMON_ALLOW_DOWNLOAD_BINARIES` (*optional*), if set to `true`, will enable auto-downloading of new binaries (for security reasons, this is intended for full nodes rather than validators). By default, `cosmovisor` will not auto-download new binaries.
* `DAEMON_RESTART_AFTER_UPGRADE` (*optional*), if set to `true`, will restart the subprocess with the same command-line arguments and flags (but with the new binary) after a successful upgrade. By default, `cosmovisor` stops running after an upgrade and requires the system administrator to manually restart it. Note that `cosmovisor` will not auto-restart the subprocess if there was an error.
* `DAEMON_POLL_INTERVAL` is the interval length in milliseconds for polling the upgrade plan file. Default: 300.
* `UNSAFE_SKIP_BACKUP` (defaults to `false`), if set to `false`, will back up the data before trying the upgrade. Otherwise it will upgrade directly without any backup. Keeping the backup is useful (and recommended) in case of failure, when a rollback is needed. It is advised to use the backup option, i.e., `UNSAFE_SKIP_BACKUP=false`.

## Folder Layout
@@ -40,8 +41,9 @@ All arguments passed to `cosmovisor` will be passed to the application binary (a
│   └── $DAEMON_NAME
└── upgrades
└── <name>
└── bin
└── $DAEMON_NAME
├── bin
│   └── $DAEMON_NAME
└── upgrade-info.json
```

The `cosmovisor/` directory includes a subdirectory for each version of the application (i.e. `genesis` or `upgrades/<name>`). Within each subdirectory is the application binary (i.e. `bin/$DAEMON_NAME`) and any additional auxiliary files associated with each binary. `current` is a symbolic link to the currently active directory (i.e. `genesis` or `upgrades/<name>`). The `name` variable in `upgrades/<name>` is the URI-encoded name of the upgrade as specified in the upgrade module plan.
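
The layout rule above can be sketched in a few lines of Go. This is an illustration only: `upgradeBinPath` is a hypothetical helper, not a function from the `cosmovisor` package, and the home directory and upgrade name are made up. It assumes the URI encoding is done with `url.PathEscape`, which is what `UpgradeDir` in `args.go` uses.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
)

// upgradeBinPath is a hypothetical helper (not part of cosmovisor). It builds
// <home>/cosmovisor/upgrades/<escaped name>/bin/<daemon>, escaping the upgrade
// name so that it always maps to a single directory name.
func upgradeBinPath(home, daemon, upgradeName string) string {
	safeName := url.PathEscape(upgradeName)
	return filepath.Join(home, "cosmovisor", "upgrades", safeName, "bin", daemon)
}

func main() {
	// An upgrade name containing "/" and a space stays one path segment.
	fmt.Println(upgradeBinPath("/home/node/.simd", "simd", "v0.43/fix 1"))
}
```

Escaping matters because an upgrade name is free-form text in the plan: without it, a name containing `/` would be split into nested directories.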
@@ -71,6 +73,18 @@ In order to support downloadable binaries, a tarball for each upgrade binary wil

The `DAEMON` specific code and operations (e.g. tendermint config, the application db, syncing blocks, etc.) all work as expected. The application binaries' directives such as command-line flags and environment variables also work as expected.


### Detecting Upgrades

`cosmovisor` polls the `$DAEMON_HOME/data/upgrade-info.json` file for new upgrade instructions. The file is created by the x/upgrade module in `BeginBlocker` when an upgrade is detected and the blockchain reaches the upgrade height.
The following heuristic is applied to detect the upgrade:
+ When starting, `cosmovisor` knows little about the currently running upgrade beyond the binary in `current/bin/`. It tries to read the `current/upgrade-info.json` file to get the name of the current upgrade.
+ If neither `cosmovisor/current/upgrade-info.json` nor `data/upgrade-info.json` exists, then `cosmovisor` will wait for the `data/upgrade-info.json` file to trigger an upgrade.
+ If `cosmovisor/current/upgrade-info.json` doesn't exist but `data/upgrade-info.json` exists, then `cosmovisor` assumes that whatever is in `data/upgrade-info.json` is a valid upgrade request. In this case `cosmovisor` immediately tries to upgrade according to the `name` attribute in `data/upgrade-info.json`.
+ Otherwise, `cosmovisor` waits for changes in `upgrade-info.json`. As soon as a new upgrade name is recorded in the file, `cosmovisor` will trigger the upgrade mechanism.

When the upgrade mechanism is triggered, `cosmovisor` will start by auto-downloading a new binary (if `DAEMON_ALLOW_DOWNLOAD_BINARIES` is enabled) into `cosmovisor/upgrades/<name>/bin` (where `<name>` is the `upgrade-info.json:name` attribute). `cosmovisor` will then update the `current` symbolic link to point to the new directory and save `data/upgrade-info.json` to `cosmovisor/current/upgrade-info.json`.

## Auto-Download

Generally, `cosmovisor` requires that the system administrator place all relevant binaries on disk before the upgrade happens. However, for people who don't need such control and want an easier setup (maybe they are syncing a non-validating fullnode and want to do little maintenance), there is another option.
116 changes: 98 additions & 18 deletions cosmovisor/args.go
@@ -1,30 +1,39 @@
package cosmovisor

import (
"bufio"
"encoding/json"
"errors"
"fmt"
"io/ioutil"
"net/url"
"os"
"path/filepath"
"strconv"
"time"
)

const (
rootName = "cosmovisor"
genesisDir = "genesis"
upgradesDir = "upgrades"
currentLink = "current"
rootName = "cosmovisor"
genesisDir = "genesis"
upgradesDir = "upgrades"
currentLink = "current"
upgradeFilename = "upgrade-info.json"
)

// must be the same as x/upgrade/types.UpgradeInfoFilename
const defaultFilename = "upgrade-info.json"

// Config is the information passed in to control the daemon
type Config struct {
Home string
Name string
AllowDownloadBinaries bool
RestartAfterUpgrade bool
LogBufferSize int
PollInterval time.Duration
UnsafeSkipBackup bool

// currently running upgrade
currentUpgrade UpgradeInfo
}

// Root returns the root directory where all info lives
@@ -45,10 +54,15 @@ func (cfg *Config) UpgradeBin(upgradeName string) string {
// UpgradeDir is the directory named upgrade
func (cfg *Config) UpgradeDir(upgradeName string) string {
safeName := url.PathEscape(upgradeName)
return filepath.Join(cfg.Root(), upgradesDir, safeName)
return filepath.Join(cfg.Home, rootName, upgradesDir, safeName)
}

// UpgradeInfoFilePath is the expected path of the upgrade-info file created by `x/upgrade/keeper`.
func (cfg *Config) UpgradeInfoFilePath() string {
return filepath.Join(cfg.Home, "data", defaultFilename)
}

// Symlink to genesis
// SymLinkToGenesis creates a symbolic link from "./current" to the genesis directory.
func (cfg *Config) SymLinkToGenesis() (string, error) {
genesis := filepath.Join(cfg.Root(), genesisDir)
link := filepath.Join(cfg.Root(), currentLink)
@@ -67,24 +81,25 @@ func (cfg *Config) CurrentBin() (string, error) {
// if nothing here, fallback to genesis
info, err := os.Lstat(cur)
if err != nil {
//Create symlink to the genesis
// Create symlink to the genesis
return cfg.SymLinkToGenesis()
}
// if it is there, ensure it is a symlink
if info.Mode()&os.ModeSymlink == 0 {
//Create symlink to the genesis
// Create symlink to the genesis
return cfg.SymLinkToGenesis()
}

// resolve it
dest, err := os.Readlink(cur)
if err != nil {
//Create symlink to the genesis
// Create symlink to the genesis
return cfg.SymLinkToGenesis()
}

// and return the binary
return filepath.Join(dest, "bin", cfg.Name), nil
binpath := filepath.Join(dest, "bin", cfg.Name)
return binpath, nil
}

// GetConfigFromEnv will read the environmental variables into a config
@@ -103,23 +118,22 @@ func GetConfigFromEnv() (*Config, error) {
cfg.RestartAfterUpgrade = true
}

logBufferSizeStr := os.Getenv("DAEMON_LOG_BUFFER_SIZE")
if logBufferSizeStr != "" {
logBufferSize, err := strconv.Atoi(logBufferSizeStr)
interval := os.Getenv("DAEMON_POLL_INTERVAL")
if interval != "" {
i, err := strconv.ParseUint(interval, 10, 32)
if err != nil {
return nil, err
}
cfg.LogBufferSize = logBufferSize * 1024
cfg.PollInterval = time.Millisecond * time.Duration(i)
} else {
cfg.LogBufferSize = bufio.MaxScanTokenSize
cfg.PollInterval = 300 * time.Millisecond
}

cfg.UnsafeSkipBackup = os.Getenv("UNSAFE_SKIP_BACKUP") == "true"

if err := cfg.validate(); err != nil {
return nil, err
}

return cfg, nil
}

@@ -151,3 +165,69 @@ func (cfg *Config) validate() error {

return nil
}

// SetCurrentUpgrade sets the named upgrade to be the current link, returns error if this binary doesn't exist
func (cfg *Config) SetCurrentUpgrade(u UpgradeInfo) error {
// ensure named upgrade exists
bin := cfg.UpgradeBin(u.Name)

if err := EnsureBinary(bin); err != nil {
return err
}

// set a symbolic link
link := filepath.Join(cfg.Root(), currentLink)
safeName := url.PathEscape(u.Name)
upgrade := filepath.Join(cfg.Root(), upgradesDir, safeName)

// remove link if it exists
if _, err := os.Stat(link); err == nil {
os.Remove(link)
}

// point to the new directory
if err := os.Symlink(upgrade, link); err != nil {
return fmt.Errorf("creating current symlink: %w", err)
}

cfg.currentUpgrade = u
f, err := os.Create(filepath.Join(upgrade, upgradeFilename))
if err != nil {
return err
}
bz, err := json.Marshal(u)
if err != nil {
return err
}
if _, err := f.Write(bz); err != nil {
return err
}
return f.Close()
}

func (cfg *Config) UpgradeInfo() UpgradeInfo {
if cfg.currentUpgrade.Name != "" {
return cfg.currentUpgrade
}

filename := filepath.Join(cfg.Root(), currentLink, upgradeFilename)
_, err := os.Lstat(filename)
var u UpgradeInfo
var bz []byte
if err != nil { // no current directory
goto returnError
}
if bz, err = ioutil.ReadFile(filename); err != nil {
goto returnError
}
if err = json.Unmarshal(bz, &u); err != nil {
goto returnError
}
cfg.currentUpgrade = u
return cfg.currentUpgrade

returnError:
fmt.Println("[cosmovisor], error reading", filename, err)
cfg.currentUpgrade.Name = "_"
return cfg.currentUpgrade
}
34 changes: 34 additions & 0 deletions cosmovisor/buffer_test.go
@@ -0,0 +1,34 @@
package cosmovisor_test

import (
"bytes"
"sync"
)

// buffer is a thread safe bytes buffer
type buffer struct {
b bytes.Buffer
m sync.Mutex
}

func NewBuffer() *buffer {
return &buffer{}
}

func (b *buffer) Write(bz []byte) (int, error) {
b.m.Lock()
defer b.m.Unlock()
return b.b.Write(bz)
}

func (b *buffer) String() string {
b.m.Lock()
defer b.m.Unlock()
return b.b.String()
}

func (b *buffer) Reset() {
b.m.Lock()
defer b.m.Unlock()
b.b.Reset()
}
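
A short usage sketch shows why the test buffer is guarded by a mutex: several goroutines can write to it at once, much like a subprocess's stdout and stderr being captured concurrently. This is an illustration only: `safeBuffer` is a stand-in that mirrors the `buffer` type above, and `writeConcurrently` is a hypothetical helper that is not part of the test suite.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// safeBuffer is a stand-in for the test buffer above: a bytes.Buffer whose
// methods are serialized by a mutex.
type safeBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

func (b *safeBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.Write(p)
}

func (b *safeBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.String()
}

// writeConcurrently (hypothetical) starts `writers` goroutines that each write
// one byte `perWriter` times, then waits for all of them to finish.
func writeConcurrently(b *safeBuffer, writers, perWriter int) {
	var wg sync.WaitGroup
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWriter; j++ {
				b.Write([]byte("x"))
			}
		}()
	}
	wg.Wait()
}

func main() {
	b := &safeBuffer{}
	writeConcurrently(b, 8, 100)
	fmt.Println(len(b.String())) // prints 800: no write is lost
}
```

With a bare `bytes.Buffer`, the same program would be a data race and `go test -race` would flag it.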
15 changes: 11 additions & 4 deletions cosmovisor/cmd/cosmovisor/main.go
@@ -9,7 +9,7 @@ import (

func main() {
if err := Run(os.Args[1:]); err != nil {
fmt.Fprintf(os.Stderr, "%+v\n", err)
fmt.Fprintf(os.Stderr, "[cosmovisor] %+v\n", err)
os.Exit(1)
}
}
@@ -20,12 +20,19 @@ func Run(args []string) error {
if err != nil {
return err
}
launcher, err := cosmovisor.NewLauncher(cfg)
if err != nil {
return err
}

doUpgrade, err := cosmovisor.LaunchProcess(cfg, args, os.Stdout, os.Stderr)

doUpgrade, err := launcher.Run(args, os.Stdout, os.Stderr)
// if RestartAfterUpgrade, we launch after a successful upgrade (only condition LaunchProcess returns nil)
for cfg.RestartAfterUpgrade && err == nil && doUpgrade {
doUpgrade, err = cosmovisor.LaunchProcess(cfg, args, os.Stdout, os.Stderr)
fmt.Println("[cosmovisor] upgrade detected, relaunching the app ", cfg.Name)
doUpgrade, err = launcher.Run(args, os.Stdout, os.Stderr)
}
if doUpgrade && err == nil {
fmt.Println("[cosmovisor] upgrade detected, DAEMON_RESTART_AFTER_UPGRADE is off. Verify new upgrade and start cosmovisor again.")
}
return err
}
6 changes: 3 additions & 3 deletions cosmovisor/go.mod
@@ -1,9 +1,9 @@
module github.com/cosmos/cosmos-sdk/cosmovisor

go 1.14
go 1.15

require (
github.com/hashicorp/go-getter v1.4.1
github.com/otiai10/copy v1.2.0
github.com/stretchr/testify v1.6.1
github.com/otiai10/copy v1.4.2
github.com/stretchr/testify v1.7.0
)
13 changes: 6 additions & 7 deletions cosmovisor/go.sum
@@ -61,21 +61,20 @@ github.com/mitchellh/go-homedir v1.0.0 h1:vKb8ShqSby24Yrqr/yDYkuFz8d0WUjys40rvnG
github.com/mitchellh/go-homedir v1.0.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
github.com/mitchellh/go-testing-interface v1.0.0 h1:fzU/JVNcaqHQEcVFAKeR41fkiLdIPrefOvVG1VZ96U0=
github.com/mitchellh/go-testing-interface v1.0.0/go.mod h1:kRemZodwjscx+RGhAo8eIhFbs2+BFgRtFPeD/KE+zxI=
github.com/otiai10/copy v1.2.0 h1:HvG945u96iNadPoG2/Ja2+AUJeW5YuFQMixq9yirC+k=
github.com/otiai10/copy v1.2.0/go.mod h1:rrF5dJ5F0t/EWSYODDu4j9/vEeYHMkc8jt0zJChqQWw=
github.com/otiai10/copy v1.4.2 h1:RTiz2sol3eoXPLF4o+YWqEybwfUa/Q2Nkc4ZIUs3fwI=
github.com/otiai10/copy v1.4.2/go.mod h1:XWfuS3CrI0R6IE0FbgHsEazaXO8G0LpMp9o8tos0x4E=
github.com/otiai10/curr v0.0.0-20150429015615-9b4961190c95/go.mod h1:9qAhocn7zKJG+0mI8eUu6xqkFDYS2kb2saOteoSB3cE=
github.com/otiai10/curr v1.0.0 h1:TJIWdbX0B+kpNagQrjgq8bCMrbhiuX73M2XwgtDMoOI=
github.com/otiai10/curr v1.0.0/go.mod h1:LskTG5wDwr8Rs+nNQ+1LlxRjAtTZZjtJW4rMXl6j4vs=
github.com/otiai10/mint v1.3.0/go.mod h1:F5AjcsTsWUqX+Na9fpHb52P8pcRX2CI6A3ctIT91xUo=
github.com/otiai10/mint v1.3.1 h1:BCmzIS3n71sGfHB5NMNDB3lHYPz8fWSkCAErHed//qc=
github.com/otiai10/mint v1.3.1/go.mod h1:/yxELlJQ0ufhjUwhshSj+wFjZ78CnZ48/1wtmBH1OTc=
github.com/otiai10/mint v1.3.2 h1:VYWnrP5fXmz1MXvjuUvcBrXSjGE6xjON+axB/UrpO3E=
github.com/otiai10/mint v1.3.2/go.mod h1:/yxELlJQ0ufhjUwhshSj+wFjZ78CnZ48/1wtmBH1OTc=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.2.2 h1:bSDNvY7ZPG5RlJ8otE/7V6gMiyenm9RtJ7IUVIAoJ1w=
github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
github.com/stretchr/testify v1.6.1 h1:hDPOHmpOpP40lSULcqw7IrRb/u7w6RpDC9399XyoNd0=
github.com/stretchr/testify v1.6.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.7.0 h1:nwc3DEeHmmLAfoZucVR881uASk0Mfjw8xYJ99tb5CcY=
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/ulikunitz/xz v0.5.5 h1:pFrO0lVpTBXLpYw+pnLj6TbvHuyjXMfjGeCwSqCVwok=
github.com/ulikunitz/xz v0.5.5/go.mod h1:2bypXElzHzzJZwzH67Y6wb67pO62Rzfn7BSiF4ABRW8=
go.opencensus.io v0.21.0/go.mod h1:mSImk1erAIZhrmZN+AvHh14ztQfjbGwt4TtuofqLduU=