Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CV add snapshot manager to dig app #425

Merged
merged 1 commit into from
Nov 23, 2022

Conversation

chillyvee
Copy link
Collaborator

@chillyvee chillyvee commented Nov 20, 2022

What is the purpose of the change

dig v3.3.1 and cosmos-sdk v0.46.6 omit the snapshot manager
add it back in

Brief Changelog

add snapshot manager to cmd/root.go

Testing and Verifying

Currently broken

INF indexed block exents height=4409975 module=txindex
INF completed state snapshot format=2 height=4409975
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xe44873]
goroutine 6571 [running]:
github.com/cosmos/iavl.(*nodeDB).incrVersionReaders(0x2cfec78?, 0x20?)
        /home/dig/golang/1.19.3/packages/pkg/mod/github.com/cosmos/[email protected]/nodedb.go:943 +0x33
github.com/cosmos/iavl.newExporter(0xc009f74100)
        /home/dig/golang/1.19.3/packages/pkg/mod/github.com/cosmos/[email protected]/export.go:46 +0xc8
github.com/cosmos/iavl.(*ImmutableTree).Export(...)
        /home/dig/golang/1.19.3/packages/pkg/mod/github.com/cosmos/[email protected]/immutable_tree.go:159
github.com/cosmos/cosmos-sdk/store/iavl.(*Store).Export(0xc00e45d368?, 0x2cebe20?)
        /home/dig/cosmos-sdk/store/iavl/store.go:282 +0x52
github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).Snapshot(0xc000219a20, 0x434a77, {0x2cddc40, 0xc00303abd0})
        /home/dig/cosmos-sdk/store/rootmulti/store.go:742 +0x5bd
github.com/cosmos/cosmos-sdk/snapshots.(*Manager).createSnapshot(0xc000230e10, 0xc8ed66?, 0xc0016c6880?)
        /home/dig/cosmos-sdk/snapshots/manager.go:199 +0xb1
created by github.com/cosmos/cosmos-sdk/snapshots.(*Manager).Create
        /home/dig/cosmos-sdk/snapshots/manager.go:181 +0x2a5
ERR  error="exit status 2" module=cosmovisor

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@chillyvee
Copy link
Collaborator Author

chillyvee commented Nov 20, 2022

Crashes here: iavl/nodedb.go
https://github.com/cosmos/iavl/blob/v0.19.4/nodedb.go#L943

func (ndb *nodeDB) incrVersionReaders(version int64) {
        ndb.mtx.Lock() <- crashes here
        defer ndb.mtx.Unlock()
        ndb.versionReaders[version]++
}

@chillyvee
Copy link
Collaborator Author

Note: Removing snapshot directory to force a new db creation does not resolve the issue

@faddat
Copy link
Collaborator

faddat commented Nov 20, 2022

hey :)! this PR crashes?

@faddat faddat marked this pull request as draft November 20, 2022 13:19
@chillyvee
Copy link
Collaborator Author

chillyvee commented Nov 20, 2022

hey :)! this PR crashes?

Yes, need some help to figure out why it explodes. Sharing my progress :)

Trying to verify whether cosmos-sdk tests out clean, then find what's different about digd.

Only issue is most comsos-sdk tests are using memory based store rather than a full iavl db.

cosmos-sdk v0.46.6 tests clean, so it's some difference either in:

  • setup of snapshotManager
  • memory based test vs actual iavl disk store

@chillyvee
Copy link
Collaborator Author

chillyvee commented Nov 20, 2022

so far the cause found is:
icahost store has no ndb set

Snapshot.Export Store Key Done ibc
Snapshot.Export Store Key icahost
iavl/export newExporter failed to create because tree.ndb is nil
Snapshot.Export Store Key icahost exporter nil
Snapshot.Export Store Key mint
Snapshot.Export Store Key Done mint

@chillyvee
Copy link
Collaborator Author

chillyvee commented Nov 20, 2022

Root cause is that Dig's icahost store has cosmos/iavl/ImmutableTree ndb *nodedb is set to nil

Snapshot works if cosmos/iavl doesn't crash on that error.

So

  1. iavl needs a safety patch
  2. would be good to determine why dig's icahost doesn't have a nodedb set <- this should definitely be done rather than just skipping the store.
    2a) remove unused icahost?
    2b) properly configure icahost nodedb

@chillyvee
Copy link
Collaborator Author

Related PRs:

Prevent newExporter crash if tree.ndb is nil #622
cosmos/iavl#622

fix: snapshot recover from exporter error #13935
cosmos/cosmos-sdk#13935

@chillyvee
Copy link
Collaborator Author

chillyvee commented Nov 20, 2022

Snapshot is tested to work by

replacing cosmos/iavl with chillyvee/iavl branch cv_rescue_tree_ndb_nil

replacing cosmos/cosmos-sdk with chillyvee/cosmos-sdk branch v0466_snapshot_recover_from_exporter_error

tagged as follows

replace github.com/cosmos/cosmos-sdk => github.com/chillyvee/cosmos-sdk v0.46.6-dig.2

replace github.com/cosmos/iavl => github.com/chillyvee/iavl v0.19.4-dig.1

Currently running this as a build script

git clone https://github.com/chillyvee/dig 
cd dig
git checkout cv331_snapshot
go mod edit -replace github.com/cosmos/cosmos-sdk=github.com/chillyvee/[email protected]
go mod edit -replace github.com/cosmos/iavl=github.com/chillyvee/[email protected]
go mod tidy
make install

Use the following config.toml snapshot edits

  enable = true
  rpc_servers = "https://rpc-1-dig.notional.ventures:443,https://rpc-1-dig.notional.ventures:443"
  trust_hash = "fc25ffbbd03fac9c7a32f3c9a1fde171771786ea36999144565cc6ef41834de8"
  trust_height = "4409000"
  trust_period = "168h0m0s"

@chillyvee
Copy link
Collaborator Author

chillyvee commented Nov 20, 2022

goleveldb -> pebbledb restore - OK
pebbledb -> goleveldb restore - OK

@chillyvee
Copy link
Collaborator Author

chillyvee commented Nov 20, 2022

@faddat - Wondering if the pruning panic issue is due to icahost not having nodedb configured.

Pruning icahost which does not have a ndb *nodedb configured is probably what is breaking pruning

Does Dig chain use icahost? It seems to statesync/snapshot restore just fine without icahost.

In dig app/app.go

        ICAModule := ica.NewAppModule(nil, &app.ICAHostKeeper)

In cosmos/ibc-go/modules/apps/27-interchain-accounts/module.go

// NewAppModule creates a new IBC interchain accounts module
func NewAppModule(controllerKeeper *controllerkeeper.Keeper, hostKeeper *hostkeeper.Keeper) AppModule {
        return AppModule{
                controllerKeeper: controllerKeeper,
                hostKeeper:       hostKeeper,
        }
}

Would controllerKeeper: nil add to this problem?


Tried to remove ICAModule

app.mm = module.NewManager(
   ...
  //ICAModule,
)

But still get panic: cannot delete latest saved version (590)

Don't think that icahost would complain about 590 if it's empty. It is some other problem.

icahost pruning without a nodedb configured may add a problem later.

@chillyvee
Copy link
Collaborator Author

Current snapshot creation/restore testing notes published here:

#426

@faddat faddat marked this pull request as ready for review November 23, 2022 22:43
@faddat
Copy link
Collaborator

faddat commented Nov 23, 2022

amazing dude

@faddat faddat merged commit 728488a into notional-labs:master Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants