RPC requests during node startup trigger error #14620

Closed
ttl33 opened this issue Jan 14, 2023 · 23 comments · Fixed by #14692

@ttl33
Contributor

ttl33 commented Jan 14, 2023

Summary of Bug

RPC requests sent to a node while it is still being initialized/started trigger the following error:

"rpc error: code = Unknown desc = codespace sdk code 18: invalid request: failed to load state at height 0; version mismatch on immutable IAVL tree; version does not exist. Version has either been pruned, or is for a future block height (latest height: 0)",

The error is being thrown here when creating the context for the query.

It seems that the IAVL tree does not exist at height 0, so the height hasn't been created yet (here is the VersionExists method on the IAVL struct).

Previous Cosmos SDK versions returned an empty IAVL tree in this scenario. The PR that introduced the change is #13355.

This might be related to this other issue.

Version

v0.47.0-alpha2

Steps to Reproduce

  • Create some task/job that calls a node RPC endpoint
  • Start a new node
@facundomedica
Member

To me it makes sense that it returns an error. Height 0 isn't a thing, and this error looks easily handleable on the client's side (I might be very wrong tho).

@ttl33
Contributor Author

ttl33 commented Jan 17, 2023

Yes, it can be handled on the client side, but my focus is mostly on how the request is being handled on the node (server) side.

It's odd that the node would throw an error if "height 0 is not a thing". I would assume that the node would reject the call until the setup has been completed. It feels like the node is opening up RPC endpoints before it is fully ready, which doesn't seem right.

IMO, it's better for the node to be defensive and gracefully handle these requests before the node is ready to serve these requests. Let me know your thoughts!

@alexanderbez
Contributor

Which RPC endpoint are you referring to exactly? There's a bunch of moving pieces (Tendermint has plenty) and then the app itself, so it's not that straightforward.

@ttl33
Contributor Author

ttl33 commented Jan 19, 2023

We ran into errors when making gRPC query requests. I can't share the exact gRPC query method (given that our codebase is private for now), but it's essentially any of the module-specific query methods, like this one.

@julienrbrt julienrbrt self-assigned this Jan 19, 2023
@julienrbrt julienrbrt moved this from 📝 Todo to 💪 In Progress in Cosmos-SDK Jan 19, 2023
@julienrbrt julienrbrt moved this from 💪 In Progress to 👀 Needs Review in Cosmos-SDK Jan 19, 2023
@alexanderbez
Contributor

And you made these queries right as the app started up?

@github-project-automation github-project-automation bot moved this from 👀 Needs Review to 👏 Done in Cosmos-SDK Jan 19, 2023
@arhamchordia

arhamchordia commented Feb 2, 2023

I tried running the chain with SDK v0.46.4 and got a similar error, but at a different height. It occurs when I try to query the chain:
Error: rpc error: code = InvalidArgument desc = failed to load state at height 10; version does not exist (latest height: 10): invalid request
It fails in createQueryContext(). It enters this function and fails for feegrant while trying to do this.

@alexanderbez
Contributor

@arhamchordia was this on an upgraded binary?

@arhamchordia

> @arhamchordia was this on an upgraded binary?

@alexanderbez I tried it on simd and there was no such error.
So, do you think it depends on how things are declared in app.go?

@alexanderbez
Contributor

No, when you see that error it's a function of your pruning settings and if the binary was upgraded or not. So did you upgrade the network and what are your pruning settings?

@arhamchordia

arhamchordia commented Feb 6, 2023

> No, when you see that error it's a function of your pruning settings and if the binary was upgraded or not. So did you upgrade the network and what are your pruning settings?

I tried starting it up from scratch with a fresh genesis. Also, the pruning settings are set to "default".

After I attempt to query multiple times, it ends up with an error and the chain halts: maximum number of retries exceeded, last error: rpc error: code = Unknown desc = codespace sdk code 18: invalid request: failed to load state at height 30; version does not exist (latest height: 30)

@Reecepbcups
Member

Reecepbcups commented Feb 6, 2023

We are seeing the same thing on Juno in our e2e tests after bumping up to SDK 45.12. I also noticed this before we bumped tendermint to the 35.25 fix.

https://github.com/CosmosContracts/juno/actions/runs/4106733477/jobs/7085521033#step:9:890

Additional Context:

  • upgrade height 71
  • upgrade goes through, then 5 blocks later it says upgradeHeight-2 (69 in this case) is bad after it queries PEX

failed to load state at height 69; version mismatch on immutable IAVL tree; version does not exist. Version has either been pruned, or is for a future block height (latest height: 69)

I have replicated this locally without any custom modules too. So I feel it's something with the patch used to fix this issue.

@faddat
Contributor

faddat commented Feb 6, 2023

Yeah I recommend that we re-open this.

Something is off.

@alexanderbez
Contributor

@Reecepbcups what are the pruning settings in your instance? Are heights 70 and 71 queryable? What about 68?

@Reecepbcups
Member

Reecepbcups commented Feb 7, 2023

@alexanderbez pruning = nothing, single and multi node setups

Other blocks around it are querying just fine (after I start the node back, since panic = exit)

I patched it in my fork like so (how it was in 45.11) and it works just fine.

Reecepbcups@9fa1c52

My theories:

  • the new change in 45.12 (error instead of nil) has an incorrect panic from a calling function somewhere but then:
  • the IAVL store is losing reference to versions or something? Really odd.

Looking into it more this week.

@alexanderbez
Contributor

> the new change in 45.12 (error instead of nil) has an incorrect panic from a calling function somewhere but then:

This seems more probable to me. In fact, after some quick digging it is the case ->

CreateQueryContext -> CacheMultiStoreWithVersion

Now that it errors, CreateQueryContext will return that error. What we need to do is return a typed error and ignore it in CreateQueryContext. I'll open a PR.

> the IAVL store is losing reference to versions or something? Really odd.

I'm a bit skeptical of this; I doubt it.

@alexanderbez
Contributor

ref: #13355

So it seems this was intentional: returning nil and an error means the version does not exist. So why is it that the version does not exist at a pre-upgrade height?

@Reecepbcups
Member

@alexanderbez So we just had like 5 people look through this through the night.
It turns out we were missing a storeKey in our upgrade handler. The reason: we added this key in a previous upgrade on mainnet (v8).
So in the localnet/testnet environment it fails, because the key has not been added yet.

Maybe the IAVL error should mention the version AND dump the diff of root keys before and after the upgrade? Not sure if this is possible, but it would save so much headache in the future.

@alexanderbez
Contributor

You mean you forgot to set the added field? If so, we already patched that.

@dzmitryhil

@Reecepbcups could you please elaborate a bit on the changes you made to fix that issue? We faced a similar problem.
After an upgrade in which we introduced 2 new modules and updated the Cosmos SDK from v0.45.11 to v0.45.14, we get the error on any RPC request.
Upgrade handler

app.UpgradeKeeper.SetUpgradeHandler("upgrade", func(ctx sdk.Context, plan upgradetypes.Plan, fromVM module.VersionMap) (module.VersionMap, error) {
	// Run the migrations for every module registered in the module manager.
	return app.mm.RunMigrations(ctx, app.configurator, fromVM)
})

Error

github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).createQueryContext
	github.com/cosmos/[email protected]/baseapp/abci.go:674
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).handleQueryGRPC
	github.com/cosmos/[email protected]/baseapp/abci.go:596
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).Query
	github.com/cosmos/[email protected]/baseapp/abci.go:445
github.com/tendermint/tendermint/abci/client.(*localClient).QuerySync
	github.com/tendermint/[email protected]/abci/client/local_client.go:256
github.com/tendermint/tendermint/proxy.(*appConnQuery).QuerySync
	github.com/tendermint/[email protected]/proxy/app_conn.go:159
github.com/tendermint/tendermint/rpc/core.ABCIQuery
	github.com/tendermint/[email protected]/rpc/core/abci.go:20
github.com/tendermint/tendermint/rpc/client/local.(*Local).ABCIQueryWithOptions
	github.com/tendermint/[email protected]/rpc/client/local/local.go:87
github.com/cosmos/cosmos-sdk/client.Context.queryABCI
	github.com/cosmos/[email protected]/client/query.go:94
github.com/cosmos/cosmos-sdk/client.Context.QueryABCI
	github.com/cosmos/[email protected]/client/query.go:57
github.com/cosmos/cosmos-sdk/client.Context.Invoke
	github.com/cosmos/[email protected]/client/grpc_query.go:81
github.com/CoreumFoundation/coreum/x/customparams/types.(*queryClient).StakingParams
	github.com/CoreumFoundation/coreum/x/customparams/types/query.pb.go:174
github.com/CoreumFoundation/coreum/x/customparams/types.request_Query_StakingParams_0
	github.com/CoreumFoundation/coreum/x/customparams/types/query.pb.gw.go:40
github.com/CoreumFoundation/coreum/x/customparams/types.RegisterQueryHandlerClient.func1
	github.com/CoreumFoundation/coreum/x/customparams/types/query.pb.gw.go:133
github.com/grpc-ecosystem/grpc-gateway/runtime.(*ServeMux).ServeHTTP
	github.com/grpc-ecosystem/[email protected]/runtime/mux.go:240
github.com/gorilla/mux.(*Router).ServeHTTP
	github.com/gorilla/[email protected]/mux.go:210
github.com/gorilla/handlers.(*cors).ServeHTTP
	github.com/gorilla/[email protected]/cors.go:54
github.com/tendermint/tendermint/rpc/jsonrpc/server.maxBytesHandler.ServeHTTP
	github.com/tendermint/[email protected]/rpc/jsonrpc/server/http_server.go:256
github.com/tendermint/tendermint/rpc/jsonrpc/server.RecoverAndLogHandler.func1
	github.com/tendermint/[email protected]/rpc/jsonrpc/server/http_server.go:229
net/http.HandlerFunc.ServeHTTP
	net/http/server.go:2122
net/http.serverHandler.ServeHTTP
	net/http/server.go:2936
net/http.(*conn).serve
	net/http/server.go:1995
failed to load state at height 46; version mismatch on immutable IAVL tree; version does not exist. Version has either been pruned, or is for a future block height (latest height: 46): invalid request

@Reecepbcups
Member

Reecepbcups commented Mar 15, 2023

@dzmitryhil You have to add the storeKeys for those modules as well. This is an extra step on top of the upgrade handler.

https://github.com/CosmosContracts/juno/blob/main/app/upgrades/v13/constants.go#L22

(This will more than likely be in your app.go; we use a different format to make upgrades cleaner.)

In our case, we missed icacontrollertypes.StoreKey, since it was already on mainnet from a previous release, but not on our reset testnet
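
The missing-key situation described here, together with the earlier suggestion to diff the store keys before and after an upgrade, can be sketched as a small check. This is a toy model under stated assumptions: missingStoreKeys is a hypothetical helper, and the key names are illustrative strings rather than values read from a real chain.

```go
package main

import (
	"fmt"
	"sort"
)

// missingStoreKeys returns the keys that the new binary mounts but that are
// neither present in the pre-upgrade state nor declared as added by the
// upgrade. Any key returned here would have no version to load.
func missingStoreKeys(mounted, existing, added []string) []string {
	known := make(map[string]bool, len(existing)+len(added))
	for _, k := range existing {
		known[k] = true
	}
	for _, k := range added {
		known[k] = true
	}

	var missing []string
	for _, k := range mounted {
		if !known[k] {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Keys mounted by the upgraded binary (illustrative).
	mounted := []string{"bank", "staking", "icacontroller"}
	// Keys present in a freshly reset testnet before the upgrade.
	existing := []string{"bank", "staking"}
	// Keys declared as added by the upgrade: none, which is the bug.
	added := []string{}

	fmt.Println("undeclared store keys:", missingStoreKeys(mounted, existing, added))
}
```

On a chain where the key already exists from an earlier release, the same check returns nothing, which matches the report that mainnet was unaffected while the reset testnet failed.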

@dzmitryhil

Thanks a lot @Reecepbcups, it helped.

@lakefishingman522

I am using Cosmos SDK v0.46.13, and when I try to send a tx to tendermint, I get the following message:

rpc error: code = Unknown desc = codespace sdk code 18: invalid request: failed to load state at height 1; version mismatch on immutable IAVL tree; version does not exist. Version has either been pruned, or is for a future block height (latest height: 1361)

Is this related to my hardhat task script or to my tendermint side? How can I fix this?

@violog

violog commented Apr 10, 2024

When running the query gov proposals command, I received a similar error:

Error: rpc error: code = InvalidArgument desc = failed to load state at height 46; version does not exist (latest height: 46): invalid request

My app is built on top of Cosmos SDK v0.46.7. However, the SDK itself is not the issue. Look for the indirect dependency github.com/syndtr/goleveldb in your go.mod. Depending on the version:

  • v1.0.1-0.20210819022825-2ae1ddf74ef7 no issue
  • v1.0.1-0.20220721030215-126854af5e6d the issue is reproduced

Hope it saves you some extra hours of debug 🙌
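
A quick way to check which goleveldb version a go.mod pulls in is to scan the file for the module path. The sketch below works on go.mod text held in a string; findModuleVersions is a hypothetical helper, and the sample go.mod content is made up for illustration (only the two version strings come from the comment above).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const goleveldbPath = "github.com/syndtr/goleveldb"

// Version reported above as reproducing the issue.
const reportedBadVersion = "v1.0.1-0.20220721030215-126854af5e6d"

// findModuleVersions scans go.mod text and returns every version that
// directly follows modulePath on a line (require and replace lines alike).
func findModuleVersions(goMod, modulePath string) []string {
	var versions []string
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		for i, f := range fields {
			if f == modulePath && i+1 < len(fields) && strings.HasPrefix(fields[i+1], "v") {
				versions = append(versions, fields[i+1])
			}
		}
	}
	return versions
}

func main() {
	// Illustrative go.mod content, not taken from a real project.
	goMod := `module example.com/app

go 1.19

require (
	github.com/syndtr/goleveldb v1.0.1-0.20220721030215-126854af5e6d // indirect
)
`
	for _, v := range findModuleVersions(goMod, goleveldbPath) {
		fmt.Printf("%s %s (reported as problematic: %t)\n", goleveldbPath, v, v == reportedBadVersion)
	}
}
```

If the problematic version shows up, pinning the dependency to the version reported as working, for example with a replace directive in go.mod, is one option to try; whether that is appropriate for a given app is an assumption to verify.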
