[Bug]: avalanche interchain relayer keeps failing with *nonce too low* and cannot recover automatically #2560

Closed
annguyen-darenft opened this issue Jan 17, 2025 · 6 comments
Labels
bug Something isn't working

@annguyen-darenft

Describe the bug
When a 'nonce too low' error occurs, the relayer stays in that failed state and never resynchronizes its nonce with the on-chain value.

To Reproduce
I do not know how to reproduce it; however, the relayer may run into this error in severe cases.

Expected behavior
When a 'nonce too low' error occurs, automatically reset the application's nonce to the value returned by web3.eth.getTransactionCount(relayerManageWallet).
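
For illustration only, the requested recovery could look roughly like the Python sketch below. It runs against a hypothetical in-memory `FakeChain` instead of a real RPC endpoint; `FakeChain`, `NonceTooLow`, and `ResyncingSender` are made-up names and not relayer code, and `get_transaction_count` merely stands in for web3.eth.getTransactionCount.

```python
class NonceTooLow(Exception):
    """Raised by the stand-in chain when a tx reuses an already-consumed nonce."""


class FakeChain:
    """Hypothetical stand-in for an EVM RPC endpoint; tracks one account's nonce."""

    def __init__(self, nonce=0):
        self.nonce = nonce

    def get_transaction_count(self):
        # Plays the role of web3.eth.getTransactionCount(relayerManageWallet).
        return self.nonce

    def send(self, tx_nonce):
        if tx_nonce < self.nonce:
            raise NonceTooLow(f"nonce too low: tx has {tx_nonce}, account is at {self.nonce}")
        if tx_nonce > self.nonce:
            raise ValueError("nonce gaps are not modelled in this sketch")
        self.nonce += 1


class ResyncingSender:
    """Keeps a local nonce, but re-fetches it once after a 'nonce too low' error."""

    def __init__(self, chain):
        self.chain = chain
        self.local_nonce = chain.get_transaction_count()

    def send(self):
        try:
            self.chain.send(self.local_nonce)
        except NonceTooLow:
            # Requested behaviour: reset the local nonce to the on-chain value, retry once.
            self.local_nonce = self.chain.get_transaction_count()
            self.chain.send(self.local_nonce)
        self.local_nonce += 1


chain = FakeChain(nonce=5)
sender = ResyncingSender(chain)
chain.send(5)   # a tx sent from the same account by someone else
sender.send()   # first attempt uses the stale nonce 5, the retry uses 6
```

In this single-threaded sketch the retry succeeds, but only because nothing else touches the account between the re-fetch and the retry.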

Screenshots

Image

Logs
If applicable, please include the relevant logs that indicate a problem.

Operating System
Ubuntu 22.04

Additional context
Currently, the only way to fix this error is to restart the relayer service with the following commands:

avalanche interchain relayer stop
avalanche interchain relayer start
@felipemadero
Collaborator

felipemadero commented Jan 17, 2025

hi! the address set to pay relayer tx fees should be used only by the relayer itself, and not for other means (that is so as to avoid the issue you mention). may it be the case that such address was used for some other purpose?

@felipemadero felipemadero self-assigned this Jan 17, 2025
@felipemadero
Collaborator

@cam-schultz can you provide technical background on why the relayer does not automatically update the latest nonce? I believe it is used to keep track of relayer's state, but probably a brief explanation will be useful as context here.

@cam-schultz

The relayer tracks the account nonce for each configured destination chain in memory. It fetches the nonce via chain RPC once on startup, and then from there increments the stored nonce whenever a tx is successfully sent. A consequence of this is that the relayer must be the only entity sending transactions from that account. If a tx is sent from elsewhere, then the on-chain account nonce will not match the relayer's stored nonce and tx's will fail.
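
A minimal Python simulation of the behaviour described above may help make the failure mode concrete. It is a sketch under assumptions, not the relayer's actual code: `FakeChain`, `NonceTooLow`, and `InMemoryNonceSender` are hypothetical names, and the chain is an in-memory object, not a real RPC endpoint.

```python
class NonceTooLow(Exception):
    """Raised by the stand-in chain when a tx reuses an already-consumed nonce."""


class FakeChain:
    """Hypothetical stand-in for a destination chain; tracks one account's nonce."""

    def __init__(self, nonce=0):
        self.nonce = nonce

    def get_transaction_count(self):
        return self.nonce

    def send(self, tx_nonce):
        if tx_nonce < self.nonce:
            raise NonceTooLow(f"nonce too low: tx has {tx_nonce}, account is at {self.nonce}")
        if tx_nonce > self.nonce:
            raise ValueError("nonce gaps are not modelled in this sketch")
        self.nonce += 1


class InMemoryNonceSender:
    """Fetches the nonce once at startup, then only increments its own copy."""

    def __init__(self, chain):
        self.chain = chain
        self.nonce = chain.get_transaction_count()  # the single RPC fetch

    def send(self):
        self.chain.send(self.nonce)
        self.nonce += 1  # incremented only after a successful send


chain = FakeChain(nonce=10)
relayer = InMemoryNonceSender(chain)
relayer.send()                             # relayer tx: both nonces are now 11
chain.send(chain.get_transaction_count())  # tx sent from elsewhere: chain is now at 12

failures = 0
for _ in range(3):
    try:
        relayer.send()                     # still signs with the stale nonce 11
    except NonceTooLow:
        failures += 1                      # every attempt fails the same way

restarted = InMemoryNonceSender(chain)     # a restart re-fetches the nonce (12)
restarted.send()
```

Every attempt after the outside tx fails, and only constructing a new sender (the equivalent of a restart) picks up the on-chain value again.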

why the relayer does not automatically update the latest nonce?

There's no safe way to do this in a world where multiple uncoordinated entities are sending tx's from the same account. It's really easy to hit a race condition when fetching the nonce via RPC that would result in the same nonce mismatch. In a world where only the relayer is using the account to send tx's, there's no need to fetch the nonce via RPC since the relayer is the only entity that can increment it.
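
The race can be sketched in a few lines of Python. This is an illustration with a hypothetical in-memory `FakeChain`, not relayer code: two uncoordinated senders both read the nonce before either one submits, so the second submission fails even though each sender fetched a "fresh" value.

```python
class NonceTooLow(Exception):
    """Raised by the stand-in chain when a tx reuses an already-consumed nonce."""


class FakeChain:
    """Hypothetical stand-in for a chain RPC; tracks one account's nonce."""

    def __init__(self, nonce=0):
        self.nonce = nonce

    def get_transaction_count(self):
        return self.nonce

    def send(self, tx_nonce):
        if tx_nonce < self.nonce:
            raise NonceTooLow(f"nonce too low: tx has {tx_nonce}, account is at {self.nonce}")
        if tx_nonce > self.nonce:
            raise ValueError("nonce gaps are not modelled in this sketch")
        self.nonce += 1


def racy_refetch(chain):
    """Interleave two senders that each fetch the nonce right before sending."""
    nonce_a = chain.get_transaction_count()  # sender A reads the nonce
    nonce_b = chain.get_transaction_count()  # sender B reads it before A submits
    chain.send(nonce_a)                      # A's tx is accepted
    try:
        chain.send(nonce_b)                  # B reuses the same nonce
    except NonceTooLow:
        return nonce_a, nonce_b, False
    return nonce_a, nonce_b, True


chain = FakeChain(nonce=3)
nonce_a, nonce_b, second_tx_accepted = racy_refetch(chain)
```

Both senders see the same nonce, and the second tx is rejected, which is the same mismatch that re-fetching was meant to fix.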

If the tx fails for any reason, the relayer's health check endpoint will report unhealthy. This can be used to implement automatic restart in unrecoverable failure scenarios, removing the need to manually restart the relayer service, as described in the issue submission text.
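
A health-check-driven restart could be sketched as below. Treat it as a starting point under stated assumptions: the URL `http://localhost:8080/health` is a guess that must be adjusted to the relayer's configured API port and path, "healthy" is assumed to mean an HTTP 200 response, and the function names are hypothetical. The restart commands are the ones quoted in the issue.

```python
import subprocess
import urllib.error
import urllib.request

# Assumption: adjust host, port, and path to match the relayer's configuration.
HEALTH_URL = "http://localhost:8080/health"


def is_healthy(url=HEALTH_URL, timeout=5.0):
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers non-2xx responses, refused connections, and timeouts.
        return False


def restart_relayer(run=subprocess.run):
    """Restart the relayer with the commands from the issue description."""
    run(["avalanche", "interchain", "relayer", "stop"], check=False)
    run(["avalanche", "interchain", "relayer", "start"], check=True)


def watchdog_step(check=is_healthy, restart=restart_relayer):
    """Run one check; restart if unhealthy. Returns True if a restart happened."""
    if check():
        return False
    restart()
    return True
```

Calling `watchdog_step()` from a cron job or inside a loop with a sleep between iterations would replace the manual restart. The `check` and `restart` parameters are injectable so the logic can be exercised without a running relayer.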

@felipemadero
Collaborator

thanks!

@meaghanfitzgerald meaghanfitzgerald added this to the Reported Issues milestone Jan 17, 2025
@meaghanfitzgerald meaghanfitzgerald moved this from Backlog 🗄️ to Researching 📚 in Platform Engineering Group Jan 17, 2025
@annguyen-darenft
Author

hi! the address set to pay relayer tx fees should be used only by the relayer itself, and not for other means (that is so as to avoid the issue you mention). may it be the case that such address was used for some other purpose?

Yes, it may well be my fault: I used that address in multiple places (I didn't notice). Thanks for the advice.

@sukantoraymond
Collaborator

@felipemadero should we disable using relayer address for other purposes?
