Relayer panic: 'can not convert float seconds to Duration: value is either too big or NaN' in SyncerEtaCalculator #2723
Comments
This is probably related to https://github.com/hyperlane-xyz/hyperlane-monorepo/blob/main/rust/hyperlane-base/src/contract_sync/eta_calculator.rs#L59-L64, but I'm unsure why we hit this case. I also wonder why we've never run into this before; could the recent contract sync changes have had some implications?
daniel-savu added a commit that referenced this issue Sep 13, 2023
Removes some unnecessary logs from the sealevel igp indexing fn. Also instruments the eta calculator in response to #2723.
We have a quick fix for the log symptom of the problem, but we still need to figure out the root cause.
Merged
daniel-savu added a commit that referenced this issue Sep 19, 2023
## Bug 1

Closes #2723

The relayer panic is caused by an overflow that results from dividing by ~`6.540888459481895e-211`. On my local machine, the effective rate of indexing starts at `0.61`.

```
{"timestamp":"2023-09-15T09:57:10.746276Z","level":"INFO","fields":{"message":"~~~ blocks_processed: 2508, tip_progression: 2042, elapsed: 757.10340475, old_rate: Some(0.6155037701275111), effective_rate: 0.6155037701275111"},"target":"hyperlane_base::contract_sync::eta_calculator","span":{"domain":"solanadevnet","label":"gas_payments","name":"ContractSync"},"spans":[{"domain":"solanadevnet","label":"gas_payments","name":"ContractSync"}]}
```

But then both `blocks_processed` and `tip_progression` are consistently zero, which makes `new_rate` zero (https://github.com/hyperlane-xyz/hyperlane-monorepo/blob/eea423ad049acfd15855465792147fb99bc8dd4d/rust/hyperlane-base/src/contract_sync/eta_calculator.rs#L41). Over time the zero samples take over the entire moving average window and drive the effective rate to almost zero, which leads to the overflow. 15 minutes after that initial log, the effective rate had already dropped to `0.00038`.

The reason `blocks_processed` and `tip_progression` are consistently zero after the first log is that `eta_calculator.calculate(from, tip)` is always called with the same `from` and `tip`, although it expects to get the latest values.

### The fix

The tip wasn't being set after the `sequence_and_tip` query here: https://github.com/hyperlane-xyz/hyperlane-monorepo/blob/eea423ad049acfd15855465792147fb99bc8dd4d/rust/hyperlane-base/src/contract_sync/cursor.rs#L565

The `to` and `from` are then calculated based on it: https://github.com/hyperlane-xyz/hyperlane-monorepo/blob/eea423ad049acfd15855465792147fb99bc8dd4d/rust/hyperlane-base/src/contract_sync/cursor.rs#L550

So even though the `sync_state` internal variables were kept up to date, the `min(...)` would cause the issue. The first PR commit fixes this.
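To make the failure mode concrete, here is a minimal sketch of how a rate average fed only zero samples decays toward zero, and how a checked conversion avoids the panic. The averaging code in `eta_calculator.rs` is not quoted in this thread, so the exponential moving average, the `alpha` weight, and all function names below are assumptions for illustration only. The one part taken from the standard library is `Duration::try_from_secs_f64`, the fallible counterpart of `Duration::from_secs_f64`.

```rust
use std::time::Duration;

/// One step of an exponential moving average over the indexing rate.
/// `alpha` (the weight of the newest sample) is a hypothetical parameter,
/// not a value taken from the hyperlane code.
fn update_rate(old_rate: f64, new_rate: f64, alpha: f64) -> f64 {
    alpha * new_rate + (1.0 - alpha) * old_rate
}

/// Feeds `steps` zero-rate samples into the average, mimicking the case
/// where `blocks_processed` and `tip_progression` are stuck at zero.
fn decay_with_zero_samples(start_rate: f64, alpha: f64, steps: u32) -> f64 {
    (0..steps).fold(start_rate, |rate, _| update_rate(rate, 0.0, alpha))
}

/// Checked ETA: returns `None` instead of panicking when the rate is not
/// positive (or is NaN), or when the quotient cannot be represented as a
/// `Duration`.
fn checked_eta(remaining: f64, rate: f64) -> Option<Duration> {
    if rate.is_nan() || rate <= 0.0 {
        return None;
    }
    Duration::try_from_secs_f64(remaining / rate).ok()
}
```

With a starting rate of `0.6155037701275111` and enough zero samples, the rate stays positive but becomes so small that `remaining / rate` no longer fits in a `Duration`. In this sketch `checked_eta` returns `None` in that case, whereas `Duration::from_secs_f64` would panic on the same quotient.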
## Bug 2

There was another bug in the eta calculator, caused by it only using `next_block` to approximate synced state, which doesn't apply to sequence indexing. The way the eta calculator is called had to be changed to use abstract measures of sync progression (which could be blocks or sequences). The second PR commit fixes this, afaict.
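As a rough illustration of "abstract measures of sync progression", the sketch below tracks progress in unit-agnostic counters, so the same delta logic works whether the caller passes block heights or sequence numbers. The `Progress` and `ProgressTracker` types are hypothetical and are not the types used in the PR.

```rust
/// Hypothetical progress sample. `synced` and `tip` share one unit,
/// which may be blocks or sequences; the tracker does not care which.
struct Progress {
    synced: u64,
    tip: u64,
}

/// Remembers the previous sample so that deltas can be computed.
struct ProgressTracker {
    last: Progress,
}

impl ProgressTracker {
    fn new(initial: Progress) -> Self {
        Self { last: initial }
    }

    /// Returns `(processed, tip_progression)` since the previous call.
    /// If the caller keeps passing the same stale values, both deltas
    /// are zero, which is the symptom described under Bug 1.
    fn deltas(&mut self, now: Progress) -> (u64, u64) {
        let processed = now.synced.saturating_sub(self.last.synced);
        let tip_progression = now.tip.saturating_sub(self.last.tip);
        self.last = now;
        (processed, tip_progression)
    }
}
```

The `saturating_sub` calls are a design choice in this sketch: if a value ever moves backwards, the delta is clamped to zero rather than wrapping or panicking.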
This is a new one :(
Seen in the hyperlane context of the omniscient relayer on testnet3.
Image:
gcr.io/abacus-labs-dev/hyperlane-agent:892cc5d-20230908-162614
Logs: https://cloudlogging.app.goo.gl/uR8m3H8dLtC77J7P6
Pasted: