Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relayer panic: 'can not convert float seconds to Duration: value is either too big or NaN' in SyncerEtaCalculator #2723

Closed
tkporter opened this issue Sep 12, 2023 · 3 comments · Fixed by #2735
Assignees
Labels
agent bug Something isn't working

Comments

@tkporter
Copy link
Collaborator

This is a new one :(

On hyperlane context of omniscient relayer on testnet3.

Image: gcr.io/abacus-labs-dev/hyperlane-agent:892cc5d-20230908-162614

Logs: https://cloudlogging.app.goo.gl/uR8m3H8dLtC77J7P6

Pasted:

{"timestamp":"2023-09-12T13:54:05.436645Z","level":"INFO","fields":{"return":"58"},"target":"hyperlane_sealevel::mailbox","span":{"_maybe_lag":"None","name":"count"},"spans":[{"domain":"solanadevnet","label":"dispatched_messages","name":"ContractSync"},{"name":"sequence_and_tip"},{"_maybe_lag":"None","name":"count"}]}
{"timestamp":"2023-09-12T13:54:08.450546Z","level":"INFO","fields":{"message":"Found log(s) in index range","range":"16414563..=16414648","num_logs":0,"estimated_time_to_sync":"synced"},"target":"hyperlane_base::contract_sync","span":{"domain":"proteustestnet","label":"gas_payments","name":"ContractSync"},"spans":[{"domain":"proteustestnet","label":"gas_payments","name":"ContractSync"}]}
{"timestamp":"2023-09-12T13:54:08.983441Z","level":"INFO","fields":{"message":"Found log(s) in index range","range":"9682143..=9682143","num_logs":0,"estimated_time_to_sync":"synced"},"target":"hyperlane_base::contract_sync","span":{"domain":"goerli","label":"gas_payments","name":"ContractSync"},"spans":[{"domain":"goerli","label":"gas_payments","name":"ContractSync"}]}
thread 'main' panicked at 'can not convert float seconds to Duration: value is either too big or NaN', /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/time.rs:741:23
stack backtrace:
   0:     0x5624797204e1 - std::backtrace_rs::backtrace::libunwind::trace::h782cc21a5acaf6cb
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x5624797204e1 - std::backtrace_rs::backtrace::trace_unsynchronized::hc579eb24ab204515
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x5624797204e1 - std::sys_common::backtrace::_print_fmt::h7223525cfdbacda2
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x5624797204e1 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hbd7d55b7108d2ab8
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x5624797489ef - core::fmt::rt::Argument::fmt::hb4f4a02b9bd9dd49
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/fmt/rt.rs:138:9
   5:     0x5624797489ef - core::fmt::write::h6d54cd7c9e155ec5
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/fmt/mod.rs:1094:21
   6:     0x56247971d051 - std::io::Write::write_fmt::h6a453a71c692f63b
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/io/mod.rs:1713:15
   7:     0x5624797202f5 - std::sys_common::backtrace::_print::h1cbaa8b42678f928
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x5624797202f5 - std::sys_common::backtrace::print::h4ddf81241a51b337
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x562479721797 - std::panicking::default_hook::{{closure}}::hff91f1f484ade5cd
  10:     0x562479721584 - std::panicking::default_hook::h21f14afd59f7aef9
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:288:9
  11:     0x562479721c4c - std::panicking::rust_panic_with_hook::h45f66047b14c555c
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:705:13
  12:     0x562479721b47 - std::panicking::begin_panic_handler::{{closure}}::h49d1a88ef0908eb4
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:597:13
  13:     0x562479720916 - std::sys_common::backtrace::__rust_end_short_backtrace::hccebf9e57f8cc425
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:151:18
  14:     0x562479721892 - rust_begin_unwind
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:593:5
  15:     0x562477cc3f53 - core::panicking::panic_fmt::h54ec9d0e3180a83d
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/panicking.rs:67:14
  16:     0x562478741bb3 - hyperlane_base::contract_sync::eta_calculator::SyncerEtaCalculator::calculate::h5a03739f5d94a9bc
  17:     0x5624784cebf4 - <hyperlane_base::contract_sync::cursor::RateLimitedContractSyncCursor<T> as hyperlane_core::traits::cursor::ContractSyncCursor<T>>::next_action::{{closure}}::h3203622cf2b24249
  18:     0x562477e4b437 - <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::h4ec261072df4bfd7
  19:     0x56247837b6bc - relayer::relayer::Relayer::run_interchain_gas_payment_sync::{{closure}}::{{closure}}::h7b3cb0bed130d02e
  20:     0x562478335f79 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h7d243cdb6165239f
  21:     0x5624784ee7e7 - tokio::runtime::task::core::Core<T,S>::poll::hf9310fdffcc01b3b
  22:     0x5624785d9b7e - tokio::runtime::task::harness::Harness<T,S>::poll::hd510b0a04a7dbf49
  23:     0x562478564ac1 - tokio::runtime::scheduler::current_thread::Context::run_task::h91cceeeeff75fb03
  24:     0x5624782ef061 - tokio::runtime::context::scoped::Scoped<T>::set::h85416ddaa56dcb3e
  25:     0x56247823d9e0 - tokio::runtime::context::set_scheduler::hb2004ab3a67ad87e
  26:     0x562478564c9c - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::ha9882bf6dc51f595
  27:     0x56247847335c - tokio::runtime::context::runtime::enter_runtime::h671fce596e58b502
  28:     0x5624784ef6e0 - tokio::runtime::runtime::Runtime::block_on::h1441e56d5fb9c552
  29:     0x56247847e619 - relayer::main::ha1155d336e6745af
  30:     0x562478220273 - std::sys_common::backtrace::__rust_begin_short_backtrace::hb2611260fa5dc33f
  31:     0x562478220cfd - std::rt::lang_start::{{closure}}::h86a64400b5f2ee02
  32:     0x5624797176c5 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h802b0fdd426a12ca
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:284:13
  33:     0x5624797176c5 - std::panicking::try::do_call::h2a2f25050efa0cf8
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500:40
  34:     0x5624797176c5 - std::panicking::try::h9ca7f841c0f0e3dd
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19
  35:     0x5624797176c5 - std::panic::catch_unwind::h92d42a62587f8121
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14
  36:     0x5624797176c5 - std::rt::lang_start_internal::{{closure}}::hf1926c7a173d562c
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:48
  37:     0x5624797176c5 - std::panicking::try::do_call::haac70d88f0cce898
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500:40
  38:     0x5624797176c5 - std::panicking::try::h5c20719e1031e74b
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19
  39:     0x5624797176c5 - std::panic::catch_unwind::h9b15b36860d5fe4a
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14
  40:     0x5624797176c5 - std::rt::lang_start_internal::hf502095b101390bb
                               at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:20
  41:     0x56247847e6f5 - main
  42:     0x7fe748a74083 - __libc_start_main
  43:     0x562477ce826a - _start
  44:                0x0 - <unknown>
Error: task 583 panicked ##$$## Stack backtrace:(0) 0x5624786d4702 - hyperlane_base::oneline_eyre::handler::HookBuilder::install::{{closure}}(1) 0x5624797015c2 - eyre::capture_handler(2) 0x56247871944a - eyre::error::<impl core::convert::From<E> for eyre::Report>::from(3) 0x56247874098d - hyperlane_base::agent::run_all::{{closure}}(4) 0x5624787337b2 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut(5) 0x5624787de287 - tokio::runtime::task::core::Core<T,S>::poll(6) 0x5624787ef531 - tokio::runtime::task::harness::Harness<T,S>::poll(7) 0x562478564ac1 - tokio::runtime::scheduler::current_thread::Context::run_task(8) 0x5624782ef061 - tokio::runtime::context::scoped::Scoped<T>::set(9) 0x56247823d9e0 - tokio::runtime::context::set_scheduler(10) 0x562478564c9c - tokio::runtime::scheduler::current_thread::CoreGuard::block_on(11) 0x56247847335c - tokio::runtime::context::runtime::enter_runtime(12) 0x5624784ef6e0 - tokio::runtime::runtime::Runtime::block_on(13) 0x56247847e619 - relayer::main(14) 0x562478220273 - std::sys_common::backtrace::__rust_begin_short_backtrace(15) 0x562478220cfd - std::rt::lang_start::{{closure}}(16) 0x5624797176c5 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once @ /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:284:13 ## (16) 0x5624797176c5 - std::panicking::try::do_call @ /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500:40 ## (16) 0x5624797176c5 - std::panicking::try @ /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19 ## (16) 0x5624797176c5 - std::panic::catch_unwind @ /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14 ## (16) 0x5624797176c5 - std::rt::lang_start_internal::{{closure}} @ /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:48 ## (16) 0x5624797176c5 - std::panicking::try::do_call @ /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:500:40 ## (16) 0x5624797176c5 - std::panicking::try @ /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panicking.rs:464:19 ## (16) 0x5624797176c5 - std::panic::catch_unwind @ /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/panic.rs:142:14 ## (16) 0x5624797176c5 - std::rt::lang_start_internal @ /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:148:20 ## (17) 0x56247847e6f5 - main(18) 0x7fe748a74083 - __libc_start_main(19) 0x562477ce826a - _starttask 583 panicked
@tkporter tkporter added bug Something isn't working agent labels Sep 12, 2023
@tkporter tkporter moved this to Sprint in Hyperlane Tasks Sep 12, 2023
@tkporter
Copy link
Collaborator Author

Seems it's probably to do with https://github.com/hyperlane-xyz/hyperlane-monorepo/blob/main/rust/hyperlane-base/src/contract_sync/eta_calculator.rs#L59-L64, but unsure why we hit this case

Wondering why we've never ran into this before, wonder if the recent contract sync changes could have had some implications?

daniel-savu added a commit that referenced this issue Sep 13, 2023
Removes some unnecessary logs from the sealevel igp indexing fn. Also
instruments the eta calculator in response to
#2723

### Description

<!--
What's included in this PR?
-->

### Drive-by changes

<!--
Are there any minor or drive-by changes also included?
-->

### Related issues

<!--
- Fixes #[issue number here]
-->

### Backward compatibility

<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?

Yes/No
-->

### Testing

<!--
What kind of testing have these changes undergone?

None/Manual/Unit Tests
-->
@avious00
Copy link
Contributor

we have a quick fix for the log symptom of the problem but need to figure out root cause

@avious00 avious00 assigned tkporter and unassigned tkporter Sep 13, 2023
@daniel-savu daniel-savu moved this from Sprint to In Progress in Hyperlane Tasks Sep 14, 2023
@daniel-savu
Copy link
Contributor

Still can't see logs about this, @tkporter you said it was occurring fairly consistently so I'm thinking debug! logs may not be displayed? Given the available filters, it should probably be a warn so it stands out. Opened this and will redeploy: #2732

Image

@avious00 avious00 moved this from In Progress to In Review in Hyperlane Tasks Sep 15, 2023
daniel-savu added a commit that referenced this issue Sep 19, 2023
## Bug 1

Closes #2723

The relayer panic is caused by an overflow, bc of dividing by
~`6.540888459481895e-211`. On my local, the effective rate of indexing
starts at `0.61`.
```
{"timestamp":"2023-09-15T09:57:10.746276Z","level":"INFO","fields":{"message":"~~~ blocks_processed: 2508, tip_progression: 2042, elapsed: 757.10340475, old_rate: Some(0.6155037701275111), effective_rate: 0.6155037701275111"},"target":"hyperlane_base::contract_sync::eta_calculator","span":{"domain":"solanadevnet","label":"gas_payments","name":"ContractSync"},"spans":[{"domain":"solanadevnet","label":"gas_payments","name":"ContractSync"}]}
```

But then both the `blocks_processed` and the `tip_progression` are
consistently zero, which makes the `new_rate` be zero
(https://github.com/hyperlane-xyz/hyperlane-monorepo/blob/eea423ad049acfd15855465792147fb99bc8dd4d/rust/hyperlane-base/src/contract_sync/eta_calculator.rs#L41),
and over time it takes over the entire moving average window to make it
almost zero, leading to an overflow. 15 mins after that initial log, the
effective rate already became `0.00038`.

The reason for blocks_processed and tip_progression consistently being
zero after the first log is that `eta_calculator.calculate(from, tip)`
is always called with the same from and tip although it expects to get
the latest values.

### The fix

the tip wasn't being set after the sequence_and_tip query here:
https://github.com/hyperlane-xyz/hyperlane-monorepo/blob/eea423ad049acfd15855465792147fb99bc8dd4d/rust/hyperlane-base/src/contract_sync/cursor.rs#L565

And then the to and from are calculated based on it:
https://github.com/hyperlane-xyz/hyperlane-monorepo/blob/eea423ad049acfd15855465792147fb99bc8dd4d/rust/hyperlane-base/src/contract_sync/cursor.rs#L550

So even though the sync_state internal variables were kept up-to-date,
the min(...) would cause the issue.

The first PR commit fixes this.

## Bug 2

There was another bug in the eta calculator, caused by it only using
`next_block` to approximate synced state, which doesn't apply to
sequence indexing. The way the eta calculator is called had to be
changed to use abstract measures of sync progression (which could be
blocks or sequences).

The second PR commit fixes this, afaict.
@github-project-automation github-project-automation bot moved this from In Review to Done in Hyperlane Tasks Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent bug Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants