
Thread 'tokio-runtime-worker' panicked at 'Header of imported block must exist; qed', /mnt/Data2/src/github.com/subspace/subspace/domains/client/domain-executor/src/domain_worker.rs:113 #1143

Closed
liuchengxu opened this issue Feb 3, 2023 · 8 comments · Fixed by #1217
Labels
bug Something isn't working execution Subspace execution
Milestone

Comments

@liuchengxu (Contributor)

====================

Version: 0.1.0-4b2066ae8a1

   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/alloc/src/boxed.rs:2002:9
      std::panicking::rust_panic_with_hook
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/std/src/panicking.rs:692:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/std/src/panicking.rs:579:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/std/src/sys_common/backtrace.rs:137:18
   4: rust_begin_unwind
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/std/src/panicking.rs:575:5
   5: core::panicking::panic_fmt
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/core/src/panicking.rs:64:14
   6: core::panicking::panic_display
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/core/src/panicking.rs:147:5
   7: core::panicking::panic_str
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/core/src/panicking.rs:131:5
   8: core::option::expect_failed
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/core/src/option.rs:1924:5
   9: futures_util::future::future::FutureExt::poll_unpin
  10: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
  11: domain_client_executor::core_domain_worker::start_worker::{{closure}}
  12: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
  13: <sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll
  14: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
  15: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  16: tokio::runtime::context::BlockingRegionGuard::block_on
  17: tokio::runtime::handle::Handle::block_on
  18: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  19: tokio::runtime::task::core::Core<T,S>::poll
  20: tokio::runtime::task::harness::Harness<T,S>::poll
  21: tokio::runtime::blocking::pool::Inner::run
  22: std::sys_common::backtrace::__rust_begin_short_backtrace
  23: core::ops::function::FnOnce::call_once{{vtable.shim}}
  24: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/alloc/src/boxed.rs:1988:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/alloc/src/boxed.rs:1988:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/afaf3e07aaa7ca9873bdb439caec53faffa4230c/library/std/src/sys/unix/thread.rs:108:17
  25: <unknown>
  26: <unknown>


Thread 'tokio-runtime-worker' panicked at 'Header of imported block must exist; qed', /mnt/Data2/src/github.com/subspace/subspace/domains/client/domain-executor/src/domain_worker.rs:113

https://github.com/subspace/subspace/blob/074e0aecc2fd72f91a90668370001db6fce298a4/domains/client/domain-executor/src/domain_worker.rs#L108-L114

Looks like the `expect` is not safe even on an `Option`. I have no idea what happened exactly, but I think we can simply avoid it by passing the `block_hash` in the original primary block notification. cc @nazar-pc

https://github.com/subspace/subspace/blob/074e0aecc2fd72f91a90668370001db6fce298a4/crates/sc-consensus-subspace/src/lib.rs#L1175-L1180
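The failure mode and the proposed fix can be sketched as follows. This is a simplified toy, not the actual subspace API: `MockClient`, `racy_lookup`, and `BlockImportNotification` are invented names for illustration. The point is that a header lookup keyed only by hash can return `None` if the notification fires before the block is persisted, while a notification that carries the header (or hash) itself needs no lookup.

```rust
// Hypothetical sketch of the race; names are invented, not the subspace API.
use std::collections::HashMap;

type Hash = u64;

#[derive(Clone, Debug, PartialEq)]
struct Header {
    number: u32,
    hash: Hash,
}

#[derive(Default)]
struct MockClient {
    headers: HashMap<Hash, Header>,
}

impl MockClient {
    // Returns None when the block is not (yet) in the database.
    fn header(&self, hash: Hash) -> Option<Header> {
        self.headers.get(&hash).cloned()
    }
}

// Racy pattern: panics if the notification arrives before import completes.
fn racy_lookup(client: &MockClient, hash: Hash) -> Header {
    client
        .header(hash)
        .expect("Header of imported block must exist; qed")
}

// Safer pattern: the notification carries the data, so no lookup against
// possibly-not-yet-persisted state is needed.
struct BlockImportNotification {
    header: Header,
}

fn main() {
    // Block 42 has not been imported yet.
    let client = MockClient::default();

    // Silence the default panic message for the demonstration.
    std::panic::set_hook(Box::new(|_| {}));
    let result = std::panic::catch_unwind(|| racy_lookup(&client, 42));
    assert!(result.is_err()); // the `expect` panics

    // Carrying the header in the notification avoids the lookup entirely.
    let note = BlockImportNotification {
        header: Header { number: 1, hash: 42 },
    };
    assert_eq!(note.header.hash, 42);
}
```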

@liuchengxu liuchengxu added bug Something isn't working execution Subspace execution labels Feb 3, 2023
@liuchengxu liuchengxu added this to the Gemini 3 milestone Feb 3, 2023
@nazar-pc (Member) commented Feb 3, 2023

The reason is likely that the block is not imported YET. That notification specifically is fired right before block import finishes, so the block is not in the history yet. IIRC there is another subscription on the client that is fired by Substrate AFTER the block is fully imported; you should use that one instead.

@liuchengxu (Contributor, Author)

Well, we could use the subscription from Substrate. If so, I wonder whether the block import notification in sc-consensus-subspace is necessary at all, given that the one from Substrate is sufficient. From what I can see, the current consumers of our own subscription are the archiver (which starts archiving at K-depth) and the executor (which processes each imported primary block); both could use the subscription from Substrate instead.

BTW, if we choose to switch, the primary block import throttling may also need to be reworked; currently, the ack sender is part of the notification from sc-consensus-subspace.

https://github.com/subspace/subspace/blob/074e0aecc2fd72f91a90668370001db6fce298a4/domains/client/domain-executor/src/domain_worker.rs#L94-L96
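The ack-based throttling mentioned above can be sketched with plain channels. This is a minimal toy under assumed names (`Notification`, `block_number`), not the actual subspace implementation: each notification carries its own ack sender, and the producer blocks on the ack before importing the next block, so import speed is bounded by the consumer.

```rust
// Toy sketch of acknowledgement-based import throttling; names are invented.
use std::sync::mpsc;
use std::thread;

struct Notification {
    block_number: u32,
    // One-shot ack channel: the consumer signals when processing is done.
    ack: mpsc::Sender<()>,
}

fn main() {
    let (tx, rx) = mpsc::channel::<Notification>();

    // Consumer (e.g. the executor): processes each block, then acks.
    let consumer = thread::spawn(move || {
        let mut processed = Vec::new();
        while let Ok(note) = rx.recv() {
            processed.push(note.block_number);
            // Ack so the producer may proceed with the next import.
            let _ = note.ack.send(());
        }
        processed
    });

    // Producer (e.g. block import): pauses until each block is acked.
    for n in 0..3 {
        let (ack_tx, ack_rx) = mpsc::channel();
        tx.send(Notification { block_number: n, ack: ack_tx }).unwrap();
        // Block import is effectively throttled here.
        ack_rx.recv().unwrap();
    }
    drop(tx); // close the channel so the consumer loop ends

    assert_eq!(consumer.join().unwrap(), vec![0, 1, 2]);
}
```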

@nazar-pc (Member) commented Feb 4, 2023

Well, that subscription was created specifically for archiving because it has to run before block import finishes; that way we can guarantee that root block information will be included in the very next block.

The fact that this subscription is also used for other things is an accident and, as it turns out, even a bug. We should probably stop exposing it unless we need to pause block importing for some other reason.

@liuchengxu (Contributor, Author)

I had another look and found we can't directly use the subscription from Substrate because `fork_choice` is not available there; plus, we need to pause the primary block import when needed anyway. Adding `block_hash` to our subscription is the easiest fix: it does not make things worse and it avoids the specific panic in this issue.

I agree we should consider using the subscription that fires after block import is completed, as calling `header` seems risky as well even though we haven't seen it fail. That needs more thought, though.

@nazar-pc (Member) commented Feb 4, 2023

BTW, why do you even need `fork_choice` there? Wouldn't it be sufficient to simply build all the branches that the primary chain has, and then finalize/mark as the longest chain the same blocks that the primary chain finalizes, after the fact?

@liuchengxu (Contributor, Author)

`fork_choice` is used because in theory the domain should follow the same fork choice rule as the consensus chain, hence it's reasonable to apply the consensus block's fork choice to the domain block. Using the longest chain should work as long as we follow all the branches, but it somewhat lacks theoretical elegance :P.

Since you mentioned building the branches from the consensus chain, I realized that using the Substrate subscription may have another benefit: we likely don't have to calculate the tree route on our own, as it's already part of the Substrate block import notification. I'll take a closer look; now I feel switching to the Substrate subscription is potentially better in practice.

@liuchengxu (Contributor, Author)

paritytech/substrate#13315 is needed to use the notification fired after the block import pipeline finishes; will create a PR upstream soon.

> I realized that using the substrate subscription may have another benefit: we likely don't have to calculate the tree route on our own as it's already part of the substrate block import notification.

From a local experiment, I found this doesn't work: the tree route in the notification is calculated from the new best to the old best's parent. The problem is that after a reorg we still need some work to find the appropriate domain parent to build the next domain block, so it doesn't simplify the code.
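The tree-route shape under discussion can be illustrated with a toy chain. Everything here is invented for illustration (it is not Substrate's `TreeRoute` type): the route between the old best and the new best passes through their common ancestor, splitting into "retracted" blocks (dropped from the canonical chain) and "enacted" blocks (newly canonical). Finding the right domain parent after a reorg still requires extra work on top of this, which is the point made above.

```rust
// Toy sketch of a tree route across a reorg; structures are invented.
use std::collections::HashMap;

type Hash = u32;

// Walk parent links from a block back to genesis.
fn ancestors(parents: &HashMap<Hash, Hash>, mut h: Hash) -> Vec<Hash> {
    let mut path = vec![h];
    while let Some(&p) = parents.get(&h) {
        path.push(p);
        h = p;
    }
    path
}

/// Returns (retracted, common_ancestor, enacted) for a route
/// from `old_best` to `new_best`.
fn tree_route(
    parents: &HashMap<Hash, Hash>,
    old_best: Hash,
    new_best: Hash,
) -> (Vec<Hash>, Hash, Vec<Hash>) {
    let old_path = ancestors(parents, old_best);
    let new_path = ancestors(parents, new_best);
    // First ancestor of old_best that also lies on new_best's path.
    let common = *old_path.iter().find(|h| new_path.contains(h)).unwrap();
    let retracted: Vec<Hash> =
        old_path.iter().copied().take_while(|&h| h != common).collect();
    let mut enacted: Vec<Hash> =
        new_path.iter().copied().take_while(|&h| h != common).collect();
    enacted.reverse(); // oldest-first, as they would be applied
    (retracted, common, enacted)
}

fn main() {
    // Chain: 0 -> 1 -> 2 (old best), with a fork 1 -> 3 -> 4 (new best).
    let parents: HashMap<Hash, Hash> =
        [(1, 0), (2, 1), (3, 1), (4, 3)].into_iter().collect();

    let (retracted, common, enacted) = tree_route(&parents, 2, 4);
    assert_eq!(retracted, vec![2]);
    assert_eq!(common, 1);
    assert_eq!(enacted, vec![3, 4]);
}
```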

In short, we'll just replace the notification stream in the domain executor once paritytech/substrate#13315 is resolved.

@nazar-pc (Member)

It will make things easier, with the trade-off of executing forks that may never become canonical, wasting precious computational resources.
