This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

--light sync panics with thread 'tokio-runtime-worker' has overflowed its stack #5998

Closed
amaury1093 opened this issue May 12, 2020 · 23 comments
Labels
  • A5-stale: Pull request did not receive any updates in a long time. No review needed at this stage. Close it.
  • I3-bug: The node fails to follow expected behavior.

Comments

@amaury1093

amaury1093 commented May 12, 2020

On a recent kusama, I'm running:

kusama git:(master) ./target/release/polkadot --light

At some point, the sync panics with

thread 'tokio-runtime-worker' has overflowed its stack
fatal runtime error: stack overflow
[1]    23325 abort      ./target/release/polkadot --light

Gist: https://gist.github.com/amaurymartiny/b5b85f88b3bf2e590409ac371acba288

The "at some point" varies. I tried running twice:

  • first time, it panics at 1370562 (see gist)
  • second time, 1375509

I'm not sure if it's related to #4527. If it is, we can close this issue and continue there.

Specs:

  • macos 10.15.4
  • kusama 0.7.33-e64635a3-x86_64-macos
amaury1093 added the I3-bug label May 12, 2020
@bkchr
Member

bkchr commented May 12, 2020

Can you try to run with gdb (or similar) to get a proper stack trace? I tried it on my PC and could not yet reproduce it.

@arkpar
Member

arkpar commented May 12, 2020

macOS has a default stack size of 512k for non-main threads, so this should be reproducible on Linux with something like ulimit -s 500.
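
For experimenting with the stack-size angle from inside a program rather than through ulimit, here is a minimal Rust sketch. It is not taken from Substrate: `depth_recursive`, `run_with_stack`, and the stack sizes used with them are illustrative assumptions. It relies only on `std::thread::Builder::stack_size`, which sets the stack size of a spawned thread.

```rust
use std::thread;

/// Deliberately recursive function: uses one stack frame per step.
/// (Hypothetical stand-in for any recursion whose depth grows with the input.)
fn depth_recursive(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + depth_recursive(n - 1)
    }
}

/// Runs `depth_recursive(n)` on a freshly spawned thread whose stack is
/// `stack_bytes` large, and returns the result.
pub fn run_with_stack(stack_bytes: usize, n: u64) -> u64 {
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(move || depth_recursive(n))
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}
```

Calling `run_with_stack(64 * 1024 * 1024, 50_000)` gives the recursion plenty of room. Shrinking the stack while keeping the depth is one way to provoke the overflow, but a stack overflow aborts the whole process and cannot be caught, so only the success path is shown here.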

@bkchr
Member

bkchr commented May 12, 2020

Yep, getting it ;)

@bkchr
Member

bkchr commented May 12, 2020

(it is related to decoding the ForkTree which is a recursive data structure)

@expenses
Contributor

expenses commented Oct 5, 2020

I'm getting this on a full node when trying to sync for #7225. The command I'm running is simply target/release/substrate --execution native --wasm-execution compiled.

@expenses
Contributor

expenses commented Oct 5, 2020

So if the recursive nature of the ForkTree is the problem here, what's the solution? To rewrite it so that it's a flat array?

@bkchr
Member

bkchr commented Oct 5, 2020

Maybe there is a bug and we should already have pruned some of the state. Not sure. @andresilva probably knows more.

@andresilva
Contributor

andresilva commented Oct 5, 2020

We prune BABE's epoch tree as we finalize new blocks. In this case we are not catching up to the latest finalized block, so as we sync BABE blocks the epoch tree gets deeper and deeper. Since all of the operations on the fork tree are currently implemented using recursion, we eventually reach the limits of the call stack. The solution is to rewrite the operations on the fork tree to not use recursion and instead use some auxiliary data structure as a stack (my brain understands recursion better, so that's why I wrote it like that the first time around).
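
A minimal sketch of the "auxiliary data structure as stack" idea, assuming a simplified node type. The `Node` struct and `find_node_where` below are hypothetical stand-ins and do not match the real fork-tree crate's fields or signatures. Pending nodes live in a heap-allocated `Vec`, so tree depth no longer translates into call-stack depth.

```rust
/// Simplified stand-in for a fork-tree node (hypothetical; the real
/// `fork_tree::Node` carries more information).
pub struct Node<H, V> {
    pub hash: H,
    pub data: V,
    pub children: Vec<Node<H, V>>,
}

/// Depth-first search over a forest without recursion.
/// Returns the first node, in pre-order, for which `predicate` is true.
pub fn find_node_where<'a, H, V, P>(
    roots: &'a [Node<H, V>],
    mut predicate: P,
) -> Option<&'a Node<H, V>>
where
    P: FnMut(&Node<H, V>) -> bool,
{
    // Explicit stack of nodes still to visit. Roots are pushed in reverse
    // so that the first root is popped first.
    let mut stack: Vec<&'a Node<H, V>> = roots.iter().rev().collect();

    while let Some(node) = stack.pop() {
        if predicate(node) {
            return Some(node);
        }
        // Push children in reverse so the leftmost child is visited first.
        stack.extend(node.children.iter().rev());
    }
    None
}
```

One caveat with this layout: the compiler-generated drop for a node that owns its children in a nested `Vec` is itself recursive, so a very deep tree may need an iterative teardown as well.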

@expenses
Contributor

expenses commented Oct 5, 2020

It should also be possible to write Node::decode using a loop without recursion.
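
To illustrate what a loop-based decode could look like, here is a sketch under loud assumptions: the flat pre-order `(value, child_count)` record format, the `Node` type, and the `encode`/`decode` functions are all invented for this example and are not the SCALE encoding or the actual `Node::decode`. Partially built nodes wait on an explicit stack until all of their children have arrived.

```rust
#[derive(Debug, PartialEq)]
pub struct Node {
    pub value: u32,
    pub children: Vec<Node>,
}

/// Flattens a tree into pre-order `(value, child_count)` records,
/// using an explicit stack instead of recursion.
pub fn encode(root: &Node) -> Vec<(u32, usize)> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        out.push((node.value, node.children.len()));
        stack.extend(node.children.iter().rev());
    }
    out
}

/// Rebuilds the tree from pre-order records without recursion.
/// Returns `None` for empty, truncated, or over-long input.
pub fn decode(records: &[(u32, usize)]) -> Option<Node> {
    // Each entry: a partially built node and how many children it still lacks.
    let mut pending: Vec<(Node, usize)> = Vec::new();

    for (i, &(value, count)) in records.iter().enumerate() {
        let mut node = Node { value, children: Vec::new() };
        let mut missing = count;
        // Attach completed nodes upward until one still needs children.
        loop {
            if missing > 0 {
                pending.push((node, missing));
                break;
            }
            match pending.pop() {
                Some((mut parent, parent_missing)) => {
                    parent.children.push(node);
                    node = parent;
                    missing = parent_missing - 1;
                }
                // The root is complete: valid only if no records remain.
                None => return if i + 1 == records.len() { Some(node) } else { None },
            }
        }
    }
    None
}
```

A round trip such as `decode(&encode(&tree))` should return the original tree, and the stack depth stays constant regardless of how deep the tree is.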

@expenses
Contributor

expenses commented Oct 6, 2020

> It should also be possible to write Node::decode using a loop without recursion.

I just tried this and it had no effect, so this doesn't seem to be a problem with decoding. Here's the gdb backtrace:

Thread 68 "tokio-runtime-w" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffa30d4640 (LWP 28733)]
0x0000555557056a6c in <sc_client_db::BlockchainDb<Block> as sp_blockchain::header_metadata::HeaderMetadata<Block>>::header_metadata ()
(gdb) backtrace
#0  0x0000555557056a6c in <sc_client_db::BlockchainDb<Block> as sp_blockchain::header_metadata::HeaderMetadata<Block>>::header_metadata ()
#1  0x0000555556c996c2 in sp_blockchain::header_metadata::lowest_common_ancestor ()
#2  0x0000555556979bc1 in sc_client_api::utils::is_descendent_of::{{closure}} ()
#3  0x0000555556b47431 in fork_tree::node_implementation::Node<H,N,V>::import ()
#4  0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#5  0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#6  0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#7  0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#8  0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#9  0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#10 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#11 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#12 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#13 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#14 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#15 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#16 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#17 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#18 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#19 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#20 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#21 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#22 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#23 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#24 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#25 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#26 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#27 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#28 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#29 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#30 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#31 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#32 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#33 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#34 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#35 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
...
#2436 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#2437 0x0000555556b4730c in fork_tree::node_implementation::Node<H,N,V>::import ()
#2438 0x00005555570db022 in fork_tree::ForkTree<H,N,V>::import ()
#2439 0x00005555569c2e93 in sc_consensus_epochs::EpochChanges<Hash,Number,E>::import ()
#2440 0x0000555556c8e31f in <sc_consensus_babe::BabeBlockImport<Block,Client,Inner> as sp_consensus::block_import::BlockImport<Block>>::import_block ()
#2441 0x0000555556c85371 in sp_consensus::import_queue::import_single_block_metered ()
#2442 0x0000555556a29c9c in <futures_util::future::poll_fn::PollFn<F> as core::future::future::Future>::poll ()
#2443 0x0000555556be40ba in <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll ()
#2444 0x0000555556df14c9 in <futures_util::future::future::flatten::Flatten<Fut,<Fut as core::future::future::Future>::Output> as core::future::future::Future>::poll ()
#2445 0x0000555556a2e94a in <futures_util::future::poll_fn::PollFn<F> as core::future::future::Future>::poll ()
#2446 0x000055555750e070 in <sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll ()
#2447 0x000055555750645c in <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll ()
#2448 0x0000555557508df9 in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll ()
#2449 0x0000555556c23f7c in std::thread::local::LocalKey<T>::with ()
#2450 0x0000555556ac4191 in futures_executor::local_pool::block_on ()
#2451 0x00005555570ccb73 in tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut ()
#2452 0x0000555556f8cd7d in tokio::runtime::task::core::Core<T,S>::poll ()
#2453 0x0000555556fb7449 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#2454 0x0000555556b21098 in tokio::runtime::task::harness::Harness<T,S>::poll ()
#2455 0x0000555558097d35 in tokio::runtime::blocking::pool::Inner::run ()
#2456 0x000055555809d1f9 in tokio::runtime::context::enter ()
#2457 0x00005555580a0820 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#2458 0x00005555580a80a0 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#2459 0x00005555585bd16a in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/397b390cc76ba1d98f80b2a24a371f708dcc9169/library/alloc/src/boxed.rs:1042
#2460 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/397b390cc76ba1d98f80b2a24a371f708dcc9169/library/alloc/src/boxed.rs:1042
#2461 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:87
#2462 0x00007ffff39293e9 in start_thread () from /usr/lib/libpthread.so.0
#2463 0x00007ffff383d293 in clone () from /usr/lib/libc.so.6

@bkchr
Member

bkchr commented Oct 6, 2020

@expenses you have rewritten the decode as an iterative function? Perfect! Decode and import suffer from the same problem, so we need a solution for both of them ;)

@expenses
Contributor

expenses commented Oct 6, 2020

Unfortunately, import is a lot harder to iterative-ize 🤔

@expenses
Contributor

expenses commented Oct 7, 2020

If we're going to re-write the fork tree, I think it makes sense to use a library for the tree structure. I'm not really sure which ones are commonly used, but indextree seemed decent. I don't think we need to do this.

@expenses
Contributor

expenses commented Oct 8, 2020

I have written an iterative version of import: https://github.com/paritytech/substrate/compare/82867294110481d421f77bddb04aef10c0674e09

The node still panics though due to find_node_index_where being recursive.
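
For readers who want the general shape of an iterative import, here is a simplified sketch. It is not the code in the linked branch and does not reproduce the real fork-tree API: the `Node` and `ForkTree` types, the `bool` return value, the infallible `is_descendent_of` callback, and the toy `is_strict_prefix` ancestry relation are all assumptions made for illustration. The loop walks down one level at a time, following the child that is an ancestor of the new block, and appends the block where no such child exists.

```rust
pub struct Node<H, N> {
    pub hash: H,
    pub number: N,
    pub children: Vec<Node<H, N>>,
}

pub struct ForkTree<H, N> {
    pub roots: Vec<Node<H, N>>,
}

impl<H: PartialEq, N: PartialOrd> ForkTree<H, N> {
    /// Inserts a block below its deepest known ancestor, without recursion.
    /// Returns `false` if the hash is already present on the path walked.
    pub fn import<F>(&mut self, hash: H, number: N, is_descendent_of: F) -> bool
    where
        F: Fn(&H, &H) -> bool,
    {
        // `level` is the list of siblings currently being inspected.
        let mut level = &mut self.roots;
        loop {
            if level.iter().any(|n| n.hash == hash) {
                return false;
            }
            // Find a node on this level that is an ancestor of the new block.
            let next = level
                .iter()
                .position(|n| n.number < number && is_descendent_of(&n.hash, &hash));
            match next {
                // Descend one level; no call-stack frame is added.
                Some(i) => {
                    let parent = level;
                    level = &mut parent[i].children;
                }
                // No ancestor here: the new block becomes a sibling.
                None => {
                    level.push(Node { hash, number, children: Vec::new() });
                    return true;
                }
            }
        }
    }
}

/// Toy ancestry relation for experiments: `block` descends from `base`
/// when `base` is a strict prefix of `block`. (Hypothetical helper.)
pub fn is_strict_prefix(base: &&'static str, block: &&'static str) -> bool {
    block.starts_with(*base) && block != base
}
```

With the toy relation, importing "a", "ab", "ac" and "abd" in that order yields a single root "a" with children "ab" and "ac", and "abd" under "ab". In a real node the ancestry check would go through the blockchain backend, as the `is_descendent_of` frames in the backtrace above suggest.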

@andresilva
Contributor

Thanks! Yeah, I think we can keep the fork tree since it's small enough and specialized for our two use cases (BABE and GRANDPA), although I appreciate that you looked into alternatives. A PR for removing recursion from it (the work that you already started here) would be greatly appreciated :)

@notlesh
Contributor

notlesh commented Feb 26, 2021

I've seen this a few times on Moonbeam, and today I ran into it in such a way that I can reproduce it by running a node against the original data dir, running against our alphanet. The node will then consistently crash within seconds. If anyone wants a copy of this data, let me know.

I'm not running with --light, so let me know if there is a more relevant open issue I should redirect this to.

My stack looks something like this:

#0  0x0000555556fd1f95 in <sc_client_db::BlockchainDb<Block> as sp_blockchain::header_metadata::HeaderMetadata<Block>>::header_metadata ()
#1  0x000055555697b965 in sp_blockchain::header_metadata::lowest_common_ancestor ()
#2  0x000055555653ca63 in fork_tree::node_implementation::Node<H,N,V>::import ()

<repeated 1500 times>

#1541 0x000055555653c748 in fork_tree::node_implementation::Node<H,N,V>::import ()
#1542 0x0000555557228145 in sc_finality_grandpa::authorities::AuthoritySet<H,N>::add_pending_change ()
#1543 0x00005555570a72f6 in <sc_consensus_babe::BabeBlockImport<Block,Client,Inner> as sp_consensus::block_import::BlockImport<Block>>::import_block ()
#1544 0x000055555723c9ff in _$LT$alloc..boxed..Box$LT$dyn$u20$sp_consensus..block_import..BlockImport$LT$B$GT$$u2b$Transaction$u20$$u3d$$u20$Transaction$u2b$Error$u20$$u3d$$u20$sp_consensus..error..Error$u2b$core..marker..Sync$u2b$core..marker..Send$GT$$u20$as$u20$sp_consensus..block_import..BlockImport$LT$B$GT$$GT$::import_block::hf7ca69bcaae0e7c3 ()
#1545 0x000055555697637f in sp_consensus::import_queue::import_single_block_metered ()
#1546 0x0000555556e16331 in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll ()
#1547 0x0000555556eaa1b7 in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll ()
#1548 0x0000555557e53ee1 in <sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll ()
#1549 0x0000555557e67ecb in <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll ()
#1550 0x0000555557e565a6 in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll ()
#1551 0x0000555557e66376 in <tracing_futures::Instrumented<T> as core::future::future::Future>::poll ()
#1552 0x0000555555bfd14c in futures_executor::local_pool::block_on ()
#1553 0x00005555561c2629 in tokio::runtime::task::core::Core<T,S>::poll ()
#1554 0x0000555555bb6f13 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#1555 0x0000555555fd35ad in tokio::runtime::task::harness::Harness<T,S>::poll ()
#1556 0x0000555558733741 in tokio::runtime::blocking::pool::Inner::run ()
#1557 0x0000555558741730 in tokio::runtime::context::enter ()
#1558 0x0000555558746fd6 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#1559 0x000055555874069b in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#1560 0x0000555558fde96a in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/8876ffc9235dade728e1fbc4be4c85415fdd0bcd/library/alloc/src/boxed.rs:1042
#1561 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/8876ffc9235dade728e1fbc4be4c85415fdd0bcd/library/alloc/src/boxed.rs:1042
#1562 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:87
#1563 0x00007ffff7a32590 in start_thread (arg=0x7ffff452d640) at pthread_create.c:463
#1564 0x00007ffff7cc0223 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@andresilva
Contributor

@notlesh I assume that your chain is not finalizing blocks, that is the root cause of the issue you're seeing.

@notlesh
Contributor

notlesh commented Mar 2, 2021

> @notlesh I assume that your chain is not finalizing blocks, that is the root cause of the issue you're seeing.

This is running on Moonbeam's alphanet, which is using its own relay chain and running grandpa for finality.

By the way, I saw the same thing again over the weekend (where I needed to purge my chaindata and resync).

@JoshOrndorff
Contributor

So a little more info about when we hit this in the moonbeam context. Moonbeam is a parachain that is built with cumulus. As such, it runs a polkadot full service to follow relay chain state.

It is that polkadot service within the moonbeam process that is panicking in @notlesh's trace above (correct me if I'm wrong, but that's where grandpa is running).

@andresilva
Contributor

andresilva commented Mar 2, 2021

Can you confirm that it is following finality from the relay chain? Please just post some of the node output here before the crash, I assume it isn't finalizing and hence why the GRANDPA pending changes tree is so deep (we only clean it up on finality).

@notlesh
Contributor

notlesh commented Mar 2, 2021

> Can you confirm that it is following finality from the relay chain? Please just post some of the node output here before the crash, I assume it isn't finalizing and hence why the GRANDPA pending changes tree is so deep (we only clean it up on finality).

Here's some recent output from the original crash:

2021-02-26 20:36:54  [Relaychain] 👶 New epoch 12966 launching at block 0xe34b…2ce3 (block slot 269061969 >= start slot 269061969).
2021-02-26 20:36:54  [Relaychain] 👶 Next epoch starts at slot 269061979
2021-02-26 20:36:54  [Relaychain] ✨ Imported #129633 (0xe34b…2ce3)
2021-02-26 20:36:54  ✨ Imported #58096 (0x0baa…060d)
2021-02-26 20:36:57  💤 Idle (50 peers), best: #58095 (0xeb26…5fc8), finalized #51830 (0x34c0…4824), ⬇ 16.4kiB/s ⬆ 17.8kiB/s
2021-02-26 20:36:58  [Relaychain] 💤 Idle (50 peers), best: #129633 (0xe34b…2ce3), finalized #114261 (0x34a8…0eb3), ⬇ 11.3kiB/s ⬆ 16.9kiB/s
2021-02-26 20:37:00  [Relaychain] ✨ Imported #129634 (0x2649…8c3b)
2021-02-26 20:37:00  [Relaychain] ♻️  Reorg on #129634,0x2649…8c3b to #129634,0x71de…b5b0, common ancestor #129633,0xe34b…2ce3
2021-02-26 20:37:00  [Relaychain] ✨ Imported #129634 (0x71de…b5b0)
2021-02-26 20:37:02  💤 Idle (50 peers), best: #58095 (0xeb26…5fc8), finalized #51830 (0x34c0…4824), ⬇ 4.8kiB/s ⬆ 4.7kiB/s
2021-02-26 20:37:03  [Relaychain] 💤 Idle (50 peers), best: #129634 (0x71de…b5b0), finalized #114261 (0x34a8…0eb3), ⬇ 8.3kiB/s ⬆ 12.4kiB/s
2021-02-26 20:37:06  [Relaychain] ✨ Imported #129635 (0x4388…426b)
2021-02-26 20:37:06  [Relaychain] ✨ Imported #129635 (0x6bbc…7160)
2021-02-26 20:37:07  ✨ Imported #58097 (0xccd2…ed9c)
2021-02-26 20:37:07  💤 Idle (50 peers), best: #58096 (0x0baa…060d), finalized #51830 (0x34c0…4824), ⬇ 2.0kiB/s ⬆ 2.9kiB/s
2021-02-26 20:37:08  [Relaychain] 💤 Idle (50 peers), best: #129635 (0x4388…426b), finalized #114261 (0x34a8…0eb3), ⬇ 8.9kiB/s ⬆ 12.9kiB/s
2021-02-26 20:37:12  [Relaychain] ✨ Imported #129636 (0x4a27…0253)
2021-02-26 20:37:12  💤 Idle (50 peers), best: #58096 (0x0baa…060d), finalized #51830 (0x34c0…4824), ⬇ 10.5kiB/s ⬆ 10.8kiB/s
2021-02-26 20:37:13  [Relaychain] 💤 Idle (50 peers), best: #129636 (0x4a27…0253), finalized #114261 (0x34a8…0eb3), ⬇ 8.2kiB/s ⬆ 11.8kiB/s
2021-02-26 20:37:17  💤 Idle (50 peers), best: #58096 (0x0baa…060d), finalized #51830 (0x34c0…4824), ⬇ 0.7kiB/s ⬆ 0.7kiB/s
2021-02-26 20:37:18  [Relaychain] ✨ Imported #129637 (0xc2e9…f58e)
2021-02-26 20:37:18  [Relaychain] 💤 Idle (50 peers), best: #129637 (0xc2e9…f58e), finalized #114261 (0x34a8…0eb3), ⬇ 10.9kiB/s ⬆ 15.5kiB/s
2021-02-26 20:37:22  💤 Idle (50 peers), best: #58096 (0x0baa…060d), finalized #51830 (0x34c0…4824), ⬇ 0.9kiB/s ⬆ 0.8kiB/s
2021-02-26 20:37:23  [Relaychain] 💤 Idle (50 peers), best: #129637 (0xc2e9…f58e), finalized #114261 (0x34a8…0eb3), ⬇ 4.6kiB/s ⬆ 6.1kiB/s
2021-02-26 20:37:24  [Relaychain] ✨ Imported #129638 (0x6338…bf37)
2021-02-26 20:37:27  💤 Idle (50 peers), best: #58096 (0x0baa…060d), finalized #51830 (0x34c0…4824), ⬇ 1.4kiB/s ⬆ 1.8kiB/s
2021-02-26 20:37:28  [Relaychain] 💤 Idle (50 peers), best: #129638 (0x6338…bf37), finalized #114261 (0x34a8…0eb3), ⬇ 5.6kiB/s ⬆ 9.7kiB/s
2021-02-26 20:37:30  [Relaychain] ✨ Imported #129639 (0xdad3…2695)
2021-02-26 20:37:30  💔 Error importing block 0xf86c0bf66eabda6fefe7c126fcd295142567ce427075001e8247612636b6b8bb: Err(UnknownParent)
2021-02-26 20:37:31  ✨ Imported #58097 (0x7ff7…eaa0)
2021-02-26 20:37:31  ✨ Imported #58098 (0xf86c…b8bb)
2021-02-26 20:37:32  💤 Idle (50 peers), best: #58097 (0x7ff7…eaa0), finalized #51830 (0x34c0…4824), ⬇ 8.5kiB/s ⬆ 9.4kiB/s
2021-02-26 20:37:33  [Relaychain] 💤 Idle (50 peers), best: #129639 (0xdad3…2695), finalized #114261 (0x34a8…0eb3), ⬇ 5.6kiB/s ⬆ 9.1kiB/s
2021-02-26 20:37:36  [Relaychain] ✨ Imported #129640 (0x482e…bf2d)
2021-02-26 20:37:37  💤 Idle (50 peers), best: #58097 (0x7ff7…eaa0), finalized #51830 (0x34c0…4824), ⬇ 0.9kiB/s ⬆ 0.8kiB/s
2021-02-26 20:37:38  [Relaychain] 💤 Idle (50 peers), best: #129640 (0x482e…bf2d), finalized #114261 (0x34a8…0eb3), ⬇ 7.2kiB/s ⬆ 11.0kiB/s
2021-02-26 20:37:42  [Relaychain] ✨ Imported #129641 (0x0c87…22ea)
2021-02-26 20:37:42  [Relaychain] ♻️  Reorg on #129641,0x0c87…22ea to #129641,0xd49c…737c, common ancestor #129640,0x482e…bf2d
2021-02-26 20:37:42  [Relaychain] ✨ Imported #129641 (0xd49c…737c)
2021-02-26 20:37:42  💤 Idle (50 peers), best: #58098 (0xf86c…b8bb), finalized #51830 (0x34c0…4824), ⬇ 1.5kiB/s ⬆ 3.1kiB/s
2021-02-26 20:37:43  [Relaychain] 💤 Idle (50 peers), best: #129641 (0xd49c…737c), finalized #114261 (0x34a8…0eb3), ⬇ 10.9kiB/s ⬆ 19.4kiB/s
2021-02-26 20:37:47  💤 Idle (50 peers), best: #58098 (0xf86c…b8bb), finalized #51830 (0x34c0…4824), ⬇ 5.4kiB/s ⬆ 4.9kiB/s
2021-02-26 20:37:48  [Relaychain] ✨ Imported #129642 (0x6eaa…825b)
2021-02-26 20:37:48  [Relaychain] 💤 Idle (50 peers), best: #129642 (0x6eaa…825b), finalized #114261 (0x34a8…0eb3), ⬇ 10.1kiB/s ⬆ 14.4kiB/s
2021-02-26 20:37:52  💤 Idle (50 peers), best: #58098 (0xf86c…b8bb), finalized #51830 (0x34c0…4824), ⬇ 0.8kiB/s ⬆ 1.0kiB/s
2021-02-26 20:37:53  [Relaychain] 💤 Idle (50 peers), best: #129642 (0x6eaa…825b), finalized #114261 (0x34a8…0eb3), ⬇ 6.8kiB/s ⬆ 11.5kiB/s
2021-02-26 20:37:54  [Relaychain] 👶 New epoch 12967 launching at block 0xb73d…e4d6 (block slot 269061979 >= start slot 269061979).
2021-02-26 20:37:54  [Relaychain] 👶 Next epoch starts at slot 269061989

thread 'tokio-runtime-worker' has overflowed its stack
fatal runtime error: stack overflow

@andresilva
Contributor

So like I said, the node is not finalizing: the relay chain's best finalized block is stuck at #114261. I don't know if the network as a whole is finalizing or if it's just a local issue on that node. On top of that, your epochs seem to be 1 minute long, which just makes the problem worse.

We should focus on why the node is not finalizing.

We can make this case better by not using recursion in the fork-tree code (so that we don't overflow the stack if the tree is too deep) but that's just treating the symptoms and not the root cause.

@stale

stale bot commented Jul 7, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the A5-stale label Jul 7, 2021
@bkchr bkchr closed this as completed Feb 27, 2023