stack overflow error when syncing a new node #4527

mikiquantum · 2020-01-03T14:05:06Z

We are facing this error when trying to fully sync up a new node with the network:

thread 'import-queue-worker-0' has overflowed its stack
fatal runtime error: stack overflow

Context:

We have 7 nodes validating blocks (fully synced and healthy)
We have 3 nodes that serve as sync nodes (full nodes, all of them lagging behind the last finalized and best block)

Steps:

Start another node from scratch. We specify some bootnodes; in our test scenario we use one of the validator nodes to bootstrap from.
Syncing starts. After a while the stack overflow error is triggered. Always around a different block number.

Host resources (cpu, mem) look fine, way below any suspicious threshold.

The only suspicious behavior we saw in the logs was this INFO message:
Invalid justification provided by ...

Since the 3 full sync nodes were lagging behind the network, we brought all of them down and tried syncing the new node again; it caught up successfully with no stack overflow. Before retrying, however, we had to clear the new node's DB from the previous attempt; otherwise the error is triggered as soon as the node process comes up.

Is it possible that the DBs of the 3 full nodes were corrupted, and that this triggered the stack overflow in the new node when it tried to build the local chain, since it was using the state of those DBs?

Any insight (theoretical or practical) on scenarios where this error could be triggered would be highly appreciated.

Thanks in advance!
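As general background on this class of error (not a diagnosis of this issue, whose root cause was never identified in the thread): Rust reports "has overflowed its stack" when a thread's call stack outgrows its fixed allocation, which typically happens with unbounded or very deep recursion. The sketch below is not Substrate code. The parent-index chain and the names `depth_recursive`, `depth_iterative`, `linear_chain`, and `depth_on_big_stack` are all hypothetical, chosen only to show how a recursive walk over a long chain consumes stack proportional to chain length, while an iterative walk, or a thread spawned with an explicit stack size, avoids the overflow.

```rust
use std::thread;

// Hypothetical stand-in for a chain of blocks: parents[i] is the index of
// block i's parent, or None for the root. Not a Substrate data structure.

/// Recursive walk: uses one stack frame per ancestor, so a long enough
/// chain can overflow a thread with a small stack.
fn depth_recursive(parents: &[Option<usize>], node: usize) -> usize {
    match parents[node] {
        None => 0,
        Some(p) => 1 + depth_recursive(parents, p),
    }
}

/// Iterative walk: constant stack usage regardless of chain length.
fn depth_iterative(parents: &[Option<usize>], mut node: usize) -> usize {
    let mut depth = 0;
    while let Some(p) = parents[node] {
        depth += 1;
        node = p;
    }
    depth
}

/// Builds a straight chain 0 <- 1 <- 2 <- ... <- len-1.
fn linear_chain(len: usize) -> Vec<Option<usize>> {
    (0..len)
        .map(|i| if i == 0 { None } else { Some(i - 1) })
        .collect()
}

/// Runs the recursive walk on a dedicated thread with an explicit stack size.
fn depth_on_big_stack(parents: Vec<Option<usize>>, node: usize, stack_bytes: usize) -> usize {
    thread::Builder::new()
        .name("deep-recursion".into())
        .stack_size(stack_bytes)
        .spawn(move || depth_recursive(&parents, node))
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    let chain = linear_chain(100_000);
    let tip = chain.len() - 1;

    // Safe on any thread: no recursion involved.
    println!("iterative depth: {}", depth_iterative(&chain, tip));

    // The recursive version is given a 64 MiB stack (an arbitrary,
    // generous value for this sketch) so the deep recursion fits.
    println!(
        "recursive depth on 64 MiB stack: {}",
        depth_on_big_stack(chain, tip, 64 * 1024 * 1024)
    );
}
```

Raising the stack size only moves the limit; rewriting the recursion as a loop removes it. Which of the two, if either, applies to the import queue here is an open question.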


NikVolf · 2020-01-03T15:08:48Z

Thanks for the report!

What network are you trying to fully sync?

Can you reproduce "After a while the stack overflow error is triggered." now?

mikiquantum · 2020-01-03T15:23:35Z

Hi @NikVolf

Sorry, I should have clarified that: we are using our own network: https://telemetry.polkadot.io/#list/Flint%20Testnet%20CC1

I was able to reproduce it fairly consistently until I restarted/refreshed the 3 full nodes that were behind. At the moment I cannot reproduce it, since it looks like the network is in a healthy state now.
I might be able to reproduce it in the coming days, since I took a snapshot of one of those DBs.

Any hints on how this could have been triggered?

NikVolf · 2020-01-03T15:45:51Z

@mikiquantum If it gets into a state where it can be consistently reproduced again, please run the node with the -l sync=trace argument until the error occurs.

Meanwhile, we will try to figure it out.

If you have any custom code for your network and can share it, that will help too.

mikiquantum · 2020-01-03T15:57:15Z

Sure thing, I will enable that flag as soon as I can reproduce it again.

Here is the custom code for chainSpec and runtime lib:

Thank you!

bkchr · 2020-05-12T17:00:42Z

Any update on this?

mikiquantum · 2020-05-12T20:23:43Z

@bkchr I was not able to reproduce it again. I think we can close it for now and reopen it if we can trigger it again in the future.
Thanks!

aurexav mentioned this issue Feb 3, 2020

Stack Overflow While Syncing darwinia-network/darwinia#252

Closed

amaury1093 mentioned this issue May 12, 2020

--light sync panics with thread 'tokio-runtime-worker' has overflowed its stack #5998

Closed

bkchr closed this as completed May 12, 2020

koute mentioned this issue May 31, 2021

Dangerously long stack trace when syncing a new node #8950

Closed
