stack overflow error when syncing a new node #4527
Comments
Thanks for the report! What network are you trying to fully sync? Can you reproduce "After a while the stack overflow error is triggered." now?
Hi @NikVolf Sorry, I should have clarified that: we are using our own network: https://telemetry.polkadot.io/#list/Flint%20Testnet%20CC1 I was able to reproduce it fairly consistently until I restarted/refreshed those 3 full nodes that were behind. At this moment I cannot reproduce it, since it looks like the network is in a healthy state now. Any hints on how this could have been triggered?
@mikiquantum If it gets into a consistently reproducible state again, please try to run the node with Meanwhile, we will try to figure out If you have any custom code for your network and can share it, this will help too.
Sure thing, I will enable that flag as soon as I can reproduce again. Here is the custom code for chainSpec and runtime lib:
Thank you!
Any update on this?
@bkchr I was not able to reproduce it again. I think we can close it for now and reopen it if we can trigger it again in the future.
We are facing this error when trying to fully sync up a new node with the network:
Context:
Steps:
Host resources (cpu, mem) look fine, way below any suspicious threshold.
The only suspicious behavior we saw in the logs was this INFO message:
Invalid justification provided by ...
Since the 3 sync full nodes were behind in syncing with the network, we decided to bring all of them down and then try syncing the new node again, and it successfully caught up with no stack overflow panic. Before that, though, we had to clear the new node's DB from the previous attempt; otherwise the panic is thrown as soon as the node process comes up.
Is it possible that the DBs of the 3 full nodes were corrupted, and that this triggered the stack overflow error in the new node when it tried to build the local chain, since it was using the state of those DBs?
Any insight (theoretical or practical) on scenarios where this error could be triggered would be highly appreciated.
Thanks in advance!