Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework chain pull to be concurrency friendly #1346

Merged
merged 3 commits into from
Dec 17, 2019
Merged

Conversation

mzabaluev
Copy link
Contributor

Now that multiple chain pull tasks can pull the same or mostly the same header chain, and in the mean time between arriving headers some blocks in the chain may be received and applied, CandidateForest needs to account for this concurrency.
Do this by only retiring candidate entries on GC, setting the state to Applied when the block is committed to storage. When asynchronous ChainAdvance processing sees such an entry at the current block's parent, it knows to drop the hashes it collected to fetch and re-land the chain from the current hash.

Fixes #1327

@mzabaluev mzabaluev added the subsys-block block creation/acceptation related label Dec 11, 2019
@mzabaluev mzabaluev requested review from NicolasDP and a team December 11, 2019 23:59
@rinor
Copy link
Contributor

rinor commented Dec 12, 2019

Can I take this one for a spin or not yet ready?

@NicolasDP
Copy link
Contributor

Yes you can.

@mzabaluev
Copy link
Contributor Author

Testing now with "legacy", looks normal.

@rinor
Copy link
Contributor

rinor commented Dec 12, 2019

Testing, in another network and yes looks much "cleaner" (granted that the nodes may have changed an is not the same situation, but still). Now, I went after panics related to chain pull root candidate and this is what I found. To be noted that even after those panics, the node recovers nicely (not tested very much yet, just an initial thought). This are the filtered panics I got:

thread 'block2' panicked at 'missed chain pull root candidate 985452463b94498a604a22c02103e4a95324ee2d76193a01d1083debc3dcfbb8 that got committed to storage', jormungandr/src/blockchain/candidate.rs:563:13
stack backtrace:
   0:     0x55e8062dd454 - backtrace::backtrace::libunwind::trace::hda41dbcdfba36aa0
                               at /cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/libunwind.rs:88
   1:     0x55e8062dd454 - backtrace::backtrace::trace_unsynchronized::h1a8d6e1f8cb3f5d4
                               at /cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/mod.rs:66
   2:     0x55e8062dd454 - std::sys_common::backtrace::_print_fmt::h610c4127487e10da
                               at src/libstd/sys_common/backtrace.rs:76
   3:     0x55e8062dd454 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h0722dc552e01bd1d
                               at src/libstd/sys_common/backtrace.rs:60
   4:     0x55e80630119c - core::fmt::write::h01edf6dd68a42c9c
                               at src/libcore/fmt/mod.rs:1030
   5:     0x55e8062d79a7 - std::io::Write::write_fmt::hf15985f193f03c04
                               at src/libstd/io/mod.rs:1412
   6:     0x55e8062df9a5 - std::sys_common::backtrace::_print::hd8d5d08a1795e743
                               at src/libstd/sys_common/backtrace.rs:64
   7:     0x55e8062df9a5 - std::sys_common::backtrace::print::hf89a79e3921a2366
                               at src/libstd/sys_common/backtrace.rs:49
   8:     0x55e8062df9a5 - std::panicking::default_hook::{{closure}}::h3a8f42beb3bb8ae3
                               at src/libstd/panicking.rs:196
   9:     0x55e8062df696 - std::panicking::default_hook::h8f803b0bc31a5c37
                               at src/libstd/panicking.rs:210
  10:     0x55e8062e00a5 - std::panicking::rust_panic_with_hook::h825f041245da8739
                               at src/libstd/panicking.rs:473
  11:     0x55e8062dfc42 - std::panicking::continue_panic_fmt::hbe0378e33481e81b
                               at src/libstd/panicking.rs:380
  12:     0x55e8062dfb9c - std::panicking::begin_panic_fmt::hecb5f06c798651bd
                               at src/libstd/panicking.rs:335
  13:     0x55e804e083b3 - jormungandr::blockchain::candidate::CandidateForestThickets::on_applied_block::hc70cc39bd483fcef
                               at jormungandr/src/blockchain/candidate.rs:563
  14:     0x55e8050ff732 - jormungandr::blockchain::candidate::CandidateForest::on_applied_block::{{closure}}::h21ea91ae207e333b
                               at jormungandr/src/blockchain/candidate.rs:452
  15:     0x55e8049e82fb - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::{{closure}}::{{closure}}::hef5849af647f9176
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:34
  16:     0x55e80468d958 - core::result::Result<T,E>::map::h7a70c873f706d6f3
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/result.rs:512
  17:     0x55e8049dbc67 - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::{{closure}}::h37639e8c88246d93
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:33
  18:     0x55e8049266c8 - futures::future::chain::Chain<A,B,C>::poll::h0ac578dc86041f74
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:39
  19:     0x55e8049d97bd - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::h4e6a8149eef07a46
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:32
  20:     0x55e804a636c6 - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::hd8167356cca8f5c8
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/map_err.rs:30
  21:     0x55e804c4a251 - <futures::future::map::Map<A,F> as futures::future::Future>::poll::h19b7371868d1a048
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/map.rs:30
  22:     0x55e804932f4c - futures::future::chain::Chain<A,B,C>::poll::h206f7e21622ee3e3
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:42
  23:     0x55e8049d8e5d - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::h06dc8e68987dd892
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:32
  24:     0x55e804c50f61 - <futures::future::map::Map<A,F> as futures::future::Future>::poll::h56d5ccbb7ce7b3eb
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/map.rs:30
  25:     0x55e804d6da58 - <futures::future::either::Either<A,B> as futures::future::Future>::poll::h8dcc5a65fb70e6f1
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/either.rs:36
  26:     0x55e8049b2f2d - futures::future::chain::Chain<A,B,C>::poll::hfd1939c5f50cb8e8
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:42
  27:     0x55e8049da35d - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::haa6247052dd9df2e
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:32
  28:     0x55e804974c0b - futures::future::chain::Chain<A,B,C>::poll::h9bb7609c8ffa05fa
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:26
  29:     0x55e8051011fd - <futures::future::then::Then<A,B,F> as futures::future::Future>::poll::hd89205c5db480c79
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/then.rs:32
  30:     0x55e804d6dc17 - <futures::future::either::Either<A,B> as futures::future::Future>::poll::ha22a6727422dc2ae
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/either.rs:35
  31:     0x55e804970d1c - futures::future::chain::Chain<A,B,C>::poll::h953a09cc044208bf
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:42
  32:     0x55e8049d8dfd - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::h03ea0ff3d5880053
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:32
  33:     0x55e80439fca1 - <futures::future::loop_fn::LoopFn<A,F> as futures::future::Future>::poll::hc748f3e442e90460
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/loop_fn.rs:93
  34:     0x55e80496efeb - futures::future::chain::Chain<A,B,C>::poll::h951f319a7f8780ed
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:26
  35:     0x55e8049d990d - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::h51d7556186347769
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:32
  36:     0x55e804dc67ae - <tokio_timer::timeout::Timeout<T> as futures::future::Future>::poll::hbf3e4aaa97d38ad3
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-timer-0.2.12/src/timeout.rs:179
  37:     0x55e804a632d9 - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::hd46e6102099e786a
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/map_err.rs:30
  38:     0x55e8062b6ee8 - <alloc::boxed::Box<F> as futures::future::Future>::poll::h9a2dafa0406a21ca
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/mod.rs:113
  39:     0x55e80620db14 - futures::task_impl::Spawn<T>::poll_future_notify::{{closure}}::h66b2361b3f08f928
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:329
  40:     0x55e80620ddc4 - futures::task_impl::Spawn<T>::enter::{{closure}}::h40eb32079fbfc0b5
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:399
  41:     0x55e80620e0b4 - futures::task_impl::std::set::hb41125b58e9af010
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/std/mod.rs:83
  42:     0x55e80620dd5f - futures::task_impl::Spawn<T>::enter::hd5c2809b69b8a9d5
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:399
  43:     0x55e80620d9b5 - futures::task_impl::Spawn<T>::poll_fn_notify::he5b36de1c8dca92b
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:291
  44:     0x55e80620dae8 - futures::task_impl::Spawn<T>::poll_future_notify::hcd64ced47b0b6e87
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:329
  45:     0x55e8061f86f2 - tokio_threadpool::task::Task::run::{{closure}}::hbb8313cd0c806579
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/task/mod.rs:145
  46:     0x55e806227f79 - core::ops::function::FnOnce::call_once::h198b3cbedf48e4dd
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/ops/function.rs:227
  47:     0x55e80620a149 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hc3bd0173501609d3
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panic.rs:315
  48:     0x55e806234859 - std::panicking::try::do_call::h1086c9b800841e32
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panicking.rs:292
  49:     0x55e8062e458a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:80
  50:     0x55e8062346c0 - std::panicking::try::hdd9fdac19e51ec5e
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panicking.rs:271
  51:     0x55e80620a781 - std::panic::catch_unwind::h079cc8026634b159
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panic.rs:394
  52:     0x55e8061f7f8e - tokio_threadpool::task::Task::run::h6831b0793e4b7fd5
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/task/mod.rs:130
  53:     0x55e806225921 - tokio_threadpool::worker::Worker::run_task2::h7034b2b3e952b87e
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:567
  54:     0x55e806225194 - tokio_threadpool::worker::Worker::run_task::hbe62cd2a5b54aa02
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:459
  55:     0x55e806224807 - tokio_threadpool::worker::Worker::try_run_owned_task::hf8c0c89c027f9808
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:390
  56:     0x55e80622424d - tokio_threadpool::worker::Worker::try_run_task::hf2eb66ed39db5bc4
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:297
  57:     0x55e8062240a6 - tokio_threadpool::worker::Worker::run::h7077827df884c91d
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:241
  58:     0x55e80617ab7c - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::{{closure}}::{{closure}}::{{closure}}::hf721c4a853808f85
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-0.1.22/src/runtime/threadpool/builder.rs:390
  59:     0x55e80617e667 - tokio_timer::timer::handle::with_default::hfbc83167fa17d256
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-timer-0.2.12/src/timer/handle.rs:74
  60:     0x55e80617abd9 - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::{{closure}}::{{closure}}::h3ea195fad1db2363
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-0.1.22/src/runtime/threadpool/builder.rs:382
  61:     0x55e8061713cb - tokio_timer::clock::clock::with_default::hdfe31d48e49674e3
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-timer-0.2.12/src/clock/clock.rs:125
  62:     0x55e80617ac1a - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::{{closure}}::h30071b9e43092ded
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-0.1.22/src/runtime/threadpool/builder.rs:381
  63:     0x55e806174bfb - tokio_reactor::with_default::hc7f9a3528f5af219
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-reactor-0.1.11/src/lib.rs:210
  64:     0x55e80617acb5 - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::h1a82250f31b96fbb
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-0.1.22/src/runtime/threadpool/builder.rs:380
  65:     0x55e806218b43 - tokio_threadpool::callback::Callback::call::hf3fd3aca8d05c233
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/callback.rs:22
  66:     0x55e806223d41 - tokio_threadpool::worker::Worker::do_run::{{closure}}::{{closure}}::had071ff60d24c9c7
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:127
  67:     0x55e80620445b - tokio_executor::global::with_default::{{closure}}::h609d715f476946a3
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-executor-0.1.9/src/global.rs:221
  68:     0x55e80621fee4 - std::thread::local::LocalKey<T>::try_with::h7d8b655ab2fecfe6
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/local.rs:262
  69:     0x55e80621fa33 - std::thread::local::LocalKey<T>::with::h6b5840c06899ac7e
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/local.rs:239
  70:     0x55e806204343 - tokio_executor::global::with_default::h96f81e64e05344ca
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-executor-0.1.9/src/global.rs:190
  71:     0x55e806223ded - tokio_threadpool::worker::Worker::do_run::{{closure}}::h5481440a9cb470e8
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:125
  72:     0x55e80621fd92 - std::thread::local::LocalKey<T>::try_with::h74f71eac8f63a42a
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/local.rs:262
  73:     0x55e80621fa9e - std::thread::local::LocalKey<T>::with::h943345803b1377a7
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/local.rs:239
  74:     0x55e806223c05 - tokio_threadpool::worker::Worker::do_run::h9d6ae295c2f6ce24
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:116
  75:     0x55e8062316d0 - tokio_threadpool::pool::Pool::spawn_thread::{{closure}}::hfdbb294a84f291a1
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/pool/mod.rs:345
  76:     0x55e806232efe - std::sys_common::backtrace::__rust_begin_short_backtrace::hcf42fe591cee80bd
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/sys_common/backtrace.rs:126
  77:     0x55e806216992 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::hedb48e6bd8b94b74
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/mod.rs:470
  78:     0x55e80620a112 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h768772e213e542e4
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panic.rs:315
  79:     0x55e8062348e3 - std::panicking::try::do_call::h1a7b8f30ec9ccd55
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panicking.rs:292
  80:     0x55e8062e458a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:80
  81:     0x55e8062344f6 - std::panicking::try::hbd411e273a35f4a8
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panicking.rs:271
  82:     0x55e80620a7c2 - std::panic::catch_unwind::h4b7418d72ca983ca
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panic.rs:394
  83:     0x55e806216776 - std::thread::Builder::spawn_unchecked::{{closure}}::ha15b4ce39046557b
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/mod.rs:469
  84:     0x55e806227e74 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h5356c3708fbaa6e7
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/ops/function.rs:227
  85:     0x55e8062d07ff - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h483711add4ba2330
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/liballoc/boxed.rs:922
  86:     0x55e8062e38f0 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h7605b45eb29ed0be
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/liballoc/boxed.rs:922
  87:     0x55e8062e38f0 - std::sys_common::thread::start_thread::h557b0c2dc4449562
                               at src/libstd/sys_common/thread.rs:13
  88:     0x55e8062e38f0 - std::sys::unix::thread::Thread::new::thread_start::h7c2a7f9b68fe4bba
                               at src/libstd/sys/unix/thread.rs:79
  89:     0x7f71997a6fa3 - start_thread
  90:     0x7f71996bb4cf - clone
  91:                0x0 - <unknown>
block3' panicked at 'chain pull root candidate 9fe4f34a408592a13351e5b90f31b3a65dd8a56526a87ba19e14be3911bc3d15 was previously cached', jormungandr/src/blockchain/candidate.rs:486:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
thread 'block3' panicked at 'missed chain pull root candidate cb281066a04ff116c945fb0fe924a8a772a06a7d40702c46f79995424199ec13 that got committed to storage', jormungandr/src/blockchain/candidate.rs:563:13
thread 'block2' panicked at 'chain pull root candidate 0363d7019e439e766c9cdacf5eb82fef6c76780c7ff1d9f03eca6545b4d12892 was previously cached', jormungandr/src/blockchain/candidate.rs:486:17

Attached also the full rel8-chain_pool_root.log

@mzabaluev
Copy link
Contributor Author

The panic is due to another instance of out of order locking.

@dkijania
Copy link
Contributor

I've run tests with some private network topologies. At first glance looks like there is little improvement (pass rate [4/11] is the same like from before the change) in node synchronisation before there were cases in which one out of 7 nodes was not in sync at all. In this run, difference was more subtle (2 or 3 blocks were off) will update more after deeper analysis

Now that multiple chain pull tasks can pull the same or mostly the same
header chain, and in the mean time between arriving headers
some blocks in the chain may be received and applied,
CandidateForest needs to account for this concurrency.
Do this by only retiring candidate entries on GC, setting the
state to Applied when the block is committed to storage.
When asynchronous ChainAdvance processing sees such
an entry at the current block's parent, it knows to drop the hashes
it collected to fetch and re-land the chain from the current hash.
@@ -393,6 +434,9 @@ impl CandidateForest {
debug_assert!(block.header().hash() == block_hash);
Ok(())
}
CandidateData::Applied(_) => {
panic!("caching block {} which is already in storage", block_hash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't you worried this will blow in production in some cases? can't we just "accept" the state and continue? If the block is already applied then this is not necessarily cause for panic, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should never happen as the blocks are checked with storage first before being cached.

@mzabaluev mzabaluev added the DO NOT MERGE not to be merged until something else is done label Dec 16, 2019
This removes race conditions in the candidate forest,
and handles out-of-order blocks cached due to lacking a parent.
@mzabaluev mzabaluev removed the DO NOT MERGE not to be merged until something else is done label Dec 16, 2019
@mzabaluev
Copy link
Contributor Author

I have added an improvement to remove a cause for panics that @rinor has observed and properly handle out-of-order blocks.

@rinor
Copy link
Contributor

rinor commented Dec 16, 2019

Tested quickly under the same reproducible scenario and one is still there.

The fastest way to reproduce this is:
Once connected to a network (ITN in this case), after initial bootstrap is completed, just 'disconnect the network connectivity' (routing/physical/....) and leave the node running for a few minutes in that state (do not let it for a long time disconnected since probably poldercasts will quarantine most of the connections - which may be a separate issue to discuss). Reactivate the network connectivity and watch the node recovering and panic within a few moments.

thread 'block0' panicked at 'chain pull root candidate c04895dc43b7d6593a6672af730f86f3e8dc90ce15f3163061cfd81b9b4ee3e2 was previously cached', jormungandr/src/blockchain/candidate.rs:484:17
stack backtrace:
   0:     0x56166f386654 - backtrace::backtrace::libunwind::trace::hda41dbcdfba36aa0
                               at /cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/libunwind.rs:88
   1:     0x56166f386654 - backtrace::backtrace::trace_unsynchronized::h1a8d6e1f8cb3f5d4
                               at /cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/mod.rs:66
   2:     0x56166f386654 - std::sys_common::backtrace::_print_fmt::h610c4127487e10da
                               at src/libstd/sys_common/backtrace.rs:76
   3:     0x56166f386654 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h0722dc552e01bd1d
                               at src/libstd/sys_common/backtrace.rs:60
   4:     0x56166f3aa39c - core::fmt::write::h01edf6dd68a42c9c
                               at src/libcore/fmt/mod.rs:1030
   5:     0x56166f380ba7 - std::io::Write::write_fmt::hf15985f193f03c04
                               at src/libstd/io/mod.rs:1412
   6:     0x56166f388ba5 - std::sys_common::backtrace::_print::hd8d5d08a1795e743
                               at src/libstd/sys_common/backtrace.rs:64
   7:     0x56166f388ba5 - std::sys_common::backtrace::print::hf89a79e3921a2366
                               at src/libstd/sys_common/backtrace.rs:49
   8:     0x56166f388ba5 - std::panicking::default_hook::{{closure}}::h3a8f42beb3bb8ae3
                               at src/libstd/panicking.rs:196
   9:     0x56166f388896 - std::panicking::default_hook::h8f803b0bc31a5c37
                               at src/libstd/panicking.rs:210
  10:     0x56166f3892a5 - std::panicking::rust_panic_with_hook::h825f041245da8739
                               at src/libstd/panicking.rs:473
  11:     0x56166f388e42 - std::panicking::continue_panic_fmt::hbe0378e33481e81b
                               at src/libstd/panicking.rs:380
  12:     0x56166f388d9c - std::panicking::begin_panic_fmt::hecb5f06c798651bd
                               at src/libstd/panicking.rs:335
  13:     0x56166dffe20c - jormungandr::blockchain::candidate::CandidateForestThickets::add_or_refresh_root::hec7c983fb2c9a702
                               at jormungandr/src/blockchain/candidate.rs:484
  14:     0x56166ddae812 - jormungandr::blockchain::candidate::CandidateForest::land_header_chain::{{closure}}::{{closure}}::hc02cae2e79f2546b
                               at jormungandr/src/blockchain/candidate.rs:354
  15:     0x56166d58edbf - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::{{closure}}::{{closure}}::hbf82c3e6446f18f7
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:34
  16:     0x56166dde382c - core::result::Result<T,E>::map::h0b90f752db46eb66
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/result.rs:512
  17:     0x56166d586020 - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::{{closure}}::h995d805d94af181a
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:33
  18:     0x56166dfeb6a7 - futures::future::chain::Chain<A,B,C>::poll::hf8652d94defd0f80
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:39
  19:     0x56166d582e1d - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::hc5618510b946ec21
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:32
  20:     0x56166d9366b7 - <futures::future::either::Either<A,B> as futures::future::Future>::poll::hb205f37386b595df
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/either.rs:35
  21:     0x56166df6c65d - futures::future::chain::Chain<A,B,C>::poll::h1bbb68261dab3477
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:42
  22:     0x56166d58185d - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::h229e5396bad51679
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:32
  23:     0x56166dfa16bb - futures::future::chain::Chain<A,B,C>::poll::h7f6f807e13b0798a
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:26
  24:     0x56166d5821ed - <futures::future::and_then::AndThen<A,B,F> as futures::future::Future>::poll::h74437975cba18de9
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/and_then.rs:32
  25:     0x56166dfc2cfb - futures::future::chain::Chain<A,B,C>::poll::hac668794543d624d
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:26
  26:     0x56166dda376d - <futures::future::then::Then<A,B,F> as futures::future::Future>::poll::hf2f2235efc995af8
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/then.rs:32
  27:     0x56166dd13d0e - <tokio_timer::timeout::Timeout<T> as futures::future::Future>::poll::hef596ea5b83d868e
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-timer-0.2.12/src/timeout.rs:179
  28:     0x56166d5b2d29 - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::h751ad044ef18d459
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/map_err.rs:30
  29:     0x56166f3600e8 - <alloc::boxed::Box<F> as futures::future::Future>::poll::h9a2dafa0406a21ca
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/future/mod.rs:113
  30:     0x56166f2b6d14 - futures::task_impl::Spawn<T>::poll_future_notify::{{closure}}::h66b2361b3f08f928
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:329
  31:     0x56166f2b6fc4 - futures::task_impl::Spawn<T>::enter::{{closure}}::h40eb32079fbfc0b5
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:399
  32:     0x56166f2b72b4 - futures::task_impl::std::set::hb41125b58e9af010
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/std/mod.rs:83
  33:     0x56166f2b6f5f - futures::task_impl::Spawn<T>::enter::hd5c2809b69b8a9d5
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:399
  34:     0x56166f2b6bb5 - futures::task_impl::Spawn<T>::poll_fn_notify::he5b36de1c8dca92b
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:291
  35:     0x56166f2b6ce8 - futures::task_impl::Spawn<T>::poll_future_notify::hcd64ced47b0b6e87
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:329
  36:     0x56166f2a18f2 - tokio_threadpool::task::Task::run::{{closure}}::hbb8313cd0c806579
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/task/mod.rs:145
  37:     0x56166f2d1179 - core::ops::function::FnOnce::call_once::h198b3cbedf48e4dd
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/ops/function.rs:227
  38:     0x56166f2b3349 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hc3bd0173501609d3
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panic.rs:315
  39:     0x56166f2dda59 - std::panicking::try::do_call::h1086c9b800841e32
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panicking.rs:292
  40:     0x56166f38d78a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:80
  41:     0x56166f2dd8c0 - std::panicking::try::hdd9fdac19e51ec5e
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panicking.rs:271
  42:     0x56166f2b3981 - std::panic::catch_unwind::h079cc8026634b159
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panic.rs:394
  43:     0x56166f2a118e - tokio_threadpool::task::Task::run::h6831b0793e4b7fd5
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/task/mod.rs:130
  44:     0x56166f2ceb21 - tokio_threadpool::worker::Worker::run_task2::h7034b2b3e952b87e
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:567
  45:     0x56166f2ce394 - tokio_threadpool::worker::Worker::run_task::hbe62cd2a5b54aa02
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:459
  46:     0x56166f2cde8e - tokio_threadpool::worker::Worker::try_steal_task::ha1665f9b5b6fdd8a
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:416
  47:     0x56166f2cd46a - tokio_threadpool::worker::Worker::try_run_task::hf2eb66ed39db5bc4
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:301
  48:     0x56166f2cd2a6 - tokio_threadpool::worker::Worker::run::h7077827df884c91d
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:241
  49:     0x56166f223d7c - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::{{closure}}::{{closure}}::{{closure}}::hf721c4a853808f85
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-0.1.22/src/runtime/threadpool/builder.rs:390
  50:     0x56166f227867 - tokio_timer::timer::handle::with_default::hfbc83167fa17d256
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-timer-0.2.12/src/timer/handle.rs:74
  51:     0x56166f223dd9 - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::{{closure}}::{{closure}}::h3ea195fad1db2363
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-0.1.22/src/runtime/threadpool/builder.rs:382
  52:     0x56166f21a5cb - tokio_timer::clock::clock::with_default::hdfe31d48e49674e3
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-timer-0.2.12/src/clock/clock.rs:125
  53:     0x56166f223e1a - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::{{closure}}::h30071b9e43092ded
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-0.1.22/src/runtime/threadpool/builder.rs:381
  54:     0x56166f21ddfb - tokio_reactor::with_default::hc7f9a3528f5af219
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-reactor-0.1.11/src/lib.rs:210
  55:     0x56166f223eb5 - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::h1a82250f31b96fbb
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-0.1.22/src/runtime/threadpool/builder.rs:380
  56:     0x56166f2c1d43 - tokio_threadpool::callback::Callback::call::hf3fd3aca8d05c233
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/callback.rs:22
  57:     0x56166f2ccf41 - tokio_threadpool::worker::Worker::do_run::{{closure}}::{{closure}}::had071ff60d24c9c7
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:127
  58:     0x56166f2ad65b - tokio_executor::global::with_default::{{closure}}::h609d715f476946a3
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-executor-0.1.9/src/global.rs:221
  59:     0x56166f2c90e4 - std::thread::local::LocalKey<T>::try_with::h7d8b655ab2fecfe6
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/local.rs:262
  60:     0x56166f2c8c33 - std::thread::local::LocalKey<T>::with::h6b5840c06899ac7e
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/local.rs:239
  61:     0x56166f2ad543 - tokio_executor::global::with_default::h96f81e64e05344ca
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-executor-0.1.9/src/global.rs:190
  62:     0x56166f2ccfed - tokio_threadpool::worker::Worker::do_run::{{closure}}::h5481440a9cb470e8
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:125
  63:     0x56166f2c8f92 - std::thread::local::LocalKey<T>::try_with::h74f71eac8f63a42a
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/local.rs:262
  64:     0x56166f2c8c9e - std::thread::local::LocalKey<T>::with::h943345803b1377a7
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/local.rs:239
  65:     0x56166f2cce05 - tokio_threadpool::worker::Worker::do_run::h9d6ae295c2f6ce24
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/worker/mod.rs:116
  66:     0x56166f2da8d0 - tokio_threadpool::pool::Pool::spawn_thread::{{closure}}::hfdbb294a84f291a1
                               at /home/rinor/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823/tokio-threadpool-0.1.17/src/pool/mod.rs:345
  67:     0x56166f2dc0fe - std::sys_common::backtrace::__rust_begin_short_backtrace::hcf42fe591cee80bd
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/sys_common/backtrace.rs:126
  68:     0x56166f2bfb92 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::hedb48e6bd8b94b74
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/mod.rs:470
  69:     0x56166f2b3312 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h768772e213e542e4
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panic.rs:315
  70:     0x56166f2ddae3 - std::panicking::try::do_call::h1a7b8f30ec9ccd55
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panicking.rs:292
  71:     0x56166f38d78a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:80
  72:     0x56166f2dd6f6 - std::panicking::try::hbd411e273a35f4a8
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panicking.rs:271
  73:     0x56166f2b39c2 - std::panic::catch_unwind::h4b7418d72ca983ca
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/panic.rs:394
  74:     0x56166f2bf976 - std::thread::Builder::spawn_unchecked::{{closure}}::ha15b4ce39046557b
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/mod.rs:469
  75:     0x56166f2d1074 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h5356c3708fbaa6e7
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/ops/function.rs:227
  76:     0x56166f3799ff - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h483711add4ba2330
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/liballoc/boxed.rs:922
  77:     0x56166f38caf0 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h7605b45eb29ed0be
                               at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/liballoc/boxed.rs:922
  78:     0x56166f38caf0 - std::sys_common::thread::start_thread::h557b0c2dc4449562
                               at src/libstd/sys_common/thread.rs:13
  79:     0x56166f38caf0 - std::sys::unix::thread::Thread::new::thread_start::h7c2a7f9b68fe4bba
                               at src/libstd/sys/unix/thread.rs:79
  80:     0x7f1367980fa3 - start_thread
  81:     0x7f13678954cf - clone
  82:                0x0 - <unknown>
thread 'block2' panicked at 'chain pull root candidate 850718aacf2757e77dd8cd7dc7eb00a10c84778f9a4c5a217b80ade0a9fc5a59 was previously cached', jormungandr/src/blockchain/candidate.rs:484:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

Full log itn_v1-chain_pull_root_candidate_20191216.log

@NicolasDP NicolasDP merged commit a804e4b into master Dec 17, 2019
@NicolasDP NicolasDP deleted the debug-candidate-map branch December 17, 2019 09:56
@mzabaluev
Copy link
Contributor Author

Thanks @rinor for repro guidance. I have fixed the problem in #1389.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
subsys-block block creation/acceptation related
Projects
None yet
4 participants