Block building within the same wasm memory? #10557
Comments
It seems to me that a panic in the runtime is already a "game-breaking" problem to have in the code, so we should not design a less efficient system just to better handle an already-broken code path. If saving a single wasm instance provides a significant performance increase, I think it makes perfect sense.
You cannot prove that there is never a panic anywhere. So, we need to take care of this.
We already assume a panic in the runtime is a DoS vulnerability for a chain. Does the scale of the DoS actually matter, then?
Here the problem could be that you stop block production on all validators. This is really bad! Currently, when you have one failing tx, nothing happens. But if we cannot safely roll back, the entire block needs to be thrown away. Yes, we could restart and skip transactions, but this needs to be really thought through :P A panic could, for example, also happen because some storage entry can no longer be decoded. There are tons of reasons why the runtime can panic. I know that we try to prevent it, but we cannot prove it.
The client could heuristically pass a batch of transactions into the runtime in a single API call as well; for example, when the client is almost sure that all of the transactions in the pool will fit in the next block.
Is this the main reason for the current design? Also, isn't there any way for wasm to handle panics internally, i.e. something like …
I did not participate in designing this, so I am not sure about the original intention, but yes, using separate instances in the block builder definitely solves this. And yes, present-day wasm cannot really handle unwinding correctly, even with …
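To make the unwinding point concrete, here is a minimal native-Rust sketch of containing a panic with `std::panic::catch_unwind`. This is exactly the tool that is unavailable in practice for Substrate runtimes: wasm runtime builds abort on panic rather than unwind, so the whole instance is lost. The `try_apply` name is hypothetical, chosen only for illustration.

```rust
use std::panic;

/// Run a closure that may panic, converting the panic into an Err.
/// This only works when unwinding is enabled, which is the default on
/// native targets but not in wasm runtime builds, where a panic
/// aborts the instance instead of unwinding.
fn try_apply(f: impl FnOnce() -> u32 + panic::UnwindSafe) -> Result<u32, ()> {
    panic::catch_unwind(f).map_err(|_| ())
}

fn main() {
    // Silence the default panic message printed to stderr.
    panic::set_hook(Box::new(|_| {}));

    assert_eq!(try_apply(|| 7), Ok(7));
    assert!(try_apply(|| panic!("bad extrinsic")).is_err());
}
```

On a native target the second call returns an `Err` and execution continues; in a wasm runtime the equivalent panic would take the whole instance down, which is why the block builder hands each call a fresh instance.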
Proper try/catch will also be required for parachains: #10222
At a high level, we should not expect verifiers to be the same code as provers. We all know this for crypto and storage, of course, but it extends into the block logic too. In particular, your verifier looks radically different from your builder any time your block logic involves some NP-hard problem like an integer program, or a space flavor like NL-hard problems, or is not even slow but simply faster when given some witness.

We'll need real persistent memory for batch verification, from which most crypto benefits but which becomes essential for many zk proof systems. In these, you have many transactions, each with their own proofs provided by users, but the block builder performs some computation for merging these proofs. It's morally like preparing the PoV, but it should happen within a single chain. A batch verifier would first collect curve points or G_T elements from each transaction or inherent, and then run some batch proof checker on all of them plus some additional inherent data. Importantly, we do not serialize these collected curve points or G_T elements, because doing so securely is slow, and upstream maintainers would fight you tooth and nail to never expose faster insecure serialization. Also, anti-MEV measures could exploit batching so that transactions from one block cannot be placed into a different block without seeing the original transaction anyway.

Anyway, we should definitely do in-memory storage, but ideally we should have some story for when a parachain team's logic really differs between block builders and verifiers, like, say, a game which approximately solves an integer program at each iteration. It's possible that story becomes frameless runtimes like Tuxedo, but a less radical solution would be replacing the …

It's likely Cosmos is way ahead of us here; even assuming their core SDK team made similar design choices, Penumbra would have forked this sort of functionality into their ecosystem by now.
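The batch-verification pattern described above — collect one element per transaction, then run a single combined check — can be sketched with a toy example. This is a sketch only: small integers modulo a prime stand in for curve points or G_T elements, and all names (`Tx`, `commit`, `batch_verify`) are hypothetical, not Substrate or arkworks APIs. The random-linear-combination trick shown is the standard batching technique: instead of checking each commitment individually, check one random linear combination of all of them.

```rust
// Toy "group": integers mod a prime. Real code would accumulate curve
// points or G_T elements and never serialize them between calls.
const P: u64 = 1_000_003; // placeholder prime modulus
const G: u64 = 5;         // placeholder generator

struct Tx {
    value: u64,
    commitment: u64, // should equal value * G mod P
}

fn commit(value: u64) -> u64 {
    value * G % P
}

/// Collect per-transaction elements, then do ONE combined check using
/// random coefficients r_i, instead of checking each tx in isolation:
///   Σ r_i * commitment_i  ==  (Σ r_i * value_i) * G   (mod P)
fn batch_verify(txs: &[Tx], coeffs: &[u64]) -> bool {
    let mut lhs = 0u64;
    let mut rhs = 0u64;
    for (tx, &r) in txs.iter().zip(coeffs) {
        lhs = (lhs + r * tx.commitment) % P;
        rhs = (rhs + r * tx.value) % P;
    }
    lhs == rhs * G % P
}
```

The point of the in-memory-storage argument is that the accumulated `lhs`/`rhs` state must survive across per-transaction runtime calls; if each `apply_extrinsic` gets a fresh instance, the accumulators would have to be (slowly and riskily) serialized to storage between calls.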
This is not necessarily a feature request or a call to action; I am just writing down my thoughts on this topic.
Whenever Substrate imports a block, it calls a runtime API function exposed as `execute_block`. Under the hood, this initializes the state of the block, picks and runs each and every extrinsic, and then finalizes the block, all within the same wasm instance. This means that all memory is persistent across these invocations: simply speaking, changes to memory in `on_initialize` will be visible in `on_finalize`.

In contrast, while building a block, each stage is its own runtime call: initializing the block is one call, applying an extrinsic is another. This means that FRAME or other runtime code cannot assume that the memory is persistent between the calls.
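The contrast between the two execution modes can be illustrated with a toy model. This is a sketch only: `Runtime` here stands in for a wasm instance and its linear memory, and the function names mirror, but are not, the real runtime API entry points.

```rust
// Toy model: `Runtime` stands in for a wasm instance; `counter`
// stands in for state kept in its linear memory.
struct Runtime {
    counter: u64,
}

impl Runtime {
    fn new() -> Self { Runtime { counter: 0 } }
    fn initialize_block(&mut self) { self.counter += 1 }
    fn apply_extrinsic(&mut self) { self.counter += 1 }
    fn finalize_block(&mut self) -> u64 { self.counter }
}

// Import path (`execute_block`): one instance for the whole block,
// so memory written in earlier stages is visible in later ones.
fn execute_block(n_ext: u64) -> u64 {
    let mut rt = Runtime::new();
    rt.initialize_block();
    for _ in 0..n_ext {
        rt.apply_extrinsic();
    }
    rt.finalize_block()
}

// Authoring path: every stage gets a fresh instance, so each call
// starts from blank memory and nothing carries over.
fn build_block(n_ext: u64) -> u64 {
    Runtime::new().initialize_block();
    for _ in 0..n_ext {
        Runtime::new().apply_extrinsic();
    }
    Runtime::new().finalize_block()
}
```

In the import path `finalize_block` sees everything the earlier stages wrote; in the authoring path it sees a freshly zeroed instance, which is exactly the restriction FRAME code has to live with today.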
There are a number of reasons why this may be desirable:

- It would allow communicating between `on_initialize` and `on_finalize` without touching storage. It happens fairly often that we need to do something in `on_finalize`, for example, remove the N elements from a list that satisfy some condition. However, using `on_finalize` requires `on_initialize` to return the weight to be consumed by `on_finalize`, implying that `on_initialize` will need to see how many items satisfy the predicate, thus obtaining N. Then, even though `on_initialize` already did the work, `on_finalize` will need to run the same code again to decide which elements to prune. It would be good if `on_initialize` could just communicate to `on_finalize` which items it needs to remove.
- This also ties back to the issue of wasm instance spawning overhead. During the recent work on #10244 we found out that right now the per-call instance overhead is at least ≈50µs. If we take our target of 1000 tps for Polkadot, we get 12k tx per parablock; thus in Cumulus it would take 600ms just on wasm instance spawning overhead, a good chunk of time by any means. While this is not critical, and possibly we will get bottlenecked somewhere else, it is still something to keep in mind.
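The `on_initialize`/`on_finalize` communication described above could look roughly like the following sketch, assuming persistent memory between the two hooks. Everything here is hypothetical: a `static` scratch buffer stands in for wasm linear memory, and the free functions stand in for the real FRAME hooks.

```rust
use std::sync::Mutex;

// Stands in for state kept in wasm linear memory. It survives between
// the two hooks only if both run inside the same instance, which is
// exactly what this issue proposes for block building.
static SCRATCH: Mutex<Vec<u64>> = Mutex::new(Vec::new());

/// Scan once, remember which items to prune, and return their count
/// (from which the weight for `on_finalize` would be derived).
fn on_initialize(items: &[u64]) -> usize {
    let doomed: Vec<u64> = items.iter().copied().filter(|i| i % 2 == 0).collect();
    let n = doomed.len();
    *SCRATCH.lock().unwrap() = doomed;
    n
}

/// Reuse the scan result instead of re-running the predicate over the
/// whole list a second time.
fn on_finalize() -> Vec<u64> {
    std::mem::take(&mut *SCRATCH.lock().unwrap())
}
```

Without persistent memory, `on_finalize` would have to repeat the filter pass (or the hooks would have to round-trip the list through storage), which is precisely the duplicated work the issue complains about.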
One approach to tackle this is to simply keep an instance alive between the runtime API calls, instead of creating a new one each time.
A similar issue was discussed between me and @rphmeier, where we came up with the idea of using internal iteration instead of external iteration. That is, the runtime itself controls the block filling. As a strawman: the block-building interface would be a single call into the runtime. That call would initialize the block, then fetch extrinsics and apply them, and then finalize. The fetching part is the most interesting here: the runtime calls a special non-deterministic host function which returns the next transaction. This probably comes with an entire can of trade-offs that should be thought through, like how we handle timeouts and so on.
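The internal-iteration strawman could be sketched as below. This is not a proposed API: the non-deterministic host function is modeled as a closure, and the names (`build_block`, `next_extrinsic`, `BuiltBlock`) are invented for illustration.

```rust
struct BuiltBlock {
    extrinsics: Vec<Vec<u8>>, // opaque encoded extrinsics
}

/// A single runtime call that drives the whole block-building loop.
/// `next_extrinsic` models the non-deterministic host function; the
/// host decides what to feed in and when to return `None` (pool
/// exhausted, or a host-side timeout fired).
fn build_block(
    mut next_extrinsic: impl FnMut() -> Option<Vec<u8>>,
    max_extrinsics: usize,
) -> BuiltBlock {
    // ... initialize_block would happen here ...
    let mut extrinsics = Vec::new();
    while extrinsics.len() < max_extrinsics {
        match next_extrinsic() {
            Some(xt) => extrinsics.push(xt), // ... apply_extrinsic(&xt) ...
            None => break,
        }
    }
    // ... finalize_block would happen here ...
    BuiltBlock { extrinsics }
}
```

Because initialization, application, and finalization all run inside one call, memory naturally persists across the stages; the open questions (timeouts, weight limits, what happens if an extrinsic panics mid-loop) live in the host-side `next_extrinsic` contract.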
One problem that prevents us from making the memory persistent between the stages of block execution is that some extrinsics can panic. Although this is not normal, it can potentially happen. When it does, we want to make sure that the already-existing DoS vector is not amplified by implementation details. Giving each runtime call a new wasm instance is handy because the client can just destroy it and the block authorship module can move on. If we were to preserve the memory between calls, we would need to figure out how to recover from such a situation quickly.
This now brings us to the classic ideas like paritytech/polkadot-sdk#370. With the integration of wasmtime, we figured out that we probably should employ mmap/CoW techniques. More recently, we have been thinking about resurrecting the CoW approach to drive down wasm instance spawning latency. Perhaps the very same mechanism may allow us to implement the last part of paritytech/polkadot-sdk#370 efficiently?
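The recovery story the CoW idea enables can be modeled very crudely as snapshot-and-restore: capture the instance memory once, and after a failed call roll back to the snapshot instead of spawning a fresh instance. A real implementation would use mmap'd copy-on-write pages so the "snapshot" and "restore" are nearly free; the byte-copy below is only a model, and `Instance` is a made-up name.

```rust
// Crude model of CoW-style rollback: a byte copy stands in for what
// mmap/copy-on-write would do lazily at page granularity.
struct Instance {
    memory: Vec<u8>, // stands in for wasm linear memory
}

impl Instance {
    /// Capture the current memory image (in real life: mark pages CoW).
    fn snapshot(&self) -> Vec<u8> {
        self.memory.clone()
    }

    /// Roll memory back after a failed call (in real life: drop the
    /// dirtied pages and remap the pristine image).
    fn restore(&mut self, snap: &[u8]) {
        self.memory.clear();
        self.memory.extend_from_slice(snap);
    }
}
```

With something like this, a panicking extrinsic no longer forces the block author to pay the full instance-spawning cost: the dirtied memory is thrown away and the pristine image is reinstated, which is the "last part" of paritytech/polkadot-sdk#370 referred to above.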