Skip to content
This repository has been archived by the owner on Jul 14, 2023. It is now read-only.

Transient storages #11

Closed
wants to merge 5 commits into from
Closed

Transient storages #11

wants to merge 5 commits into from

Conversation

cheme
Copy link

@cheme cheme commented Apr 5, 2023

I started drafting a ppp about https://github.com/paritytech/substrate/issues/12577, based on my current implementation and changes to the issue in https://github.com/cheme/substrate/tree/transient and part of the refactoring that is not related to this PPPs in paritytech/substrate#13006.

I open the PPPs for discussion, but at this point there is too many open question to publish it as is.

I fill the last part with these questions cc @gavofyork, @bkchr and all.

Ps: also certainly a lot of english issues at this point.


Additionally content such as events often are used in a log based manner (append only) with possibly a larger size than usual content.

Hashing through host function involves passing all data at once, and is not memory efficient.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this sentence, it seems that one alternative could be to provide new host functions that allow hashing data progressively, which seems tremendously more simple than all this machinery.

If this alternative is not viable, this section really should explain why.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I propose in the paragraph "Implementation of Blob storage hashing". This part is not really strictly needed for transient storage, but it did not make sense to me to have blob (potentially big) hashing with the current api.

Maybe I should extract it in a separate PPP (progressive hashing host function).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that the "Motivation" and/or "Alternatives" section should explain what problem is being solved, but also why this specific design was chosen and not a different one. The objective is to make sure that this isn't an XY problem.

If the problem that is being solved is just that "hashing through host function isn't memory efficient", then the solution in this RFC is way overkill.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more another issue that I tackled when implementing and force push in this PPP, it is a bit confusing to have all in it.
Similarily the ordered map solve a precise issue, and blob a slightly different one.
Clearly would make sense to me to have the transient ordered map in a first time and blob in a second time: could be two different PPPs (even if this helps factoring).
I think I will extract the hashing part as a first step, not sure about splitting between ordered map and blob.

Copy link
Contributor

@tomaka tomaka Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarily the ordered map solve a precise issue, and blob a slightly different one.

But again, my point is that it should be written in the PPP which issue is being solved.
It's not possible to have an opinion on whether a proposal is good if you don't know which use-cases it has and problems it solves.


### Implementation of Btree storage

- `ext_btree_storage_new` with parameters:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all the functions named btree? This implies a binary tree, whereas the implementation doesn't actually need to use a binary tree.
Naming them like tree or ext_transient_storage_* would be more appropriate.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right, it is good to make it clear that the structure is same as rust btree in term of api and capability, but it is indeed incorrect.

What do you think of "ordered_map" (actually it is a bit long but no better idea right now)?

This implies that the storage must support commiting and reverting transaction with `ext_storage_commit_transaction` or `ext_storage_rollback_transaction`.
This transactional support is both at transient storage content and at transient storage definition (a delete transient storage will be restore on rollback).

Btree and blob are using a specific `Mode`, either `drop` or `archive` passed respectively as the byte 0 or 1. When using `drop` the data will not be send from the runtime executor to the calling client. When using `archive` the committed state of the transient storage will be passed as a change set to the client calling runtime executor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using archive the committed state of the transient storage will be passed as a change set to the client calling runtime executor.

Isn't that an implementation detail that isn't relevant here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we could stick to "archive indicate that information should be storable and fetch-able from the client".

- value : a pointer size to the value of the content to insert.
Returns false if there is no btree storage defined for this `name`, true otherwhise (success).

This insert a new key value content.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This insert a new key value content.
This insert a new key value content. Does nothing if the btree storage of this `name` doesn't exist.

Needs to be explicited.


This operation cost is high, the implementation do not try to avoid copy.

- `ext_btree_storage_rename` with parameters:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function necessary? Especially if transient storages aren't stored across blocks, I have trouble seeing why you would need to rename a storage.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought only to have a file like api, but thought of a possible usage.

You got a blob define globally: let s say system pallet defines "log" blob.
Then every extrinsic can access it and append to it.
But if a pallet want to use code from other pallet but not appending to log, then it just need to rename "log" do something on another "log" and afterward restore "log", at pretty much 0 cost.
So in this case to isolate a storage and restore it afterward. Same thing with potentially switching context for the remaining of a block processing.

But I would not say it is strictly necessary. And considering it is the more involved implementation detail (to keep it 0 cost), I would not drop the idea of not having it.


- `ext_btree_storage_root32_item` with parameters:
- name : a pointer size to the name of a transient storage to rename.
- structure: `Root32Structure` passed as a byte, 0 for `SubstrateDefault` which is the same merkle trie implementation as the substrate trie.
Copy link
Contributor

@tomaka tomaka Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are Root32Structure and SubstrateDefault?
I shouldn't have to read the Substrate code in order to understand the spec.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind "Root32Structure" is to allow multiple way to calculate a merkle root from key value content.

"SubstrateDefault" is just using the merkle trie V1 that we currently have in state.
But this is not a great merkle structure, clearly a binary trie will be way better (smaller proof), but this is straight forward implementation as a start.

- name : a pointer size to the name of a transient storage to rename.
- key: a pointer size to the previously accessed key.
- count: a u32 indicating the maximum number of key to return.
Returns a scale encoded optional sized array of keys (rust `Option<Vec<Vec<u8>>>`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Returns a scale encoded optional sized array of keys (rust `Option<Vec<Vec<u8>>>`).
Returns a scale encoded optional sized array of keys (rust `Option<Vec<Vec<u8>>>`). Returns a SCALE-encoded `None` either if there is no transient btree with that name or if the `key` is the last key of that transient btree.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intention was more to:
Returns a SCALE-encoded Some(vec![]) if the key is the last key of that transient storage.

@burdges
Copy link

burdges commented May 7, 2023

Why do encodings or storage like accesses here?

We do not want persistence in paritytech/polkadot-sdk#245 so why not pass an arbitrary rust type somehow? At a high level we've two-ish concerns:

  1. Are tx going to be verified in parallel? Interior mutability flavor? etc.
  2. We roll back tx during block building, right?

We need determinism so we cannot give parachains interior mutability per se, nor even permit sequence dependent code. We could permit any code for which polkadot enforces deterministic sequencing, but there is zero reason for this sequence to be the same as during block creation.

We want roughly a spawn/fork method that creates a new thread of the transient storage for a pallet, and a merge/join method that pulls together these threads. All/many tx fork off their own storage during block creation, which then all merge before on_finalize, so block creation could omit some forks. Actual blocks otoh contain instructions which given an explicit sequence of fork and merge calls, assign transactions to individual forks, and merge all forks before on_finalize.

It simply a bug that halts your block production if you've transient storage whose merge code works for num_tx transient storages during block creation but fails when iteratively applying tx to num_threads < num_tx transient storages during block verification.

@cheme
Copy link
Author

cheme commented May 8, 2023

We do not want persistence in paritytech/polkadot-sdk#245

There is still the archive mode (could be very useful for events), where we allow archiving transient storage for a given block (state at the end of block processing only).
Now that I think about it, could make sense to have a host function that allows a chain to archive transient storage at random point during block processing (eg every end of extrinsic), would be just a noop call for cumulus and a chain specific management on parachain (main usecase would be storing events of each extrinsic, but we can still do storage this way by storing at "event_prefix_" ++ extrinsic_number for a given blob name).

Why do encodings or storage like accesses here?

Main thing is to share info between extrinsic I think. We usually try to isolate extrinsic executions, but actually it is not always doable in practice.

So one could use a pointer over the wasm memory instead of this transient host storage, as long as we are guaranteed to be using the same executor (so the same wasm memory):
Pros:

  • faster (no memory copy through host functions call)

Cons:

  • require some mechanism for single extrinsic evaluation (block building).
  • root calculation or hash calculation then would need either passing memory anyway to run on host or run on wasm
  • transaction support not implemented

but yes could also just instantiate transient transactional overlay in on_init and pass it to all extrinsic during block processing directly as a function parameter. Would make this wasm memory part of the spec: so a single way to implement/store the transactional infos (which would probably be following rust storage and memory alignment, probably a bit difficult to specify).
Yes would make a chunk of wasm memory specific layout part of the chain specific, I am not sure it is that bad (different chains could use different variants), but can end up being hard to maintain.
If I was to do it, I would limit this memory to a simple &mut Vec that get passed between all extrinsic and then up to implementation to write some data into it (all btreemap<name, blob> all transactional history and others). But that would be a really raw way of doing it and not user friendly.
Using host functions provide an api and some user friendliness I hope.

More generally using storage primitives similar to the existing ones makes things easy to think about.
Worth noticing also that with cumulus we likely will be using wasm memory and only call host function for hashes (so the other proposal to aggregate hasher: #12), as it is currently the case for the transactional storage of the state.

> Are tx going to be verified in parallel? Interior mutability flavor? etc.

Same problematic as with storage host function: the api forbid it. Strategy that allows evaluating against storage also applies to transient storage (in a way it makes things simpler) , like design for thread (split and act on state at split then optimistically try reconcile with a deterministic sequence of join (or same with borrowing part of state and transient storages on split) will work the same way with transient storage as with the trie storage.

> We roll back tx during block building, right?

I don't remember for sure, but yes we do have (or had) this try an extrinsic over a context that is not exactly the same one as the one for the block building. AKA running individual transaction out of sequence is not guaranteed to get same result as in the actual block but quite likely (except if your runtime do stuff as branching code execution over extrinsic number).
Usually things that are shared between two extrinsic should be as write/append only.

We want roughly a spawn/fork method that creates a new thread of the transient storage for a pallet

The way I see thing, for thread, things are the same as with storage: base your read access on the state of the transaction overlay when the state start and write locally to the thread, then on thread joining flush change into the base state in a deterministic order.
For things to make sense though, one should avoid concurrency issue by avoiding two read access at a same location from two thread (can be check on join (optimistically), or else borrow read access (exclusive to write) on spawn (I like the second one better)).

What is the current story with threads? Last time the idea was polkadot manage threads at a parachain level and threads during a single state transition are not needed, we even did remove the experimental threads host function from substrate code base.

TLDR; on the pure design aspect of this proposal, I think using host function in a similar way as the state trie is good to have: make things easier. Main question could be why not passing directly wasm memory: I would say that we have transaction overlay for storage so this proposal just use the same overlay storage as for merkle state (host memory for substrate, wasm memory for cumulus).

@burdges
Copy link

burdges commented May 8, 2023

Is archive mode for regular execution, like maybe forensics, or just debugging? If debugging then maybe other tools make more sense. Anyways..

I've three cases in mind:

A cryptographic scheme could employ time memory trade offs, like 30kb precomputed base point tables. We typically think these should be stored in the PVF, but some like halo2 make sense to pre-generate via a post-build hook into the PVF, and some make sense to run per block. A game using VRFs fits the latter case.

A cryptographic batch verifier would accumulate information from each extrinsic. In general, some of this information would be stored in a Vec<T> but T might not be zero cost to serialize (in safe code), while also some of this information would be accumulated in a Shake128 or similar.

A parachain could make block level choices which impact execution, some of which could accumulate information about the extrinsics in ways vaguely like batch verification. An on-chain game might process whole "rooms" in individual blocks, so initially it loads the room state from the chain, then repeatedly updates the room state, and finally writes it back to the chain.

Pros:
faster (no memory copy through host functions call)

It's not primarily memory copies and hashing being done for each extrinsic, although those certainly turn your nice fast rust code into a scripting language.

It's more the mental overhead for the developers, projects, etc. It's unfortunate if a chain needs to find and hire a cryptography engineer because they cannot use off-the-shelf crates. It's worse if they cannot find such a person because those people prefer ecosystems like cosmos where they can use standard tricks over ecosystems where they spend their time rewriting nice optimized code to be inefficient.

Cons:
require some mechanism for single extrinsic evaluation (block building).

Isn't the issue more that my suggestion breaks single extrinsic evaluation? this is why I proposed this fork/merge interface.

root calculation or hash calculation then would need either passing memory anyway to run on host or run on wasm

We do not calculate any state roots because it's all transient.

transaction support not implemented

See above, but imho the alternative is an external tx sequencer, aka we inherit more of ETH's problems.

More generally using storage primitives similar to the existing ones makes things easy to think about.

Using storage primitives more similar to everything else in Rust makes using external crates easier.

For things to make sense though, one should avoid concurrency issue by avoiding two read access at a same location from two thread (can be check on join (optimistically)

There is no substrate style "storage location" in the design I'm proposing, just single-threaded in-memory data structures with specific semantics, and the block author specifies the spawn/fork v merge/join sequence that ensures determinism.

We've discussed deterministic multi-threaded execution before, which yes obeys a similar model, aka instructions in the block say fork and join. In join, we'd permit threads to read from the same storage, but we abort the block if a thread writes to a storage that another thread accesses.

We'd likely use this forks and joins for both during block verification, but not during block creation. This in-memory stuff needs to work for individual tx during block building, but threaded accesses to the regular storage brings other constraints.

@cheme
Copy link
Author

cheme commented May 8, 2023

Is archive mode for regular execution

yes, but since this is transient data (no need to access it in later execution), it can be opt in opt out.

We do not calculate any state roots because it's all transient.

We have primitives to do so in the proposal (hash of blob and root of ordered storage), this way you can write the root of a transient storage in state and provide services to access the corresponding state or at least check proofs against it (archiving mode being the usual way to store the block content as it can run synchronously with block import, but could be indexed differently too, probably want some flexibility here).

The main and more obvious use case really is storing events and storing their root in state. Then it is doable to get proof of event occurring in a block and at the same time not requiring your client to actually store the state (like if you do not want to serve them just don store them, since it is not needed to evaluate future state transition that is perfectly fine).

Notice that using a standard rust btree we can still get a root, just this would be wasm memory and we may want to hash in the host (it is actually what is done with cumulus).

a post-build hook into the PVF, and some make sense to run per block

Sounds like a write once global variable. In general does your use case involve supporting transaction or is it more just exposing a global memory to extrinsics.

Could almost just use global wasm memory.

But then when processing single extrinsic you would not want to run it every time: writing it in a transient blob during on_init would work as all single extrinsic run over the state post block initializing.
But yes then you may need some serializing if what you store is a bit complex.
And will go through a bit of memory copying, not sure how bad it can be.
Using a wasm global variable that you init from the transient storage when doing single extrinsic evaluation (one mem copy per extrinsic), but use as is when running a full block (there on_init do not even need to write the trensiant storage).

Is having specific execution path for build, import and verify something good to have? It easily sounds like a good footgun (I would want it personally).

It's more the mental overhead for the developers, projects, etc.

Am I wrong if I say that solution to your issue would just be to allow using wasm global variable (which may just need to inject the wasm executor with the wasm memory state at the end of the init block)?
(actually sounds simpler than loading from transient storage).

Isn't the issue more that my suggestion breaks single extrinsic evaluation?

I don't think we have single extrinsic evaluation, what we have is a "likely to pass in a block if if it passes alone".
For instance block number is stored as a state entry and set back to default value at the end of block, so already there is a state associated with the previous extrinsics. IE nothing prevent a chain from setting context in an extrinsic and use it again in a next one (like we currently do for event, usually we want to have write only operation, but nothing currently prevent from reading events of previous extrinsic).
But yes transient storage might facilitate such mechanism, kind of the point, because abusing state storage change overlay to pass around info is a bit awkward (and incurs some unneeded state query, but it is just a detail).

Using storage primitives more similar to everything else in Rust makes using external crates easier.

I have a hard time picturing it (except if just some global wasm variable which may really not be difficult to put in place, but I would need a wasm executor specialist to tell me if the cost of saving/injecting the wasm memory post initialization is too big).
Otherwise, not really sure how it could be easier, for instance if I choose to use btreemap, I could try to plug it to a special allocator that access this special shared mem location and then try to apply some transactional storage overlay on top of this memory location.

There is no substrate style "storage location" in the design I'm proposing, just single-threaded in-memory data structures with specific semantics, and the block author specifies the spawn/fork v merge/join sequence that ensures determinism.

Except with threads/worker, I don t really see a determinism issue.

Generally getting some good state access isolation sounds good thing to strive for (for multithread or more single extrinsic isolation), but I am not sure if it fits here. I mean it is already needed for state storage and would be needed as much with transient storage as proposed here.

@burdges
Copy link

burdges commented May 9, 2023

Yes, a static mut or similar works, but not a write once type. We've no global memory now though, right?

I figured this fork/join construct permits multi-threaded tx, but yes they'd hit regular storage too, so maybe no point without multi-threading for regular storage.

@cheme
Copy link
Author

cheme commented May 9, 2023

We've no global memory now though, right?

I would think it works, but it is a bit out of my scope (last relevant issue was paritytech/substrate#11232 where the globals switch from a legacy save and load to using native wasmtime pooling).
Then the question is if it is saved between executor (just looking at the code in place I would have hope it could but it is not really a part I know correctly).

@cheme
Copy link
Author

cheme commented May 9, 2023

so maybe no point without multi-threading for regular storage.

Could have to provide guarantee there is no interaction between extrinsic in a bloc (but would require quite the refactoring in the tx pool changes). but just a personal opinion here.

@burdges
Copy link

burdges commented May 9, 2023

That's what fork and join would do. It's more complex than single threaded though

@cheme
Copy link
Author

cheme commented May 17, 2023

@burdges, I gave some more thought about your suggestion.

And guess yes, there is an alternate design that could really make sense.

Idea would be to store in wasm memory (global), so no host function. This means that runtime (frame should tool it in its macro) would be responsible to call the transaction primitive on this wasm storage at the same time as it calls the host storage transactions.

To be able to archive data, some payload primitive should be use, these are restricted to
ordered_map_archive_key_value
blob_archive_chunk
ordered_map_archive_root
blob_archive_hash
to be call on finalize (or when needed).
Root calculation will only call hashing function (trie code in wasm), and hash calculation only call the hasher on chunks (with proposal paritytech/substrate#12).

Codewise it requires changes to frame, some testing and probably adaptation on using global wasm variable, and the archive, and hashing specific host functions.
Regarding current branch, the new storage structure in the overlay is still needed, but would be exposed as global wasm variable (eg a wasm environmental as with cumulus) .
But that is still a very different direction with more incertitude (will keep thinking about it a bit I think).

@burdges
Copy link

burdges commented May 17, 2023

Do we already have access to this global wasm memory? I suppose no but if yes we'd access it via a thread local or even static mut?

@cheme
Copy link
Author

cheme commented May 17, 2023

I did not find anything blocking access to global memory .
At least with cumulus we expose global wasm memory through environmental crate wihch use thread_local for non wasm and a simple static over a refcell for wasm https://github.com/paritytech/environmental/blob/02d89d5233b9330d618a2add5e4f1fd9651813c9/src/lib.rs#L78).
But pvf is a single wasm execution, would need to ensure that when we evaluate a transaction in transaction pool, we access the memory from after on_init, which as mentioned I believe we do.

@cheme
Copy link
Author

cheme commented May 26, 2023

I am currently working on this direction, in https://github.com/cheme/substrate/tree/transient-global-mem , where all run in wasm memory (with hashing through host function), and we got only limited host function for the archive mode:
https://github.com/cheme/substrate/blob/bac08dcf71f6182bc91761160cad7e12e5944a09/primitives/io/src/lib.rs#L635 .

So at this point this PPP may not be too relevant, and may be significantly changed when the branch proves working properly.

@cheme
Copy link
Author

cheme commented Jun 29, 2023

Ok, I did run test with global memory in branch transient-global-mem.

The design looks way better as we only use host function for the archive mode, but with no support
for transaction.
Actually the host I use are a bit incorrect, it should be a runtime api that the client call after
block execution to get all infos to store (this way no host function just a runtime api).

But I clearly read to fast how executor works, wasm executor in itself seems to be able to reuse
global memory, but in the plumbing of BlockBuilder it a different instance (possibly an
instantiated one but not the one from the previous call) is used for each extrinsics.

So would need a substantial refactoring of block builder, call_executor and many api to allow persisting the
wasm instance between different calls.
From a bit of debugging it does not seems much (pass around the optional instance as an opaque
type), but it can also interfere a bit with the caching system, and generally this may be needed at
different places than just blockbuilder.
There is also the question of the native call that could work fine on environmental with not a lot
of change (warning about mem leaks though), but there might be some more complex cases like
NativeOrWasmStrategy (even if I don t see it immediatly as problematic).

So for now, I will port back the change from transient-global-mem and push transient branch (the one
with this proposal many host functions).
Which mean this PPP is relevant again (unless sharing execution instance around block building and others get in the roadmap).

Clearly maybe it is worth doing the refactoring to allow wasm instance to be pass around during block
building? CC\ @bkchr

@burdges
Copy link

burdges commented Jun 30, 2023

We've seemingly no plans to make the PoVs multi-threaded, so we should likely do this using shared memory:

We add a host call that allocates memory pages in a region excluded from address space layout randomization, with the first one being passed as a pointers: ArrayVec<wasm_usize> between entry points.

In import, these pages are all allocated normally once and passed along between entry points.

In block building, each new transaction entry point reallocates them using copy-on-write. If a transaction panics, then we simply trash all the pages it copied, including its new pointers. If a transaction succeeds, then we trash the old pre-copy pages, again including the old pointers

We then implement an SubTransientAlloc: Allocator so that any Rust type can be passed in this way.

It's now unsafe to place a reference or an owned pointer to a non-SubTransientAlloc type into a SubTransientAlloc type, just like it's unsafe to serializes a pointer and place it into storage. We could surface this unsafety in rustc by duplicating alloc::{collections,vec}, but maybe miri could enforce this outside rustc.

@Polkadot-Forum
Copy link

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/new-host-functions-for-non-merklised-optionally-persisted-data-types-substrate-12577/922/2

@Noc2
Copy link
Contributor

Noc2 commented Jul 14, 2023

We will archive this repository in favor of the new RFCs repository by the Fellowship. So, I'm closing the issue here. I recommend opening a PR at the new repository.

@Noc2 Noc2 closed this Jul 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants