Iterator refactoring #181

koute · 2023-01-30T12:16:02Z

This PR simplifies and refactors the iterators in trie-db.

The changes can be summarized as follows:

Added TrieDBNodeIterator-like TrieDBRawIterator and made it lifetimeless (instead of storing a &TrieDB it now takes it as a parameter during the iteration)
SuspendedTrieDBKeyIterator was removed since the TrieDBRawIterator is a complete replacement for it (with the added bonus that it can be iterated on)
TrieDBIterator::next and TrieDBKeyIterator::next were renamed and moved into TrieDBRawIterator with no changes to the logic (except making them use &TrieDB passed as a parameter instead of the one stored in self, and cleaning them up slightly)
TrieDBIterator and TrieDBKeyIterator now use TrieDBRawIterator under the hood
A bunch of #[inline]s were added to make recreating a &TrieDB on each iteration into a zero-cost operation
All of the relevant crate versions were bumped

This makes further refactoring of storage iterators in substrate possible.

cheme

About the PR, removing &db from Raw/Node iterator looks like a good change to me (no need for the suspended, and surely calling it raw makes it clear one can do something wrong with this iterator).

What are the plans for the substrate optim, I remember planning to use the suspended struct in host function to use it again if the start key used in a following next_key call is the same as the previously return key (but don't find back my branch)?

Or is there an idea to use a host api that keep state of iterator (I got in https://github.com/cheme/substrate/tree/transient a host api that keeps a hasher state in memory could be runing similarily)?

Regarding possible substrate optimization, there is one I would find interesting, that would be to switch sp-state-machine backend trait to use &mut on all accesses, thus removing the need for inner mutability (for trie cache and recorder). I started in this direction in paritytech/substrate#13006, but only covered Externalities trait (switching trie backend requires other changes, at least one existing clone to fetch runtime, but it can certainly be avoided). But I think this would not touch the trie crate (recorder and cache being already &mut).

cheme · 2023-01-30T12:50:09Z

trie-db/src/iterator.rs

+		&mut self,
+		db: &TrieDB<L>,
+	) -> Option<
+		Result<(NibbleVec, Option<TrieHash<L>>, Rc<OwnedNode<DBValue>>), TrieHash<L>, CError<L>>,


Not related to this pr (just I was a bit surprise when reading the existing code), but really looks like this function should return Result<(&NibbleVec, Option<TrieHash>, Rc<OwnedNode>), TrieHash, CError>,`, quite some key copy involved here. Probably could also get rid of the Rc.

cheme · 2023-01-30T12:53:50Z

trie-db/src/iterator.rs

+				Ok((mut prefix, _, node)) => {
+					let maybe_value = match node.node() {
+						Node::Leaf(partial, value) => {
+							prefix.append_partial(partial.right());


Not related to this PR either, but the key building should be done on the raw iter key_nibbles directly.

cheme · 2023-01-30T13:03:28Z

trie-db/src/triedb.rs

 	}
 }

 impl<'a, 'cache, L: TrieLayout> TrieIterator<L> for TrieDBIterator<'a, 'cache, L> {
 	/// Position the iterator on the first element with key >= `key`
 	fn seek(&mut self, key: &[u8]) -> Result<(), TrieHash<L>, CError<L>> {
-		TrieIterator::seek(&mut self.inner, key)
+		self.raw_iter.seek_prefix(self.db, key).map(|_| ())


I am wondering if the function for raw_iter seek_prefix should be rename (I am maybe the one that give this ~ naming in the first place ). Here it just does a seek key which is what we need, but I had to check the code to be sure it is not the variant that seek a key and then remove nodes from the stack so it limits remaining iteration to the prefix.

cheme · 2023-01-30T13:38:11Z

TrieDBIterator and TrieDBKeyIterator now use TrieDBRawIterator under the hood

wasn't it the case already?

cheme · 2023-01-30T13:39:58Z

This makes further refactoring of storage iterators in substrate possible.

Does the suspended approach not work (I remember putting something in place but probably drop it at some point, not sure why)?

koute · 2023-01-30T13:53:56Z

What are the plans for the substrate optim, I remember planning to use the suspended struct in host function to use it again if the start key used in a following next_key call is the same as the previously return key (but don't find back my branch)?
[...]
Does the suspended approach not work (I remember putting something in place but probably drop it at some point, not sure why)?

For now I'm mostly just simplifying the way we do iteration; e.g. the Backend trait in state-machine has something like ~8 methods which do iteration, some using an Fn callback and some just collecting into a Vec, and I'm essentially replacing them all with a single method that just returns a raw iterator (plus two extra methods which return dedicated iterators for keys and pairs which are themselves just tiny wrappers around the raw iterator).

So it is essentially the suspend approach, it just needs these changes to make it fully possible (not everything was originally exported the first time I tried it IIRC) and simpler.

In general for now my main objective is to just simplify and remove dangerous methods (those which return a Vec, which make our RPC nodes explode), but I also want to eventually look into speeding up storage iteration/accesses altogether.

Regarding possible substrate optimization, there is one I would find interesting, that would be to switch sp-state-machine backend trait to use &mut on all accesses, thus removing the need for inner mutability (for trie cache and recorder).

Hmm.... is this actually going to affect the performance in a substantial way?

TrieDBIterator and TrieDBKeyIterator now use TrieDBRawIterator under the hood

wasn't it the case already?

Previously they had their dedicated next methods; now those were moved to the raw iterator, so now they're truly just a thin wrappers with no logic of their own.

cheme · 2023-01-30T14:19:58Z

Hmm.... is this actually going to affect the performance in a substantial way?

Not sure about perf, actually recorder and cache are already behind a refcell in the trie crate.
So probably the recorder is already accessed without lock, not sure about the local cache (may still be a useless rwlock there for the local caching, really not sure).

Would still make sense to me to just plain &mut ptr for read access operation and get rid of as much inner mutability as possible, but that is a different question.

koute · 2023-01-30T15:31:11Z

@cheme Okay, I resurrected the fuzz tests (they didn't compile due to too old version of a dependency in Cargo.toml), renamed the seek_prefix to seek, made next_raw_item return references, and I've simplified the iterator methods a little more (I've reduced the indentation levels, removed unnecessary expects, removed the extra internal enum in next_raw_item).

The logic's unchanged and I didn't touch the extra clones. You're right that we should be able to refactor it so that it appends to the key_nibbles directly, but from what I can see that'd be a more involved change so for now I left it as-is for a future PR.

cheme · 2023-01-31T12:52:35Z

yes the fuzzer is quite often a bit behind, thanks.

You're right that we should be able to refactor it so that it appends to the key_nibbles directly, but from what I can see that'd be a more involved change so for now I left it as-is for a future PR

I did check a bit, we probably want to return the key_nibbles after the partial (append the partial on descend), which seems fine to do, but at the same time the change will require to modify trie_codec.rs which can be a bit too many changes.

davxy

Not a trie guru. But codewise looks good

trie-db/src/triedb.rs

trie-db/src/iterator.rs

cheme · 2023-02-03T09:33:04Z

Missed you by a few second, but trie-bench changelog can also be updated (not really important tbh), otherwhise latest changes still looks good.

koute · 2023-02-03T09:39:29Z

but trie-bench changelog can also be updated

Ah, sorry, I didn't realize that one's also released on crates.io. I'll update it.

koute added 3 commits January 30, 2023 18:42

Iterator refactoring

5a4eadc

Bump versions

5377005

Update changelog

a815264

koute requested review from andresilva, kpp, melekes, cheme, michalkucharczyk, skunert, bkchr, dmitry-markin, davxy and altonen January 30, 2023 12:16

cheme approved these changes Jan 30, 2023

View reviewed changes

koute added 10 commits January 31, 2023 00:20

Simplify fetch_value

136cd55

Return references from TrieDBRawIterator::next_raw_item

47a8f1a

Rename TrieDBRawIterator::seek_prefix to TrieDBRawIterator::seek

4948cac

Further simplify next_item and next_key

21b6107

Further simplify next_raw_item: remove IterStep::YieldNode

6e9e6d3

Further simplify next_raw_item: remove IterStep::PopTrial

468c908

Further simplify next_raw_item: get rid of the enum and expects

624ed14

Bump memory-db in fuzz tests to make them compile again

94f785e

Simplify next_item and next_key

f7098b8

Update changelog

4d8bda6

koute mentioned this pull request Jan 31, 2023

Rework storage iterators paritytech/substrate#13284

Merged

davxy approved these changes Feb 2, 2023

View reviewed changes

trie-db/src/triedb.rs Outdated Show resolved Hide resolved

trie-db/src/triedb.rs Outdated Show resolved Hide resolved

trie-db/src/iterator.rs Outdated Show resolved Hide resolved

koute added 6 commits February 3, 2023 15:50

Merge impls

5e00072

Merge imports for feature = "std"

c2f39c5

Remove unused field in TrieDB

c78324a

Gate clone_append_optional_slice_and_nibble on std

2f518bd

Remove hash_count from the test

ceddac9

Update the changelog: prepare for a new release of trie-db

68f78a5

koute merged commit 4e571cc into paritytech:master Feb 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iterator refactoring #181

Iterator refactoring #181

koute commented Jan 30, 2023 •

edited

Loading

cheme left a comment

cheme Jan 30, 2023

cheme Jan 30, 2023

cheme Jan 30, 2023

cheme commented Jan 30, 2023

cheme commented Jan 30, 2023

koute commented Jan 30, 2023

cheme commented Jan 30, 2023

koute commented Jan 30, 2023

cheme commented Jan 31, 2023

davxy left a comment

cheme commented Feb 3, 2023

koute commented Feb 3, 2023 •

edited

Loading

Iterator refactoring #181

Iterator refactoring #181

Conversation

koute commented Jan 30, 2023 • edited Loading

cheme left a comment

Choose a reason for hiding this comment

cheme Jan 30, 2023

Choose a reason for hiding this comment

cheme Jan 30, 2023

Choose a reason for hiding this comment

cheme Jan 30, 2023

Choose a reason for hiding this comment

cheme commented Jan 30, 2023

cheme commented Jan 30, 2023

koute commented Jan 30, 2023

cheme commented Jan 30, 2023

koute commented Jan 30, 2023

cheme commented Jan 31, 2023

davxy left a comment

Choose a reason for hiding this comment

cheme commented Feb 3, 2023

koute commented Feb 3, 2023 • edited Loading

koute commented Jan 30, 2023 •

edited

Loading

koute commented Feb 3, 2023 •

edited

Loading