Another possibility for hash_tree_root of dynamic lists #1115
Comments
Some comments on the Rationale part, with possible suggestions:
Also, I considered a similar idea during Edcon in Sydney, but not directly applied to merkle trees: use a bitstring to identify which elements in a structure are present (effectively the same as a generalized index if you only consider 1 element, but for multiple elements). If ssz-encoded messages are prefixed with such a bitstring, you effectively get the ability to null/union everything you want (the reason for bringing up the idea: nulls/unions are still not implemented :( ). But for large lists it is not practical, for the same reasons a linear merkle structure is not.
Edit, TLDR: I misunderstood the structure. It's not linear; every level keeps expanding more, to allow for bigger additions. But high-level leaves still have some cost benefits over lower-level leaves.
And here's a quick helper function for calculating the next power of 2 for merkleization, for future pseudo-code/experiments. I updated the merkle tree code in the SSZ typing draft PR, but that is still stuck in limbo after I changed back to testing. https://github.com/ethereum/eth2.0-specs/blob/08faa86d706c31cd9690cb483d0df32f66696396/test_libs/pyspec/eth2spec/utils/merkle_minimal.py
To be clear, there is no linear length blowup like this; the increase in length is only one single hash (this is because the number of leaves at each level of the tree still doubles with every level). If Merkle proofs were length 8 before, on average they'll be ~length 9.
Ooh, fun! Probably not the best thing to put in code that's meant for people to read ( |
I think I found a much more elegant way to write next_power_of_2 in Python when working on the phase 1 execution:
def next_power_of_2(v: int) -> int:
    # Smallest power of two >= v; using (v - 1) keeps exact powers of two unchanged.
    return 1 if v <= 1 else 1 << (v - 1).bit_length()
@dankrad Nice, knew about a few other scripting languages that tracked bitlength for large integers, but forgot about it here. Here's some testing + benching code: https://gist.github.com/protolambda/9ad3f46665cb9bdfdb38599cbc626c30 Bitlength outperforms mine by 2x, and the old one by 15x. Now back to the merkle-tree discussion :)
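For anyone wanting to experiment, here's a minimal sanity check of a bit-length-based helper against a naive loop. It assumes the intended meaning is "smallest power of two greater than or equal to v" (which is what padding for merkleization needs); the names `next_power_of_2_naive` and `next_power_of_2_bitlen` are made up for this sketch.

```python
def next_power_of_2_naive(v: int) -> int:
    """Reference implementation: double until we reach or pass v."""
    p = 1
    while p < v:
        p *= 2
    return p


def next_power_of_2_bitlen(v: int) -> int:
    """Bit-length variant; subtracting 1 keeps exact powers of two unchanged."""
    return 1 if v <= 1 else 1 << (v - 1).bit_length()


# Compare both on a range of inputs, including exact powers of two.
for v in range(0, 1025):
    assert next_power_of_2_bitlen(v) == next_power_of_2_naive(v), v
```

Note that `1 << v.bit_length()` without the `- 1` returns 8 for `v = 4`, which would double the padding for lists whose length is already a power of two.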
Another more stylistic (but experimental and somewhat broken) approach: act like it is a fixed-length list of max. size (e.g. 32), and do a normal binary-tree merkle-root. Very similar to what we currently have with the deposit tree. However, we wrap the hash function, to make smaller lists less expensive. This new hash-function definition would be:
def combine_chunks(a, b, level, index_at_level, length_at_level):
    # level: level of the combination node
    # index_at_level: index of the combination node
    # length_at_level: length of the level
    # check if b is out of bounds
    if index_at_level + 1 == length_at_level:
        assert b == ZERO_CHUNK
        return a  # just propagate a if b is zero.
    elif level < MAX_LEVELS:
        return hash(a + b)  # hash of concatenated chunks, like currently
    else:
        return hash(a + b + to_chunk(index_at_level))
    # Note: last two returns could be combined, see comments on consistency.
Note: verification is not exactly the same; one needs to mix in the index at leaf level, based on type.
Now, we can ignore the empty part of the tree, and just supply a bunch of zeroes (not higher order hashes of zero) for the branch part that completes it to a
R
/ \
/ \ 3
/ \ 0
... 0
/ \
/ \ 0
/\ /\
A B C 0
Note that we are still mixing in the length, to differentiate a list ending in zeroes from the same list without (some of) these zeroes. The zeroes in the merkle proof can be compressed where necessary, and the hash with zero is essentially free, so that's great. (Side effect: hashing default lists filled with 0 will be super fast 🎉)
Now leaves have an easily identifiable place, and have "the same" proof length and the same generalized index at all times.
Open question: if one would agree to standardize the compression of the zero chunks in the proof, we are essentially using the length data to read the proof (very much the same as we had before the unbalanced tree idea, now just supporting static indices without extra hashing). I would put the TLDR:
SHA-256
So, standard merkle input of 2 chunks is:
Light clients / Verification flow
Now, we can have a light client ask for a piece of data, with a proof: "hey full node, give me the data at generalized index 1234 please". The light client then verifies the proof like a normal merkle tree, but additionally mixes in the expected index in the dynamic list during verification, whenever it passes a node that is part of the leaf level of a dynamic list.
Pro: stable generalized indices, simple proof construction, easy proof compression, and length-independent verification.
Con: verification is more complex. But if we ask for data, we are not asking for it for random reasons; we know the location + typing we are asking for, so we can deal with the verification just fine.
Consistent complexity, no typing
We could make the thing more consistent with unnecessary mix-ins: mix in the generalized index at every non-out-of-bounds node combination. Now a verifier doesn't have to care about the type anymore; it can just repeatedly mix in the index. For free, if this index is within 447 bits.
Consistency with normal verification
Instead of mixing in a third argument at the leaf combination level, we could also do an extra hash, and mix in the index of the pair at that level. This makes it valid to do the current merkle-proof verification, at the cost of 1 hash per pair of leaves.
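To make the "propagate when the right sibling is out of bounds" idea a bit more concrete, here is a minimal sketch in Python. It is a simplification under stated assumptions, not a tested design: it uses SHA-256 over 32-byte chunks, a 32-byte little-endian length mix-in, and it leaves out all of the index mix-in variants discussed above. The names `virtual_root` and `list_root` are made up for this example.

```python
import hashlib
from typing import List

ZERO_CHUNK = b"\x00" * 32


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def virtual_root(chunks: List[bytes], depth: int) -> bytes:
    """Root of a tree with 2**depth virtual leaves; empty right siblings cost no hash."""
    assert len(chunks) <= 2 ** depth
    level = list(chunks) if chunks else [ZERO_CHUNK]
    for _ in range(depth):
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(sha(level[i] + level[i + 1]))
            else:
                # Right sibling is out of bounds: propagate the left node unchanged.
                nxt.append(level[i])
        level = nxt
    return level[0]


def list_root(chunks: List[bytes], depth: int) -> bytes:
    """Mix in the length (assumed encoding: 32 bytes, little-endian)."""
    return sha(virtual_root(chunks, depth) + len(chunks).to_bytes(32, "little"))


A, B, C = (bytes([i]) * 32 for i in (1, 2, 3))
# Three chunks in a 32-leaf virtual tree need only two hashes before the length mix-in.
assert virtual_root([A, B, C], 5) == sha(sha(A + B) + C)
```

In this simplified form the length mix-in is the only thing separating `[A, B, C]` from other lists that collapse to the same inner root, which is part of why the index mix-in variants are being discussed at all.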
Ok, let's summarize all the variants of merkleization, for dynamic-length lists:
Status Quo
Unbalanced Merkle Tree
Virtually balanced Merkle Tree
Note: the wrapped special-case hash function introduces many edge cases; design variants still need to be adjusted.
For all sub-variants applies:
Leaf-pair index mix-in
Consistent index mix-in
1-hash extra
Sparse merkle-tree alike
Sources
Generalized indices: eth2 light client specs > generalized index
0-based generalized indices: eth2 specs Issue 1008
SHA-256 pseudocode, performance related: SHA-2 family on Wikipedia
Eth2 exec-spec merkle code:
Credits to Vitalik for pushing so many different ideas forward. Hope my alternative + this summary helps to get to a balanced, easy, small and constant index supporting merkle tree design.
What do you mean by "the index" here? The index in the list? I would not call it "standard verification" if the generalized index alone isn't enough to fully define the verification path + sequence of operations required to verify a branch, so if there's extra data being added in at some levels of the tree that breaks the invariant...
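For reference, this is roughly what "the generalized index alone defines the verification" looks like in code. It's a sketch under assumptions: 1-based generalized indices (root is 1, children of n are 2n and 2n+1), SHA-256, and a branch listed from the leaf's sibling upward. `verify_branch` is a name chosen for this example.

```python
import hashlib
from typing import List


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def verify_branch(leaf: bytes, branch: List[bytes], gindex: int, root: bytes) -> bool:
    """Walk from the leaf to the root; each bit of gindex says which side we are on."""
    node = leaf
    for sibling in branch:
        if gindex & 1:
            node = sha(sibling + node)  # we are the right child
        else:
            node = sha(node + sibling)  # we are the left child
        gindex >>= 1
    # After consuming the whole branch we must have arrived at the root (index 1).
    return gindex == 1 and node == root


a, b, c, d = (bytes([i]) * 32 for i in range(4))
n2, n3 = sha(a + b), sha(c + d)
root = sha(n2 + n3)
assert verify_branch(c, [d, n2], 6, root)
```

Nothing besides the leaf, the branch and the index is consulted, which is the invariant that any extra per-level mix-in would break.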
I think it can be both. It's really just more an idea than a verified/tested approach. We can start breaking + fixing things when the basic idea is there.
Hmm, yes, the hash function is still different. But it's more "standard" than adding additional data to the hash input when combining leaves (like the other variants). Instead, it's like a normal mix-in.
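On the cost of adding data to the hash input: the 447-bit figure mentioned earlier follows from SHA-256 padding, which appends a single 1 bit and a 64-bit length field and then rounds up to a multiple of 512 bits. A small sketch (the helper name `sha256_blocks` is made up here) that counts compression-function calls for a given message length:

```python
def sha256_blocks(message_bits: int) -> int:
    """Number of 512-bit blocks SHA-256 processes for a message of this length."""
    # Padding adds one '1' bit, then zero bits, then a 64-bit length field.
    padded_min = message_bits + 1 + 64
    return -(-padded_min // 512)  # ceiling division


# Two 32-byte chunks fill one block exactly, so padding forces a second block.
assert sha256_blocks(512) == 2
# Up to 447 extra bits still fit into that second block.
assert sha256_blocks(512 + 447) == 2
# One more bit and a third block is needed.
assert sha256_blocks(512 + 448) == 3
```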
Closing in favour of issue #1160 which has now achieved rough consensus :)
Epistemic status: very uncertain if this is a good idea but IMO worth talking about the issues.
Status quo
In English: make a Merkle tree of the data, padding it with zeroes to make the length an exact power of two. Then take the hash of the root of that tree together with the length of the list.
In code:
Example with three-item list:
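A minimal sketch of the status quo as described in English above, assuming 32-byte chunks, SHA-256 as the hash, and the length serialized as 32 little-endian bytes. These are assumptions for illustration, and `merkle_root` / `hash_tree_root_list` are names chosen for this sketch.

```python
import hashlib
from typing import List

ZERO_CHUNK = b"\x00" * 32


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks: List[bytes]) -> bytes:
    """Root of a balanced tree over the chunks, zero-padded to a power of two."""
    size = 1
    while size < len(chunks):
        size *= 2
    level = list(chunks) + [ZERO_CHUNK] * (size - len(chunks))
    while len(level) > 1:
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def hash_tree_root_list(chunks: List[bytes]) -> bytes:
    """Hash the tree root together with the list length."""
    return sha(merkle_root(chunks) + len(chunks).to_bytes(32, "little"))


# Three-item list: padded to four leaves, then the length is mixed in.
A, B, C = (bytes([i]) * 32 for i in (1, 2, 3))
expected = sha(sha(sha(A + B) + sha(C + ZERO_CHUNK)) + (3).to_bytes(32, "little"))
assert hash_tree_root_list([A, B, C]) == expected
```

Appending a fifth item changes the padded size from 4 to 8 leaves, so every existing leaf ends up one level deeper.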
Alternative option
In English: make an imbalanced tree where the top left node is the length, then the right->left node points to the first 2 elements, then the right->right->left node points to the next 4 elements, and so forth, so at any depth N there's a root of a tree of 2**(N-1) of the elements.
In code:
Example with three-item list:
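A minimal sketch of the alternative described in English above, under the same assumptions as before (32-byte chunks, SHA-256, 32-byte little-endian length). One further assumption that the description does not pin down: the chain of right nodes is terminated with a zero chunk once no items remain. `fixed_root`, `tail_root`, `unbalanced_root` and `tree_path` are names made up for this sketch.

```python
import hashlib
from typing import List

ZERO_CHUNK = b"\x00" * 32


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def fixed_root(chunks: List[bytes], size: int) -> bytes:
    """Balanced root over exactly `size` leaves (a power of two), zero-padded."""
    level = list(chunks) + [ZERO_CHUNK] * (size - len(chunks))
    while len(level) > 1:
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def tail_root(chunks: List[bytes], size: int) -> bytes:
    """Left child: subtree of the next `size` items; right child: the rest, doubling size."""
    if not chunks:
        return ZERO_CHUNK  # assumed terminator once no items remain
    return sha(fixed_root(chunks[:size], size) + tail_root(chunks[size:], size * 2))


def unbalanced_root(chunks: List[bytes]) -> bytes:
    """Top-left node is the length; the right side holds subtrees of 2, 4, 8, ... items."""
    return sha(len(chunks).to_bytes(32, "little") + tail_root(chunks, 2))


def tree_path(index: int) -> str:
    """Left/right path from the root to item `index`; it does not depend on the list length."""
    size, depth = 2, 1
    while index >= size:
        index -= size
        size *= 2
        depth += 1
    bits = format(index, "b").zfill(depth)
    return "R" * depth + "L" + bits.replace("0", "L").replace("1", "R")


# Three-item list: A and B sit in the 2-item subtree, C starts the 4-item subtree.
A, B, C = (bytes([i]) * 32 for i in (1, 2, 3))
four = sha(sha(C + ZERO_CHUNK) + sha(ZERO_CHUNK + ZERO_CHUNK))
expected = sha((3).to_bytes(32, "little") + sha(sha(A + B) + sha(four + ZERO_CHUNK)))
assert unbalanced_root([A, B, C]) == expected
assert tree_path(0) == "RLL" and tree_path(2) == "RRLLL"
```

Because `tree_path` never looks at the length, an append or pop only touches the branch of the affected item plus the length node.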
Rationale
Currently there is a perfect correspondence between SSZ path (e.g. obj -> obj.member[17].other_member[3]) and tree path (the steps of where you descend left or right from the root to get to that specific value) in all cases except for one: that of dynamic lists. This increases complexity for light client protocol implementers because:
A proof of a single value in an SSZ tree may require multiple logical branches rather than just one, to determine what the length is so that the verifier can determine the depth to prove a value (technically the length will always be along the Merkle branch that proves any item, but it's still extra complexity to extract it).
pop or append operations to lists, if they cross the power-of-two boundary, require a rebalancing of the tree that changes the paths to every item in an entire subtree.
This proposal makes it so that the correspondence between SSZ path and tree path is always one-to-one, and any operation is a single Merkle branch update (or two Merkle branch updates, to also update the length, for an append or pop).
Weaknesses: