Reforming dynamic lists in SSZ #1160
Comments
I would like to make an alternative proposal: Replace list by a type
The reason why I would propose this is that I think it is preferable that the type is aware of its current dynamic length. The code for this is much easier to write, reason about, and check statically and dynamically. It is also consistent with the other changes we are making to SSZ (like unions and bitlists), which go in the direction of externalising typing work into SSZ and making it work "naturally", as you would expect from a modern programming language. |
I can see how this can be valuable. But there is one question: do we want length mixed in as an item at the top, or as an element of the list? Huffman theory would say that mixing length in at the top is only optimal if half of all queries to the list are queries of its length, but that seems implausibly high. It would also say that reserving the first element of the array to represent length is only optimal if a length query is only as common as a query to a randomly selected element, which seems implausibly low. Which of these two extremes is less bad? The former is better from the PoV of not changing things as much as possible, the latter is better from the PoV of simplicity. |
I'm not sure, but if we assume that Merkle proofs typically access one element, and every proof has to come with a proof of the length of the list, then actually half the accesses are to the length. |
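For readers following the "length at the top" option, here is a minimal sketch of what mixing in the length looks like. It assumes SHA-256 as the hash and a 32-byte little-endian length chunk placed on the right; the function names are illustrative, not taken from this thread.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def mix_in_length(data_root: bytes, length: int) -> bytes:
    """Hash the root of the list data together with the list length."""
    # Assumption: the length is a 32-byte little-endian integer, so it
    # fills exactly one chunk and sits to the right of the data root.
    return sha256(data_root + length.to_bytes(32, "little"))


# Placeholder data root, for illustration only.
data_root = sha256(b"\x00" * 64)
list_root = mix_in_length(data_root, 5)
```

With this layout the data root is the left child of the list root and the length is the right child, so a proof of the length needs exactly one sibling hash regardless of how many elements the list holds.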
Ah, I suppose if we want our list get proofs to return what |
See if we can describe the new extended SSZ better/more minimally (or at least my idea of it, all just an idea): Any collection is defined as
Bitfields require the There are 4 different types, specialized in
Offsets always count the bytes up to (but not including) the element.
**: Optimized away within merkleization in real-world implementations.
|
I'm very confused by your post... at least it does not feel like a simplification to me ;)
Mixing in the length is always on the right |
No, but I posted this serialization earlier in the chat, as it is consistent with offsets, and simple. But improvements are very welcome. (also see note on ordering of indices)
See edits, that is what I did previously. But that's not consistent with the current verify-bitfield behavior, which is like a little-endian int. Also, a big little-endian integer sounds more consistent to me. But I considered big-endian for formatting/reading reasons (I prefer it too, as there's no "gap" in the bits when you align a 9-bit integer in bytes)
Whoops, wrote it a bit quick, but you get the idea |
Oh yes, forgot it's little endian. Then it should be on the right. |
I may add information here to help others who just arrive.
What is an SSZ Partial?
An SSZ partial is an object that can stand in for an SSZ object in any function, but which only contains some of the elements in the SSZ object. It also contains Merkle proofs that prove that the values included in the SSZ partial actually are the values from the original SSZ object; this can be verified by computing the Merkle root of the SSZ partial and verifying that it matches the root of the original SSZ object. Source: https://github.com/ethereum/research/tree/master/ssz_research/partials |
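As a rough illustration of the verification step described above, the sketch below checks a single Merkle branch against a root using a generalized index (root is 1, children of node `k` are `2k` and `2k+1`). It assumes SHA-256 and a binary tree; `hash_pair` and `verify_branch` are hypothetical helper names, not the API of the linked implementation.

```python
import hashlib
from typing import Sequence


def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def verify_branch(leaf: bytes, branch: Sequence[bytes], gindex: int, root: bytes) -> bool:
    """Check that `leaf` sits at generalized index `gindex` under `root`.

    `branch` lists the sibling hashes from the leaf level upward.
    """
    node = leaf
    for sibling in branch:
        if gindex <= 1:
            return False  # branch is longer than the depth implied by gindex
        if gindex & 1:
            node = hash_pair(sibling, node)  # current node is a right child
        else:
            node = hash_pair(node, sibling)  # current node is a left child
        gindex >>= 1
    return gindex == 1 and node == root


# Tiny 4-leaf tree: leaves occupy generalized indices 4..7.
leaves = [bytes([i]) * 32 for i in range(4)]
n2 = hash_pair(leaves[0], leaves[1])
n3 = hash_pair(leaves[2], leaves[3])
root = hash_pair(n2, n3)
ok = verify_branch(leaves[2], [leaves[3], n2], 6, root)
```

A full partial would carry several leaves and share sibling hashes between them; this single-branch check is only the simplest case.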
Addressed in #1180 |
Background reading: #1115
From my experience implementing SSZ partials (https://github.com/ethereum/eth2.0-specs/blob/ssz-impl-rework/test_libs/pyspec/eth2spec/utils/ssz/ssz_partials.py), I've come to the conclusion that the fact that paths do not correspond to specific generalized indices (as generalized indices depend on the depth of a tree which for dynamic-sized lists is dynamic) leads to a large amount of complexity in the SSZ partials implementation.
`append` and `pop` and the complexities around rebalancing lists around powers of 2 are particularly problematic.

My proposed solution to this is as follows. From the perspective of SSZ hashing, we remove lists, so the only data types are (i) base types, (ii) fixed-size vectors, (iii) containers (we could also add unions by hashing a `Union[A, B... Z]` identically to `Container(a=Vector[A, 1], b=Vector[B, 1] ... z=Vector[Z, 1])`).

All existing lists in the spec (validator list in the state, transaction lists in blocks...) get converted to fixed-size lists of some size (for lists in blocks, we know the maximums already; for the validator set we can pick a very generous limit, eg. 2**40). Hence, from the perspective of hashing, all data types become constant-sized types, and so the generalized index that a given path (eg. `state -> state.validator_registry[123514].pubkey`) corresponds to is always the same value.

However, in the SSZ vector data type, we now add a flag, "serialization type". To start off, the serialization types are FULL and DYNAMIC; a third one to be added later is SPARSE. A FULL vector is serialized like vectors are today; a DYNAMIC vector is serialized by serializing the items in the vector up until the highest nonzero item the same way that a list is serialized today. This makes DYNAMIC vectors essentially equivalent to current lists in terms of serialization, except that a maximum length must be specified. A SPARSE vector would be serialized by prepending the list of items with a (item, position) table of what all of the nonzero items are. Current vectors would become vectors with a FULL serialization type, current lists would become vectors with a DYNAMIC or possibly SPARSE serialization type (if DYNAMIC is used, then it may make sense to add a validation rule in those cases that verifies that there is no non-empty object that appears after any empty object in the list).
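The three serialization types could be sketched roughly as follows. This is an illustration under assumptions that the proposal does not fix: items are `uint64` values encoded little-endian, "empty" means zero, and the SPARSE table layout (a 4-byte count followed by 4-byte position and item pairs) is hypothetical.

```python
from typing import List

ITEM_SIZE = 8  # assumption: items are uint64, serialized little-endian


def serialize_full(items: List[int]) -> bytes:
    """FULL: every slot of the vector is written out."""
    return b"".join(x.to_bytes(ITEM_SIZE, "little") for x in items)


def serialize_dynamic(items: List[int]) -> bytes:
    """DYNAMIC: write items only up to the highest nonzero item."""
    end = max((i + 1 for i, x in enumerate(items) if x != 0), default=0)
    return serialize_full(items[:end])


def deserialize_dynamic(data: bytes, limit: int) -> List[int]:
    """Rebuild the vector, padding with zeros up to the maximum length."""
    if len(data) % ITEM_SIZE != 0:
        raise ValueError("data is not a whole number of items")
    count = len(data) // ITEM_SIZE
    if count > limit:
        raise ValueError("more items than the vector limit allows")
    items = [
        int.from_bytes(data[i * ITEM_SIZE:(i + 1) * ITEM_SIZE], "little")
        for i in range(count)
    ]
    return items + [0] * (limit - count)


def serialize_sparse(items: List[int]) -> bytes:
    """SPARSE (hypothetical layout): count, then (position, item) pairs."""
    entries = [(i, x) for i, x in enumerate(items) if x != 0]
    out = len(entries).to_bytes(4, "little")
    for position, item in entries:
        out += position.to_bytes(4, "little") + item.to_bytes(ITEM_SIZE, "little")
    return out


def is_dense(items: List[int]) -> bool:
    """Validation rule: no non-empty item appears after an empty one."""
    seen_empty = False
    for x in items:
        if x == 0:
            seen_empty = True
        elif seen_empty:
            return False
    return True


# Small limit for illustration; a limit like 2**40 could not be padded in memory.
vec = [7, 0, 9, 0, 0, 0, 0, 0]
full_bytes = serialize_full(vec)
dynamic_bytes = serialize_dynamic(vec)
sparse_bytes = serialize_sparse(vec)
```

Note that `vec` here would fail the `is_dense` rule because of the gap at position 1, which is the kind of vector where SPARSE rather than DYNAMIC would be the natural choice.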