Consider possible improvements to the SSZ spec before phase0 is launched #1916
Hey, thanks for opening this issue, this is very constructive 👍 Some of these we have already discussed to some extent before, but I will share my thoughts in depth here so others can review.

> Reduce the size of every variable-size container by 4 bytes.

This is more of a debate between consistency and optimization; there are definitely arguments for both. I can think of:
And the most common examples, such as `Attestation`. And yes, I hope you are right that this is a gazillion-byte difference, but that's more a matter of widespread usage. The variance in snappy compression of different values will likely already be higher.
IMHO it doesn't break the property. And bitfields as well. TL;DR: I prefer consistency for now, unless others make a compelling new argument why we need this. Also, thanks for tracking and documenting this optimization idea 👍

> Null-value optimisation

Regardless of the whole null debate, I think it's still a good idea to support explicitly nullable types, and while we are at it, to combine it with related modifications. And once we have a null type for serialization, we should also look at hash-tree-root, declaring it as an always fully zero node.
I think this condition is sufficient. This definitely deserves a draft / proposal somewhere; maybe in the new SSZ repo, but I wouldn't mind a PR here, as the new SSZ repo has received mixed support signals.

> Resolve a contradiction in the SSZ List limit type
The idea is that, like in Go and some other languages, we can defer saying anything about it until it's used anywhere. But I agree it's not very clean, and offsets are a problem, while merkleization fits fine (256-bit integer length mix-ins, although not fully used in practice).
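A minimal sketch of the length mix-in mentioned above, assuming SHA-256 as the tree hash (the `root` argument stands in for the root of the element subtree; illustrative only, not the full spec merkleization):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def mix_in_length(root: bytes, length: int) -> bytes:
    # The list length is serialized as a 256-bit little-endian integer
    # and hashed together with the root of the element tree, so lists
    # with identical contents but different declared lengths get
    # distinct roots.
    return sha256(root + length.to_bytes(32, "little"))

# Two different lengths over the same subtree root yield different roots.
subtree = b"\x00" * 32
assert mix_in_length(subtree, 0) != mix_in_length(subtree, 1)
```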
You are right, and that should be fixed. With the current validator size the practical limit would be:
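The figure that was truncated above can be reconstructed roughly as follows (my own back-of-the-envelope computation, assuming the phase 0 `Validator` container, a fixed-size record of 121 bytes; not the author's original number):

```python
# Fixed-size fields of the phase 0 Validator container, in bytes:
# pubkey (48) + withdrawal_credentials (32) + effective_balance (8)
# + slashed (1) + activation_eligibility_epoch (8) + activation_epoch (8)
# + exit_epoch (8) + withdrawable_epoch (8)
VALIDATOR_SIZE = 48 + 32 + 8 + 1 + 8 + 8 + 8 + 8  # 121 bytes

MAX_PAYLOAD = 2**32                # uint32 offsets cap the serialized size
VALIDATOR_REGISTRY_LIMIT = 2**40   # the limit declared in the spec

practical_limit = MAX_PAYLOAD // VALIDATOR_SIZE
print(practical_limit)  # 35495597, i.e. roughly 35.5 million validators
assert practical_limit < VALIDATOR_REGISTRY_LIMIT
```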
Yes, one nice side-effect here is that the tree is big enough to later fit other types on it, without breaking the existing merkleization. It's still very far away, but it offers some flexibility. We can ignore it for now though, and should focus on the offsets part of the problem. One theoretical option for offsets is to group elements. Meanwhile, the sparse-merkle-tree issue is still open and collecting dust. If we had a place for a more concrete draft, we might make more progress there. For phase 0 it is not a hard requirement though.
During the development of the SSZ implementation in Nim, I've paid great attention to the size of the generated code. Since Nim has the ability to compute a lot of metadata about the serialized type at compile time, it does allow me to produce very short and specialized code for selective traversal (reaching a particular field of interest) and full de-serialization of specific types. I believe our implementation will be quite popular with smart contract authors trying to produce the shortest possible WebAssembly code for extracting data from SSZ records or for verifying merkle proofs. With the 4-byte reduction in place, the code that our implementation can generate doesn't become longer, but rather it becomes shorter. The reason for this is that the optimisation only modifies some offset positions that are computed at compile-time and it does remove an unnecessary verification step which results in slight code size reduction.
Awesome, that sounds very promising for smart contracts. I still wonder about the general case without compile-time optimization here though, where the contract uses an SSZ library or some other type of abstraction to read data based on more dynamic traversal. And I would like containers and vectors to be consistent: if we're removing the first offset from a container, then vectors should get the same treatment. For now, let's minimize changes that affect testnets, but maybe later we can introduce this optimization if it gets welcomed by other implementers.
Well, removing the first offset from a variable-size list is not possible in the same way, because the offset carries information (it determines the length of the list).
Yes, I understand. What I am thinking of is that for vectors, having the first (strictly speaking unnecessary) offset makes sense for other purposes: avoiding an edge case in the pointer math involved in the lookup, and staying consistent with lists. We could change vectors to stay consistent with containers, but I'm unsure about that trade-off. So I'm undecided, but looking for stability. Let's please avoid affecting testnets for now, and ask others for feedback in the meantime. The other two points are better targets for improvements right now 👍
Add specification for EIP-6475 support for SSZ. Remerkleable impl: https://eips.ethereum.org/assets/eip-6475/tests.py

We could possibly replace all planned usage of `Union` with `Optional`, and introduce the conceptually more complex `Union` once needed. `Optional` serialization can be more optimized than `Union`. Discussion: https://ethereum-magicians.org/t/eip-6475-ssz-optional/12891

This PR builds on prior work from:
- @zah at ethereum#1916
One of the design goals of SSZ is that it should make it easier for other blockchains to work with merkle proofs referencing Eth2 consensus objects. Once phase0 is launched, we can expect various official SSZ records to start appearing in third party databases. This would significantly increase the difficulty of coordinating upgrades to the SSZ spec (due to the limited forward compatibility provisions in SSZ, a lot of applications may get broken in the process). Due to this, I think we should consider introducing some final refinements and optimisations to the SSZ spec before phase0 is launched:
1) Reduce the size of every variable-size container by 4 bytes.
Every variable-size container (i.e. a record with fields) includes a fixed-size section storing the offsets of the variable-size fields.
The offset of the first such field currently has only one valid value: it must be equal to the length of the fixed-size section. Implementations are expected to check this, because otherwise there could be unused bytes in the SSZ representation, which is considered an invalid encoding.
The motivation for not allowing unused bytes is that this would break the property
deserialize(serialize(x)) == x
which is quite useful for fuzzing. For completeness, I would mention that if unused bytes were allowed, a very limited form of forward compatibility would be present: it would be possible to add a new field at the end of a record without breaking older readers. Since SSZ upgrades require coordination and all long-term storage applications should also feature an out-of-band version tag, this limited form of forward compatibility was considered unnecessary.

In other words, since the first offset has only one valid value that is completely derived from the type schema, the offset carries no information and can be omitted from the representation. The result will be that every variable-size container will be 4 bytes shorter. Admittedly, 4 bytes are not much, but if we consider the long expected life of the SSZ spec and the great multitude of places where SSZ records might appear, some quick back-of-the-envelope calculation estimated the total cost savings in bandwidth and storage to amount to roughly 1 gazillion bytes :P
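The proposed saving can be illustrated with a toy serializer (hypothetical code, not from the spec; it models a container with one fixed-size field followed by variable-size fields, with an `elide_first_offset` switch for the proposal):

```python
def serialize_container(fixed_field: bytes, var_fields: list,
                        elide_first_offset: bool = False) -> bytes:
    # The fixed-size section holds the fixed field plus one 4-byte offset
    # per variable-size field (one fewer when the first offset is elided,
    # since that offset always equals the fixed-size section length).
    offset_count = len(var_fields) - (1 if elide_first_offset else 0)
    fixed_size = len(fixed_field) + 4 * offset_count
    offsets, pos = [], fixed_size
    for i, field in enumerate(var_fields):
        if not (elide_first_offset and i == 0):
            offsets.append(pos.to_bytes(4, "little"))
        pos += len(field)
    return fixed_field + b"".join(offsets) + b"".join(var_fields)

standard = serialize_container(b"\x01" * 8, [b"xx", b"yyyy"])
elided = serialize_container(b"\x01" * 8, [b"xx", b"yyyy"], True)
assert len(standard) - len(elided) == 4  # exactly 4 bytes saved per container
```

A reader can still locate the first variable-size field in the elided form, because its start position equals the (schema-derived) fixed-size section length.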
2) Null-value optimisation (a.k.a. better support for pointer types and `Option[T]`)

The SSZ spec defines union types that can discriminate between `null` and a possible value. Let's call such types `Nullable`. Since `Nullable` types have variable size, their length in bytes can be zero (just like how we encode zero-length lists with two consecutive offsets with the same value). I propose the addition of the following two special rules:

1. The `null` value of a `Nullable` union is encoded as zero bytes.
2. The non-null value is encoded without a `serialized_type_index`.
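A minimal sketch of the two proposed rules (illustrative only; it assumes the second rule drops the `serialized_type_index` for the single non-null alternative, and `bytes` stands in for an already-serialized payload):

```python
from typing import Optional

def serialize_nullable(value: Optional[bytes]) -> bytes:
    # Rule 1: the null value is encoded as zero bytes.
    if value is None:
        return b""
    # Rule 2: the non-null value is encoded directly, with no
    # serialized_type_index prefix; since a Nullable union has only one
    # non-null alternative, the selector carries no information.
    return value

assert serialize_nullable(None) == b""         # zero-length encoding
assert serialize_nullable(b"\x2a") == b"\x2a"  # payload passes through
```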
Please note that in most programming languages, the unions described above can be mapped to frequently used types such as `Option[T]` or a pointer type. During the development of the `blocks_by_range` protocol, an earlier version suggested that missing blocks should be indicated in the response as a `default(T)` encoding of the `BeaconBlock` type. This was semantically equivalent to using an `Option[T]` type, but it would have been considerably less efficient. The design of the protocol was refined in later versions to not require this form of response, but if one of the very first protocols came that close to using and benefiting from the `Option[T]` type, we can expect more protocols to appear in the future that will benefit as well.

3) Resolve a contradiction in the SSZ List limit type
The SSZ spec doesn't specify the type of the list size limit. This leads to something that can be described as a slight contradiction in the current specs:
The size limit of the validator registry is set to 1099511627776 (2^40). On the other hand, the maximum size in practice is limited in the encoding to the difference of two offset values. Since the offset values are encoded as `uint32`, the maximum size in practice cannot be larger than 2^32. Perhaps the intention for the size limit is that it should only affect the merkle hash computation, but the spec would do well to clarify this.
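The contradiction can be shown with quick arithmetic (a sketch; it assumes the serialized payload is bounded by what a `uint32` offset can address):

```python
DECLARED_LIMIT = 1099511627776  # 2**40, the validator registry limit
OFFSET_CEILING = 2**32          # largest byte position a uint32 offset can hold

assert DECLARED_LIMIT == 2**40
# Even with 1-byte elements, the difference of two uint32 offsets can
# never describe more than 2**32 bytes, so the declared limit is
# unreachable in any actual encoding:
assert DECLARED_LIMIT > OFFSET_CEILING
print(DECLARED_LIMIT // OFFSET_CEILING)  # 256: the limit overshoots 256-fold
```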