y-binary structure analysis #392
darkskygit started this conversation in Ideas
I have implemented a parser for ybinary (update) in #389.
There are several known problems with the format, which I am writing down here to guide future improvements:
Format without analytic expressions
In mathematics, a solution that can be written as an analytic expression is called an analytic solution. I borrow the term to describe file formats: if any piece of data in a file container can be read with a fixed logic, I say the format has an analytic expression. MKV is a typical example: you can read any data stored in an MKV file with a fixed read logic.
ybinary, on the other hand, has no analytic expression. To read ybinary, you first read several variable-length integers that describe how many times to loop over the items, and reading each item involves many more variable-length integer reads. As a result, if any single byte in a ybinary is corrupted, the whole ybinary may no longer be readable, which greatly increases the risk of storing ybinary for a long time.
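To make the corruption risk concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical layout (a varint count followed by varint values) and a common 7-bits-per-byte varint with a continuation bit; it is not the actual ybinary layout, and the function names are made up for illustration.

```python
def write_var_uint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low group first."""
    out = bytearray()
    while n > 0x7F:
        out.append(0x80 | (n & 0x7F))  # high bit set: more bytes follow
        n >>= 7
    out.append(n)
    return bytes(out)


def read_var_uint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one integer starting at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("unexpected end of buffer")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # continuation bit clear: this was the last byte
            return value, pos
        shift += 7


def encode_items(items: list[int]) -> bytes:
    # Hypothetical layout: item count, then each value, all as varints.
    return write_var_uint(len(items)) + b"".join(write_var_uint(i) for i in items)


def decode_items(buf: bytes) -> list[int]:
    count, pos = read_var_uint(buf, 0)
    items = []
    for _ in range(count):
        value, pos = read_var_uint(buf, pos)
        items.append(value)
    return items


data = encode_items([300, 1, 2, 3])

# Flip a single bit: the continuation bit of the first value's first byte.
corrupted = bytearray(data)
corrupted[1] ^= 0x80

original = decode_items(data)                # [300, 1, 2, 3]
misread = decode_items(bytes(corrupted))     # [44, 2, 1, 2]
```

One flipped bit changes where the first value ends, so every later value is read from the wrong offset, and the decoder reports no error.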
Can only be read in full
ybinary is stateful: it does not store the complete state of each CRDT item. Some state depends on previously loaded CRDT items, which means that to read or write a single CRDT item in ybinary, you have to read the whole ybinary into memory.
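The dependency between items can be sketched as follows. This Python example assumes a simplified, hypothetical block layout in which only the first clock and each item's length are stored; the `Item` fields and `decode_block` are illustrative names, not the real ybinary structures.

```python
from dataclasses import dataclass


@dataclass
class Item:
    client: int
    clock: int   # not stored per item in this sketch; derived while decoding
    length: int


def decode_block(client: int, start_clock: int, lengths: list[int]) -> list[Item]:
    """Rebuild items whose clock depends on every earlier item in the block."""
    items = []
    clock = start_clock
    for length in lengths:
        items.append(Item(client, clock, length))
        clock += length  # the next item's clock is only known after this one
    return items


items = decode_block(client=1, start_clock=10, lengths=[3, 1, 5])
clocks = [item.clock for item in items]  # [10, 13, 14]
```

In a layout like this, the third item cannot be reconstructed without first decoding the two before it, so there is no way to seek directly to a single item.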
No file or value level self-checking
As mentioned earlier, ybinary has no analytic expression, which means that corruption of any byte can corrupt the whole binary. Since ybinary does not contain checksums on its values, we cannot know whether a ybinary is corrupted, or where, until we read it.
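For comparison, a value-level check could look like the Python sketch below. The framing (a fixed 4-byte length plus a 4-byte CRC-32 in front of each payload) and the names `frame` and `unframe` are assumptions made for illustration; nothing like this exists in ybinary today.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Prefix payload with a 4-byte length and 4-byte CRC-32 (little-endian)."""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload


def unframe(buf: bytes, pos: int = 0) -> tuple[bytes, int]:
    """Return (payload, next position); raise ValueError if the frame is damaged."""
    if pos + 8 > len(buf):
        raise ValueError(f"truncated header at offset {pos}")
    length, crc = struct.unpack_from("<II", buf, pos)
    start, end = pos + 8, pos + 8 + length
    if end > len(buf):
        raise ValueError(f"truncated payload at offset {pos}")
    payload = buf[start:end]
    if zlib.crc32(payload) != crc:
        raise ValueError(f"checksum mismatch at offset {pos}")
    return payload, end


buf = frame(b"first") + frame(b"second")

# Damage one byte inside the first payload only.
damaged = bytearray(buf)
damaged[8] ^= 0xFF
damaged = bytes(damaged)
```

With this framing, `unframe(damaged, 0)` raises a `ValueError` that names the offset, while `unframe(damaged, 13)` still returns the second payload intact. The sketch does not protect the length field itself, so a damaged length would still lose the position of the following frames.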
Complexity of decoding
ybinary uses a lot of variable-length integer encoding to store values, which consumes a lot of CPU resources when encoding and decoding.
In contrast, fixed-length binary numbers can be decoded at high speed on any platform (including JS runtimes, using DataView).
The variable-length integer encoding used by ybinary contains conditional branches in its encoding and decoding logic, which makes it several times slower than fixed-length binary numbers, even when optimized on a native platform.
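The structural difference between the two decoders can be seen in a small Python sketch. It illustrates the shape of the logic only, not a benchmark, and the varint shown is a generic 7-bits-per-byte encoding assumed for the example.

```python
import struct


def read_fixed_u32(buf: bytes, pos: int) -> tuple[int, int]:
    """Fixed-width: one unconditional 4-byte little-endian read."""
    (value,) = struct.unpack_from("<I", buf, pos)
    return value, pos + 4


def read_var_uint(buf: bytes, pos: int) -> tuple[int, int]:
    """Variable-length: a loop with a branch on every byte."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # data-dependent exit condition
            return value, pos
        shift += 7


fixed_bytes = struct.pack("<I", 300)   # always 4 bytes
var_bytes = bytes([0xAC, 0x02])        # 300 as a 2-byte varint

fixed_result = read_fixed_u32(fixed_bytes, 0)  # (300, 4)
var_result = read_var_uint(var_bytes, 0)       # (300, 2)
```

Both calls decode 300, but the fixed-width reader always advances by 4 bytes, so the n-th value sits at offset `4 * n`. The varint reader saves space for small numbers and only learns where the next value starts after inspecting each byte.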
P.S. We have previously discussed some of the design of the parser in this issue: #383