y-binary structure analysis #392

darkskygit · 2023-04-18T17:56:10Z

darkskygit
Apr 18, 2023
Maintainer

I have now implemented a parser for ybinary (update) in #389.

There are several known problems, which I will write here for future improvements to the format:

Format without analytic expressions

In mathematics, a solution that can be written as an analytic expression is called an analytic solution. I borrow the term to describe file formats: if any piece of data in a file container can be read with a fixed logic, I say the format has an analytic expression. MKV is a typical example: you can read any data stored in an MKV file with a fixed read logic.
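
To make "fixed read logic" concrete, here is a minimal sketch of a container where the position of every record can be computed directly. The layout (a 4-byte count followed by fixed 8-byte records) and the names `write_records` and `read_record` are hypothetical and chosen only for illustration; this is not how MKV or ybinary lays out its data.

```python
import struct

# Hypothetical layout: a 4-byte record count, then fixed 8-byte records
# (u32 id, u32 value), all little-endian.
HEADER = struct.Struct("<I")
RECORD = struct.Struct("<II")


def write_records(records):
    """Serialize (id, value) pairs into the hypothetical fixed layout."""
    out = bytearray(HEADER.pack(len(records)))
    for rec_id, value in records:
        out += RECORD.pack(rec_id, value)
    return bytes(out)


def read_record(buf, index):
    """Read record `index` directly; no earlier record has to be parsed."""
    (count,) = HEADER.unpack_from(buf, 0)
    if not 0 <= index < count:
        raise IndexError("record index out of range")
    # The offset is a closed-form expression of the index.
    return RECORD.unpack_from(buf, HEADER.size + index * RECORD.size)


data = write_records([(1, 10), (2, 20), (3, 30)])
```

Because the offset depends only on the index, a corrupted byte inside one record does not stop the other records from being located and read.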

ybinary, on the other hand, has no analytic expression. To read ybinary, you first read several variable-length integers that tell you how many times to loop over the items, and reading each item in turn involves many more variable-length integer reads. As a result, if any single byte in a ybinary is corrupted, the whole ybinary may no longer be readable, which greatly increases the risk of storing ybinary for a long time.
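
The effect of a single corrupted byte can be sketched with a simplified count-prefixed list of variable-length integers. The encoding below (7 payload bits per byte, high bit set when more bytes follow) is assumed here as a stand-in for the varint scheme; the list layout and the helper names are hypothetical and much simpler than real ybinary.

```python
def encode_varuint(n):
    """Encode a non-negative int, 7 bits per byte, high bit = more bytes follow."""
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_varuint(buf, pos):
    """Decode one varuint starting at `pos`; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("unexpected end of input")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def encode_list(values):
    """Hypothetical layout: varuint count, then each value as a varuint."""
    return encode_varuint(len(values)) + b"".join(encode_varuint(v) for v in values)


def decode_list(buf):
    count, pos = decode_varuint(buf, 0)
    values = []
    for _ in range(count):
        value, pos = decode_varuint(buf, pos)
        values.append(value)
    return values


good = encode_list([1, 300, 7])  # bytes: 03 01 AC 02 07
bad = bytearray(good)
bad[2] ^= 0x80                   # clear one continuation bit: AC -> 2C
```

In this sketch `decode_list(bytes(bad))` returns `[1, 44, 2]` instead of `[1, 300, 7]`: the damaged value is wrong, and every value after it is read from the wrong position without any error being raised.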

Can only be read in full

ybinary is stateful: it does not store the complete state of each CRDT item. Some of an item's state depends on previously loaded CRDT items, which means that to read or write a single CRDT item you have to read the whole ybinary into memory.
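
One way this kind of dependency can arise is an implicit ID: a block stores only a starting clock, and each item's clock is the running sum of the lengths of the items before it. The sketch below is a simplified, hypothetical model of that idea (the function name and the inputs are assumptions), not the actual ybinary layout.

```python
def item_clocks(start_clock, lengths):
    """Derive each item's clock from the block's start clock.

    Only `start_clock` is stored; the clock of item i is the start clock
    plus the lengths of items 0..i-1, so earlier items must be decoded first.
    """
    clocks = []
    clock = start_clock
    for length in lengths:
        clocks.append(clock)
        clock += length
    return clocks


clocks = item_clocks(10, [3, 1, 5])
```

Here the clock of the third item (14) cannot be known without first decoding the lengths of the first two items, which is why a single item cannot be read in isolation.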

No file-level or value-level self-checking

As mentioned earlier, ybinary has no analytic expression, so corruption of any byte can corrupt the whole binary. Because ybinary does not contain checksums for its values, we cannot tell whether a ybinary is corrupted, or where the corruption starts, until we read it.
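
For comparison, a value-level self-check could look roughly like the sketch below. The framing (4-byte length, 4-byte CRC32, then the payload) and the names `wrap` and `unwrap` are assumptions for illustration only; nothing like this exists in ybinary today.

```python
import struct
import zlib

# Hypothetical frame header: u32 payload length, u32 CRC32 of the payload.
FRAME_HEADER = struct.Struct("<II")


def wrap(payload):
    """Prefix a payload with its length and CRC32."""
    return FRAME_HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def unwrap(frame):
    """Return the payload, or raise ValueError if the frame fails its check."""
    if len(frame) < FRAME_HEADER.size:
        raise ValueError("frame too short")
    length, expected = FRAME_HEADER.unpack_from(frame, 0)
    payload = frame[FRAME_HEADER.size:FRAME_HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch")
    return payload
```

With a check like this, a reader can reject a damaged value before interpreting it, and the position of the failing frame indicates where the corruption is.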

Complexity of decoding

ybinary stores most values as variable-length integers, which consumes a lot of CPU resources when encoding and decoding.

In contrast, fixed-length binary numbers can be decoded at high speed on any platform (including JS runtimes, using DataView).

The variable-length integer encoding used by ybinary involves conditional branches in both the encoding and decoding logic, which makes it several times slower than fixed-length binary numbers, even when optimized on native platforms.
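
The difference in decoding logic can be sketched as follows. Python's `struct` module stands in here for a fixed-width read such as DataView's, and the varint scheme (7 payload bits per byte, high bit set when more bytes follow) is an assumption used for illustration. The sketch shows the shape of the two decoders; it is not a benchmark.

```python
import struct


def decode_fixed_u32(buf, count):
    """Fixed-width: one call, offsets are known up front, no per-byte branch."""
    return list(struct.unpack_from(f"<{count}I", buf, 0))


def decode_varuints(buf, count):
    """Variable-width: every byte must be inspected to find where a value ends."""
    values = []
    pos = 0
    for _ in range(count):
        value, shift = 0, 0
        while True:
            byte = buf[pos]
            pos += 1
            value |= (byte & 0x7F) << shift
            if byte < 0x80:  # data-dependent branch on every byte
                break
            shift += 7
        values.append(value)
    return values


fixed = struct.pack("<3I", 1, 300, 70000)
varint = bytes([0x01, 0xAC, 0x02, 0xF0, 0xA2, 0x04])  # the same three values
```

Both decoders return the same values, but only the fixed-width one knows the position of the n-th value without reading the bytes before it.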

P.S. We have previously discussed some of the design of the parser in this issue: #383

Replies: 0 comments
