Front-End engineering is an art as well as a science. JSON is fast, sure! But it is very verbose and increases the size of the relevant message.
So, we are going to implement our own format:
"Smaller Object Notation Offering Frequent Advantages" aka SONOFA
VInt(n) -> Variable Integer of n Bits.
- If n were 5, and we ready 5 bits in order from a stream, it would look like: YABCD
- Y is called the extension bit. If Y == 0, the remaining bits ABCD form the number.
- But if Y == 1, read another byte: YEFGHIJK. If this new Y is zero, we stop. if not, we continue step 3. Let's assume it stopped, we now have a 11 or more bit number: ABCD EFGHIJK
- If we know that the type is signed, just sign extend the number to get the final output. (fill the left side with A)
Read a byte ABCDEFGH. The first three bits ABC tell us what type value is following after this byte.
String - 000[YXXXX]
Number - 001[0/1][YXXX] 1 Signifies FP + Read VInt8 for Exponent
Hash/Ob - 010[YXXXX]
Array - 011[YXXXX]
Unused - 100
Unused - 101
Unused - 110
Custom - 111XXXXX (0..29) - User Defined Types
Null/F - 11111110
T - 11111111
if ABC is 000, then we have a String.
String - 000[YXXXX]
YXXXX is VInt(5) but always unsigned and the length k of the String in Characters that follows.
If the Length is k characters, we read k characters as UTF-8. Read http://en.wikipedia.org/wiki/UTF-8 on how to decode a UTF-8 stream.
We also maintain a global String dictionary during encode/decode. As soon as we find a String, we place it into the dictionary, and give it an index, starting from 0.
So if we read "foo", "bar", "choco", "foo" in order, we refer to their indexes as 0, 1, 2, 0.
If the Length k is zero, we read VInt(8) and read an integer x, and the decoded string will be dictionary[x].
If ABC is 001, then we have a Number.
If D is 0, we are looking at a integral (short/int/long/BigInteger) type. If D is 1, we are looking at a floating point (float/double/BigDecimal) type.
The remaining bits YXXX are VInt(4) which specify length of the type. Then we read a signed number of the length above. In case we read a floating point value, we read a VInt(8) signed value of exponent.
If ABC is 010, then we have a Object/Hash.
YXXXX is VInt(5) but unsigned, the number of elements of this Object to be read = k.
It's followed by reading k Key/Value pairs.
We read k, then we read v.
k can be either an Integer (refering to an old String in the global Dictionary) or a String.
v is a value, could be any of the above.
If ABC is 011, then we have an Array.
YXXXX is VInt(5) but unsigned, the number of elements in this array to be read = l.
It's followed by reading l Values.
If ABCDEFGH were to equal 11111110, we read a NULL type. During serialization, it is perfectly okay to omit NULL values - but sometimes it is required in code. So we fit that requirement here.
If it is impossible to deserialize a field, skip that field.
But if the field were a Boolean value, it would assume the value false.
The field were an Integer or Numeric, it would assume the value zero.
If it were a pointer, it would assume the value NULL.
If ABCDEFGH were 11111111, it is a True (Boolean) value.
If the field were a Boolean value, it would assume the value true.
The field were an Integer or Numeric, it would assume the value 1.
If ABC were 111, and DEFGH was between 0 and 29 inclusive, then we use it to serialize/deserialize a user type. This is left to the user's option, and we leave details for this.
If ABC were 100, 101 or 110 - they are treated as reserved extensions to SONOFA that will be available in later version.