-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.html
247 lines (177 loc) · 8.86 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<title>Introducing PBON</title>
<style>
div.figure {
font-size: x-large;
font-weight: bold;
}
pre {
background: #faf8f0;
padding: 0.5em 1em;
border: 1px solid #bebab0;
}
</style>
</head>
<body>
<h1>Introducing PBON</h1>
<p>PBON (Portable Binary Object Notation) is a light-weight binary data interchange format
inspired by <a href="http://json.org">JSON</a>.</p>
<p>PBON is built on the same structures as JSON, namely:</p>
<ul>
<li>A collection of key/value pairs (realized as an <em>object</em> in some programming languages)</li>
<li>An ordered list of values (realized as an <em>array</em> in some programming languages)</li>
</ul>
<p>The primary difference between JSON and PBON is that PBON is a more compact format that allows binary
data (including <a href="http://en.wikipedia.org/wiki/Unicode">Unicode</a> strings) to be represented
directly in the encoding without <a href="http://en.wikipedia.org/wiki/Escape_character">escaping</a>.
The trade-off is that PBON loses some human-readability to achieve this compactness.</p>
<p>PBON prefers the simplicity of JSON over implementing the most compact binary representation possible,
in order to maintain some human-readability and to make implementation straightforward.</p>
<p>PBON supports backward compatibility by enabling implementations to skip values with unrecognized keys,
for example, key/value pairs added as part of a newer version of an object.</p>
<p>In PBON, an <em>object</em> is an unordered set of key/value pairs. An object begins with a
left brace { and ends with a right brace }.</p>
<div class="figure"><div>object</div><img src="/images/object.png" alt="" /></div>
<p>A <em>key</em> is a positive integer encoded as a variable-length integer.</p>
<!--<p>A <em>variable-length integer</em> is a base-128 encoded integer in
<a href="http://en.wikipedia.org/wiki/Endianness">big-endian</a> octet order,
with a continuation bit set in the most significant (7th) bit for all but the last octet (see
<a href="http://en.wikipedia.org/wiki/Variable-length_quantity">variable-length quantity</a>
for an example). Negative values are stored as the one's-complement of the value with the
sign in the 2nd most significant (6th) binary digit of the first octet.</p>-->
<p>An <em>array</em> is an ordered collection of values. An array begins with a
left bracket [ and ends with a right bracket ].</p>
<div class="figure"><div>array</div><img src="/images/array.png" alt="" /></div>
<p>A <em>value</em> can be binary, a string, an integer, a float, an object, an array,
true (t) or false (f) or null (~).</p>
<div class="figure"><div>value</div><img src="/images/value.png" alt="" /></div>
<p>A <em>binary</em> value is a length-prefixed sequence of bytes.</p>
<p>A <em>length-prefix</em> is a non-negative integer value, encoded as variable-length integer.</p>
<p>A <em>string</em> is a length-prefixed sequence of <a href="http://www.utf-8.com/">UTF-8</a> encoded characters.</p>
<p>An <em>integer</em> is a length-prefixed base-256 encoded value in <a href="http://en.wikipedia.org/wiki/Endianness">big-endian</a>
order. Negative integer values are stored using the bitwise complement of the negative value with the most significant bit
(the sign bit) set to 1.</p>
<p>A <em>float</em> is a length-prefixed <a href="http://en.wikipedia.org/wiki/IEEE_754-2008">IEEE 754</a> encoded
floating point value in <a href="http://en.wikipedia.org/wiki/Endianness">big-endian</a> order.</p>
<h2>Variable-length integer</h2>
<p>A <em>variable-length integer</em> is stored as an initial byte followed by one or more trailing bytes.</p>
<p>The most-significant bit of each byte (C) is a continuation indicator, which is set to 1 for all but the last byte.</p>
<p>The second most-significant bit of the first byte (S) is reserved as the sign indicator, which is set to 1 for negative values.</p>
<!-- <p><em>Note that this makes it possible to determine the sign of a variable-length value by checking only the first byte of the value,
which can be helpful when reading from a stream.</em></p>-->
<p>The remaining bits (Vn) contain the value in big-endian byte order (most significant digits first).</p>
<div class="figure"><img src="/images/varint.png" alt="" /></div>
<p>Negative values are transformed using a bitwise complement operation before encoding.
This ensures that small negative numbers will also occupy a small amount of encoded space.</p>
<p>For example, here's the value 1. It's a single byte so the continuation bit (C) is not set,
and it's a positive value so the sign bit (S) is not set:</p>
<pre>
<span style="color:red;font-weight:bold;">00</span>00 0001
</pre>
<p>And here's the value 300:</p>
<pre>
1000 0010 0010 1100
</pre>
<p>To calculate this value, start with the binary encoding of 300 and split it into 7-bit groups
starting from the least-significant digit:</p>
<pre>
300 → 10 0101100
</pre>
<p>And finally, set the sign bit (S) to 0 to indicate a positive value and the continuation bit of each byte (C) except the last to 1:</p>
<pre>
→ <span style="color:red;font-weight:bold;">10</span>000010 <span style="color:red;font-weight:bold;">0</span>0101100
</pre>
<p>Here's the value -300:</p>
<pre>
1100 0010 0010 1011
</pre>
<p>To encode a negative number, start with the binary encoding of the value, take the bitwise complement,
and split it into 7-bit groups starting from the least-significant digit:</p>
<pre>
-300 → 1111 1110 1101 0100
~(-300) → 1 0010 1011
→ 10 0101011
</pre>
<p>And finally, set the sign bit (S) to 1 to indicate a negative value and the continuation bit of each byte (C) except the last to 1:</p>
<pre>
→ <span style="color:red;font-weight:bold;">11</span>000010 <span style="color:red;font-weight:bold;">0</span>0101011
</pre>
<h2>A complete example</h2>
<p>Let's start with the following simple message:</p>
<pre>
class Message1
{
string Name = "Foo";
}
</pre>
<p>This message would be encoded as the following bytes:</p>
<pre>
7B 01 03 46 6F 6F 7D
</pre>
<p>Let's break this down:</p>
<p>The first and last bytes (7B ... 7D) are the braces { } that surround every object.</p>
<p>The second byte (01) is the first member key (1) encoded as a variable-length integer.</p>
<p>The third byte (03) is the length of the string member value to follow (3 bytes), also encoded as a variable-length integer.</p>
<p>And finally, bytes 4-6 are the UTF-8 encoded bytes of the string (46 6F 6F).</p>
<p>Now let's add a second member with an integer value:</p>
<pre>
class Message2
{
string Name = "Foo";
int Score = 100;
}
</pre>
<p>This message could be encoded as the following bytes:</p>
<pre>
7B 01 03 46 6F 6F <span style="color:red;font-weight:bold;">02 <span style="color:blue;font-weight:bold;">01 64</span></span> 7D
</pre>
<p>The 7th byte (02) is the new member key (2) encoded as a variable-length integer.</p>
<p>The 8th byte (01) is the length of the integer member value to follow (1 byte), also encoded as a variable-length integer.</p>
<p>And finally, the 9th byte (64) is the base-256 encoded value 100.</p>
<p>Now let's say that we want an array of scores:</p>
<pre>
class Message3
{
string Name = "Foo";
int[] Scores = new int[] { 1, 2, 3 };
}
</pre>
<p>This message could be encoded as the following bytes:</p>
<pre>
7B 01 03 46 6F 6F <span style="color:red;font-weight:bold;">03 5B <span style="color:blue;font-weight:bold;">01 01 01 02 01 03</span> 5D</span> 7D
</pre>
<p>The 7th byte (03) is the new member key (3) encoded as a variable-length integer.</p>
<p><em>Note that we chose 3 as the member key so as not to conflict with the definition of
the previous Message2 class and possibly break backward compatibility.</em></p>
<p>The 9th and 16th bytes (5B ... 5D) are the brackets [ ] that surround every array.</p>
<p>Bytes 10-15 are the values in the array, each of which is the length of the encoded value
(encoded as a variable-length integer) followed by the integer value (encoded as base 256).</p>
<h2>ABNF</h2>
<pre>
object = "{" members "}" / "{}" / null
members = pair / pair members
pair = key value
key = varint ; > 0
array = "[" elements "]" / "[]" / null
elements = value / value elements
value = binary / string / integer / float / object / array / true / false / null
-----
binary = length octets
length = varint ; >= 0
string = length utf-8 ; length is utf-8 octet count
integer = length base-256 ; big endian
float = length ieee-754 ; big endian
true = "t"
false = "f"
null = "~"
varint = variable-length-integer ; Big-endian variable-length quantity, continuation bit in MSB of each octet, sign in bit 6 of first octet
</pre>
<h2>Implementations</h2>
<ul>
<li><a href="https://github.com/hotchai/serialize.net">Serialize.NET (C#)</a></li>
</ul>
</body>
</html>