Simple Binary Encoding

SBE is built around the flyweight pattern: objects are reused to reduce memory pressure on the JVM.
The classes generated by SbeTool are what we call flyweights.
An SBE flyweight behaves a bit like a stencil: you position it over a wall (the byte array) at the right place (the offset) and then you can paint (encode) very quickly.
SBE uses a byte array as the underlying storage, and fields are packed into it.
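The stencil idea can be sketched with a hand-written flyweight over a `java.nio.ByteBuffer`. This is only an illustration: `OrderFlyweight`, its field offsets, and the 12-byte block are hypothetical, and real SBE flyweights are generated by SbeTool and typically wrap an Agrona buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical hand-written flyweight; not SbeTool output.
final class OrderFlyweight {
    static final int ORDER_ID_OFFSET = 0; // int64
    static final int QUANTITY_OFFSET = 8; // int32
    static final int BLOCK_LENGTH = 12;

    private ByteBuffer buffer;
    private int offset;

    // Position the "stencil" over the buffer; nothing is allocated or copied.
    OrderFlyweight wrap(final ByteBuffer buffer, final int offset) {
        this.buffer = buffer;
        this.offset = offset;
        return this;
    }

    OrderFlyweight orderId(final long value) {
        buffer.putLong(offset + ORDER_ID_OFFSET, value);
        return this;
    }

    long orderId() {
        return buffer.getLong(offset + ORDER_ID_OFFSET);
    }

    OrderFlyweight quantity(final int value) {
        buffer.putInt(offset + QUANTITY_OFFSET, value);
        return this;
    }

    int quantity() {
        return buffer.getInt(offset + QUANTITY_OFFSET);
    }

    public static void main(final String[] args) {
        final ByteBuffer buffer = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        final OrderFlyweight order = new OrderFlyweight();

        // The same flyweight instance encodes two messages at different offsets.
        order.wrap(buffer, 0).orderId(72L).quantity(100);
        order.wrap(buffer, BLOCK_LENGTH).orderId(73L).quantity(250);

        System.out.println(order.wrap(buffer, 0).orderId());
        System.out.println(order.wrap(buffer, BLOCK_LENGTH).quantity());
    }
}
```

Only the `wrap` call changes between messages, which is why one flyweight per thread is usually enough.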

What is a basic example in SBE

Basic Sample

Order of encoding

Within the fields section, fields are encoded in the order specified by the schema.
Then repeating groups, again in the order specified in the schema.
Finally variable length fields, in the order specified by the schema.
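The three-part layout above can be sketched by hand. The example below is an assumption-laden illustration, not generated SBE code: the group header is taken to be two uint16 values (blockLength, numInGroup) and the variable-length field to carry a 4-byte length prefix; actual sizes depend on the schema's composite types. `LayoutSketch` and its parameters are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class LayoutSketch {
    // Writes [fixed block][repeating group][var-length field] and returns the bytes used.
    static int encode(final ByteBuffer buf, final long orderId,
                      final int[] quantities, final String description) {
        int pos = 0;

        // 1. Fixed-length fields, in schema order.
        buf.putLong(pos, orderId);
        pos += Long.BYTES;

        // 2. Repeating group: header (blockLength, numInGroup), then one block per entry.
        buf.putShort(pos, (short) Integer.BYTES);
        buf.putShort(pos + 2, (short) quantities.length);
        pos += 4;
        for (final int quantity : quantities) {
            buf.putInt(pos, quantity);
            pos += Integer.BYTES;
        }

        // 3. Variable-length field: length prefix, then the bytes.
        // (getBytes allocates; acceptable in a sketch, avoided in real flyweights.)
        final byte[] text = description.getBytes(StandardCharsets.US_ASCII);
        buf.putInt(pos, text.length);
        pos += Integer.BYTES;
        for (int i = 0; i < text.length; i++) {
            buf.put(pos + i, text[i]);
        }
        pos += text.length;

        return pos;
    }

    public static void main(final String[] args) {
        final ByteBuffer buf = ByteBuffer.allocate(128).order(ByteOrder.LITTLE_ENDIAN);
        final int length = encode(buf, 72L, new int[] {10, 20}, "ORDER");
        System.out.println("encoded bytes: " + length);
    }
}
```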

Order of encoding - API

Developers must encode and decode in the order specified by the schema. Failing to do so could at best reduce performance, and at worst return invalid data during decoding or corrupt data in the buffer during encoding.
The API may let you encode and decode out of order, but there are plans to detect invalid sequences and throw errors.
This constraint helps simplify the flyweight design and makes it more hardware friendly.

Encoder / Decoder

The encoder and decoder allocate nothing, or very little (for example, when a String is involved).
SBE recommends using direct (off-heap) buffers to take GC completely out of the picture.
A buffer can be allocated per thread and reused for both decoding and encoding messages.
The decoder needs to know very little metadata about a message (i.e. offset and size).
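One way to get a per-thread, off-heap buffer is a `ThreadLocal` holding a direct `ByteBuffer`. This is a minimal sketch using only the JDK; the class name `ThreadBuffers` and the 4096-byte capacity are assumptions, and SBE code more commonly wraps such a buffer in an Agrona `UnsafeBuffer`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ThreadBuffers {
    // Assumed capacity; size it for the largest message you expect to encode.
    private static final int CAPACITY = 4096;

    // Each thread lazily gets its own direct buffer, allocated once and then reused.
    private static final ThreadLocal<ByteBuffer> BUFFER = ThreadLocal.withInitial(
        () -> ByteBuffer.allocateDirect(CAPACITY).order(ByteOrder.LITTLE_ENDIAN));

    static ByteBuffer get() {
        return BUFFER.get();
    }

    public static void main(final String[] args) {
        final ByteBuffer first = get();
        final ByteBuffer second = get();
        System.out.println("same instance: " + (first == second));
        System.out.println("direct: " + first.isDirect());
    }
}
```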

What are the 5 fields related to MessageHeaderEncoder/Decoder

static ByteOrder BYTE_ORDER
static int ENCODED_LENGTH
static int SCHEMA_ID
static int SCHEMA_VERSION
static String SEMANTIC_VERSION
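To make these constants concrete, here is a hand-rolled sketch of a message header, assuming the commonly used layout of four uint16 fields (blockLength, templateId, schemaId, version), which gives an encoded length of 8 bytes. `HeaderSketch` and the schema id and version values are hypothetical; the generated `MessageHeaderEncoder` and `MessageHeaderDecoder` take theirs from the schema.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class HeaderSketch {
    static final ByteOrder BYTE_ORDER = ByteOrder.LITTLE_ENDIAN;
    static final int ENCODED_LENGTH = 8; // four uint16 fields (assumed layout)
    static final int SCHEMA_ID = 1;      // hypothetical value
    static final int SCHEMA_VERSION = 0; // hypothetical value

    // Writes the header at offset; the buffer is expected to use BYTE_ORDER.
    static void encode(final ByteBuffer buf, final int offset,
                       final int blockLength, final int templateId) {
        buf.putShort(offset, (short) blockLength);
        buf.putShort(offset + 2, (short) templateId);
        buf.putShort(offset + 4, (short) SCHEMA_ID);
        buf.putShort(offset + 6, (short) SCHEMA_VERSION);
    }

    // The decoder reads the template id to choose which message flyweight to wrap.
    static int templateId(final ByteBuffer buf, final int offset) {
        return buf.getShort(offset + 2) & 0xFFFF;
    }

    // blockLength tells the decoder how long the fixed-field block is.
    static int blockLength(final ByteBuffer buf, final int offset) {
        return buf.getShort(offset) & 0xFFFF;
    }

    public static void main(final String[] args) {
        final ByteBuffer buf = ByteBuffer.allocate(64).order(BYTE_ORDER);
        encode(buf, 0, 12, 5);
        System.out.println("templateId=" + templateId(buf, 0));
        System.out.println("body starts at " + ENCODED_LENGTH);
    }
}
```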

SBE Flyweights vs DTO

SBE does not work with DTOs: the flyweight writes directly to the underlying buffer during encoding and reads directly from the buffer during decoding.
When we write orderId = 72 in the order flyweight, it encodes 72 into its byte representation (which depends on the orderId primitive type and on the endianness) and stores it directly in the underlying buffer.
Flyweights can be reused indefinitely to encode and decode different messages, but they are not thread-safe.
When you decode a field of one of the primitive types, nothing is allocated; it is only a stack operation.
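As an illustration of what "encode 72 in its byte representation" means, the sketch below writes 72 as an int64 in both byte orders and prints the resulting bytes. It assumes orderId is an int64; `EndianBytes` is a hypothetical helper and allocates a fresh array only so the bytes can be printed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class EndianBytes {
    // Returns the 8 bytes that represent value in the given byte order.
    static byte[] encodeLong(final long value, final ByteOrder order) {
        final byte[] bytes = new byte[Long.BYTES];
        ByteBuffer.wrap(bytes).order(order).putLong(0, value);
        return bytes;
    }

    public static void main(final String[] args) {
        // prints [72, 0, 0, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(encodeLong(72L, ByteOrder.LITTLE_ENDIAN)));
        // prints [0, 0, 0, 0, 0, 0, 0, 72]
        System.out.println(Arrays.toString(encodeLong(72L, ByteOrder.BIG_ENDIAN)));
    }
}
```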

How to decode array?

When you decode a field of array type, no new array is allocated and handed back to you.
Instead, you provide your own buffer (which you can reuse on your side) and the flyweight copies the data into it.
Again, this allows your system to avoid allocation.
Why limit or prevent allocation? To limit or suppress GCs, which would slow down your encoding and decoding operations.
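The copy-into-caller-buffer idea might look like the following. This is a sketch under assumptions: a 6-byte fixed-length character array called symbol at offset 8 of the message, with the hypothetical `getSymbol` helper standing in for the accessor a generated decoder would offer.

```java
import java.nio.ByteBuffer;

final class ArrayDecodeSketch {
    static final int SYMBOL_OFFSET = 8; // assumed position of the array field
    static final int SYMBOL_LENGTH = 6; // assumed fixed array length

    // Copies the array field into dst starting at dstOffset; returns bytes copied.
    // The method itself allocates nothing, so the caller can reuse dst forever.
    static int getSymbol(final ByteBuffer buffer, final int messageOffset,
                         final byte[] dst, final int dstOffset) {
        for (int i = 0; i < SYMBOL_LENGTH; i++) {
            dst[dstOffset + i] = buffer.get(messageOffset + SYMBOL_OFFSET + i);
        }
        return SYMBOL_LENGTH;
    }

    public static void main(final String[] args) {
        final ByteBuffer buffer = ByteBuffer.allocate(32);
        final byte[] symbol = {'B', 'T', 'C', 'U', 'S', 'D'};
        for (int i = 0; i < symbol.length; i++) {
            buffer.put(SYMBOL_OFFSET + i, symbol[i]);
        }

        final byte[] scratch = new byte[SYMBOL_LENGTH]; // allocated once, reused per message
        final int copied = getSymbol(buffer, 0, scratch, 0);
        System.out.println(new String(scratch, 0, copied, java.nio.charset.StandardCharsets.US_ASCII));
    }
}
```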

Google Protocol Buffer vs SBE

SBE is significantly faster, but there is a more subtle aspect:
GPB allocates, so it triggers GCs and slows down the overall system. This is another big advantage for SBE.

Fast array access

Reading integers of different sizes from a byte array in C++ is simple: apply an offset to your byte pointer, cast the pointer to the type you need and dereference, job done.
Java has no pointer arithmetic, which limits how fast the same reads can be done on plain arrays. To work around this (performance) limitation, Java code uses the Unsafe class, which performs pointer operations under the hood and gets inlined (resulting in the same assembly code as C++).
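For comparison, here is how multi-byte reads at arbitrary offsets of a `byte[]` can be written with the supported JDK API `MethodHandles.byteArrayViewVarHandle` (Java 9 and later). Note that this swaps in VarHandles in place of the Unsafe class mentioned above, so it is not what SBE's buffers do internally; `ByteArrayReads` is a hypothetical helper.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class ByteArrayReads {
    // Views over byte[] that read 2 or 4 bytes at once in little-endian order.
    private static final VarHandle SHORT_LE =
        MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Reads an unsigned 16-bit value; plain get() permits unaligned offsets.
    static int readUint16(final byte[] bytes, final int offset) {
        return ((short) SHORT_LE.get(bytes, offset)) & 0xFFFF;
    }

    // Reads a signed 32-bit value.
    static int readInt32(final byte[] bytes, final int offset) {
        return (int) INT_LE.get(bytes, offset);
    }

    public static void main(final String[] args) {
        final byte[] bytes = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
        System.out.println(Integer.toHexString(readUint16(bytes, 1))); // bytes 02 03
        System.out.println(Integer.toHexString(readInt32(bytes, 3)));  // bytes 04 05 06 07
    }
}
```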

How to access different size of integers in C++

#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    // Example byte array (could be filled with data from a file, network, etc.)
    uint8_t byteArray[] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};

    // Pointer to the beginning of the array
    const uint8_t* ptr = byteArray;

    // Read an 8-bit integer (1 byte)
    uint8_t value8 = *ptr;
    std::cout << "8-bit value: " << static_cast<uint32_t>(value8) << std::endl;

    // Advance the pointer by 1 byte to read the next value
    ptr += sizeof(uint8_t);

    // Read a 16-bit integer (2 bytes). Dereferencing a reinterpret_cast'ed,
    // possibly misaligned pointer is undefined behaviour; std::memcpy is the
    // portable form and compilers typically reduce it to a single load.
    uint16_t value16;
    std::memcpy(&value16, ptr, sizeof(value16));
    std::cout << "16-bit value: " << value16 << std::endl;

    // Advance the pointer by 2 bytes to read the next value
    ptr += sizeof(uint16_t);

    // Read a 32-bit integer (4 bytes)
    uint32_t value32;
    std::memcpy(&value32, ptr, sizeof(value32));
    std::cout << "32-bit value: " << value32 << std::endl;

    return 0;
}

Endianness

Endianness specifies the order in which bytes are stored for a given primitive type. Most hardware uses little endian, while networks have historically used big endian.
In little-endian format, the least significant byte (LSB) is stored at the smallest memory address, and the most significant byte (MSB) is stored at the largest.
C++ uses a macro to apply endianness, which compiles to a single x86 instruction, bswap.
The bswap instruction on x86 architectures is a highly efficient way to swap the byte order of a 32-bit or 64-bit integer, converting it between little-endian and big-endian formats.
Java uses Integer.reverseBytes, which the JIT compiles down to bswap as well.
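A quick check of the byte swap in Java, using the JDK's `Integer.reverseBytes` and `Long.reverseBytes`. The sketch also confirms that swapping a value read in little-endian order gives the same result as reading the same bytes in big-endian order; `ByteSwap` is a hypothetical class name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteSwap {
    // Converts a 32-bit value between little-endian and big-endian representations.
    static int swap(final int value) {
        return Integer.reverseBytes(value);
    }

    static long swap(final long value) {
        return Long.reverseBytes(value);
    }

    public static void main(final String[] args) {
        System.out.println(Integer.toHexString(swap(0x01020304))); // prints 4030201

        final byte[] bytes = {0x01, 0x02, 0x03, 0x04};
        final int asLittle = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
        final int asBig = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt(0);
        System.out.println(swap(asLittle) == asBig); // prints true
    }
}
```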

SBE Encoding

Header
Block Fields
Repeating Group
1. Var length fields (sub-field of repeating group)
Var length fields (root fields)

When Encoded Length Isn’t Encoded Length

final int encodedLength = nosEncoder.encodedLength();
final byte[] bytes = new byte[encodedLength];
buffer.getBytes(0, bytes); //why we need this?
final DirectBuffer readBuffer = new UnsafeBuffer(bytes);
wrapDecoder(headerDecoder, nosDecoder, readBuffer, 0);
System.out.println(nosDecoder);

java.lang.IndexOutOfBoundsException: index=262 length=22 capacity=276
We get an exception because encoder.encodedLength() excludes the header length. The whole byte array is required to decode the message, not just the body.

How to determine the encoded length if we only have the encoded buffer?

Unfortunately, the decoder has to traverse to the end of the message to get the encoded length.
One other way is to remember the encoded length at the time that the message was encoded and pass it along with the encoded buffer as a method parameter.
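The "remember the length at encode time" option can be sketched without any SBE classes. In the hand-rolled example below, the header is assumed to be 8 bytes and the body one int64 plus one length-prefixed byte field; `EncodedLengthSketch`, `encode` and `copyMessage` are hypothetical names. The point is that the total handed to the consumer is header length plus body length.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EncodedLengthSketch {
    static final int HEADER_LENGTH = 8; // assumed: four uint16 header fields

    // Encodes header + body and returns the total encoded length,
    // so the caller can pass it along with the buffer.
    static int encode(final ByteBuffer buf, final int offset,
                      final long orderId, final byte[] text) {
        final int bodyStart = offset + HEADER_LENGTH;
        buf.putShort(offset, (short) Long.BYTES); // blockLength; other header fields left 0 here
        buf.putLong(bodyStart, orderId);
        buf.putInt(bodyStart + Long.BYTES, text.length); // var-length prefix
        for (int i = 0; i < text.length; i++) {
            buf.put(bodyStart + Long.BYTES + Integer.BYTES + i, text[i]);
        }
        final int bodyLength = Long.BYTES + Integer.BYTES + text.length;
        return HEADER_LENGTH + bodyLength;
    }

    // Copies exactly one message out of a larger, possibly reused buffer.
    static byte[] copyMessage(final ByteBuffer buf, final int offset, final int totalLength) {
        final byte[] out = new byte[totalLength];
        for (int i = 0; i < totalLength; i++) {
            out[i] = buf.get(offset + i);
        }
        return out;
    }

    public static void main(final String[] args) {
        final ByteBuffer buf = ByteBuffer.allocate(256).order(ByteOrder.LITTLE_ENDIAN);
        final int total = encode(buf, 0, 72L, new byte[] {'O', 'R', 'D', 'E', 'R'});
        System.out.println("total length incl. header: " + total);
        System.out.println("copied: " + copyMessage(buf, 0, total).length);
    }
}
```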

Sample code to find encodedLength

   skipGroup(nosDecoder.allocations(), allocDec -> {
       skipGroup(allocDec.nestedParties(), partyDec -> {
                  partyDec.nestedPartyDescription();
       });
       allocDec.allocDescription();
   });
   nosDecoder.traderDescription();
   nosDecoder.orderDescription();
   //decoder encoded length at end of message = actual encoded Length
   encodedLengthFromDecoder = headerDecoder.encodedLength() + nosDecoder.encodedLength();

The Moving Repeating Group

Don't invoke the same method multiple times expecting the same result: the flyweight is a buffer reader, and each call moves its pointer.
Unless the mutated field is a fixed-length field, every field after it needs to be encoded again.
To change a field later, remember the limit just before encoding it, then restore that limit to backtrack.

   //I want to change trader description later so remember the limit here
   final int limit = nosEncoder.limit();
   nosEncoder.traderDescription("TRADER-1");
   nosEncoder.orderDescription("ORDER DESC");
   nosEncoder.limit(limit);
   nosEncoder.traderDescription("TRADER-00001");
   //Everything subsequent to the above needs to be encoded again
   nosEncoder.orderDescription("ORDER DESC");
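Why the backtracking works can be seen in a stripped-down model of the limit. The sketch below is not the generated encoder: `LimitSketch` is a hypothetical class that keeps a single cursor (the limit) and writes each variable-length field as a 1-byte length followed by ASCII bytes, which is an assumed encoding.

```java
import java.nio.ByteBuffer;

final class LimitSketch {
    private final ByteBuffer buffer;
    private int limit; // position where the next var-length field will be written

    LimitSketch(final ByteBuffer buffer, final int initialLimit) {
        this.buffer = buffer;
        this.limit = initialLimit;
    }

    int limit() {
        return limit;
    }

    void limit(final int limit) {
        this.limit = limit;
    }

    // Writes [length byte][ASCII bytes] at the limit and advances the limit past them.
    LimitSketch putVarAscii(final String value) {
        buffer.put(limit, (byte) value.length());
        for (int i = 0; i < value.length(); i++) {
            buffer.put(limit + 1 + i, (byte) value.charAt(i));
        }
        limit += 1 + value.length();
        return this;
    }

    public static void main(final String[] args) {
        final LimitSketch encoder = new LimitSketch(ByteBuffer.allocate(64), 0);

        final int saved = encoder.limit();
        encoder.putVarAscii("TRADER-1").putVarAscii("ORDER DESC");
        System.out.println("limit after first pass: " + encoder.limit());  // prints 20

        // Backtrack, rewrite the longer value, then re-encode everything after it.
        encoder.limit(saved);
        encoder.putVarAscii("TRADER-00001").putVarAscii("ORDER DESC");
        System.out.println("limit after second pass: " + encoder.limit()); // prints 24
    }
}
```

Because the second trader description is four bytes longer, the order description lands four bytes further along, which is why it has to be written again.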

The Semi-Forbidden Schema Evolution

Code that uses SBE also tends to reuse buffers to reduce allocations.
Even though we don’t care about the last field, the buffer may contain bytes from a previous message that encroach on the new field when we encode the new message.
Use a Base64 encoder to turn the encoded bytes into a string that can be compared and decoded again:

   final int encodedLength = headerEncoder.encodedLength() + nosEncoder.encodedLength();

   final byte[] bytes = new byte[encodedLength];
   buffer.getBytes(0, bytes);
   
   final String base64EncStr = Base64.getEncoder().encodeToString(bytes);
   System.out.println(base64EncStr);
   final byte[] decoderBytes = Base64.getDecoder().decode(base64EncStr);
   final DirectBuffer decoderBuffer = new UnsafeBuffer(decoderBytes);
   wrapDecoder(headerDecoder, nosDecoder, decoderBuffer, 0);
   final String decoderToString = nosDecoder.toString();
   System.out.println(decoderToString);

Encoding int and float value in SBE

public TradeEncoder customerId(final long value)
{
    buffer.putLong(offset + 8, value, java.nio.ByteOrder.LITTLE_ENDIAN);
    return this;
}

public void writeFloat(float v) throws IOException {
    if (pos + 4 <= MAX_BLOCK_SIZE) {
        Bits.putFloat(buf, pos, v);
        pos += 4;
    } else {
        dout.writeFloat(v);
    }
}

public void writeLong(long v) throws IOException {
    if (pos + 8 <= MAX_BLOCK_SIZE) {
        Bits.putLong(buf, pos, v);
        pos += 8;
    } else {
        dout.writeLong(v);
    }
}

public void writeDouble(double v) throws IOException {
    if (pos + 8 <= MAX_BLOCK_SIZE) {
        Bits.putDouble(buf, pos, v);
        pos += 8;
    } else {
        dout.writeDouble(v);
    }
}

Sample public SBE interface

Binance-Simple Binary Encoding (SBE) FAQ
curl -v -sX GET -H "Accept: application/sbe" -H "X-MBX-SBE: 1:0" 'https://api.binance.com/api/v3/exchangeInfo?symbol=BTCUSDT' > output.txt

Reference

Design principles for SBE, the ultra-low latency marshaling API
SBE Gotchas
iLink 3 - Simple Binary Encoding-Owned by CME Group

Generate Anki

mdanki SBE.md sbe.apkg --deck "Mohan::DeepWork::Encoding::SBE"
