SBE is built around the flyweight pattern: it reuses objects to reduce memory pressure on the JVM.
The classes generated by SbeTool are called flyweights.
An SBE flyweight behaves a bit like a stencil: you position it over a wall (the byte array) at the right place (the offset), and then you can paint (encode) very quickly.
SBE uses a byte array as the underlying storage, and fields are packed into it.
Within the fields section, fields are encoded in the order specified by the schema.
Then repeating groups, again in the order specified in the schema.
Finally variable length fields, in the order specified by the schema.
Order of encoding - API
Developers must encode and decode in the order specified by the schema. Failing to do so can at best reduce performance and at worst return invalid data during decoding or corrupt data in the buffer during encoding.
The API may currently let you encode and decode out of order, but there are plans to detect an invalid sequence and throw an error.
That constraint simplifies the flyweight design and makes it more hardware friendly.
Encoder / Decoder
The encoder and decoder allocate nothing, or very little (for example, when a String is involved).
SBE recommends using a direct (off-heap) buffer to take the GC completely out of the picture.
A buffer can be allocated per thread and reused for both encoding and decoding messages.
The decoder needs to know very little metadata about a message (for example, its offset and size).
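A minimal sketch of the per-thread buffer idea, using only the JDK. The class name `PerThreadBuffer` and the 4096-byte capacity are assumptions for illustration; real SBE code would typically wrap such a buffer in an Agrona `UnsafeBuffer`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: one direct (off-heap) buffer per thread, allocated once and reused. */
final class PerThreadBuffer {
    // Assumed capacity; size it for the largest message you expect to encode.
    private static final int CAPACITY = 4096;

    private static final ThreadLocal<ByteBuffer> BUFFER = ThreadLocal.withInitial(
        () -> ByteBuffer.allocateDirect(CAPACITY).order(ByteOrder.LITTLE_ENDIAN));

    /** Returns the calling thread's buffer; the same instance on every call. */
    static ByteBuffer get() {
        return BUFFER.get();
    }

    public static void main(String[] args) {
        ByteBuffer first = get();
        ByteBuffer second = get();
        System.out.println(first.isDirect());   // off-heap storage
        System.out.println(first == second);    // same buffer reused, no new allocation
    }
}
```

Because each thread owns its buffer, no synchronization is needed, which matches the fact that flyweights themselves are not thread-safe.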
What are the 5 static fields of MessageHeaderEncoder/Decoder?
static ByteOrder BYTE_ORDER
static int ENCODED_LENGTH
static int SCHEMA_ID
static int SCHEMA_VERSION
static String SEMANTIC_VERSION
SBE Flyweights vs DTO
SBE does not work with DTOs: the flyweight writes directly to the underlying buffer during encoding and reads directly from the buffer during decoding.
When we write orderId = 72 through the order flyweight, it encodes 72 into its byte representation (which depends on the primitive type of orderId and on the endianness) and stores it directly in the underlying buffer.
Flyweights can be reused indefinitely to encode and decode different messages, but they are not thread-safe.
When you decode a field of a primitive type, nothing is allocated; it is only a stack operation.
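The behaviour described above can be sketched with a hand-written flyweight. This is not what SbeTool generates: the class name `OrderFlyweight`, the field layout (an 8-byte orderId followed by a 4-byte quantity), and the use of `java.nio.ByteBuffer` instead of an Agrona buffer are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical hand-written flyweight; real SBE flyweights are generated by SbeTool. */
final class OrderFlyweight {
    static final int ORDER_ID_OFFSET = 0;  // assumed layout: int64 orderId
    static final int QUANTITY_OFFSET = 8;  // assumed layout: int32 quantity
    static final int BLOCK_LENGTH = 12;

    private ByteBuffer buffer;
    private int offset;

    /** Positions the stencil over the wall: no copy and no allocation. */
    OrderFlyweight wrap(ByteBuffer buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
        return this;
    }

    /** Writes the value straight into the buffer; there is no intermediate object. */
    OrderFlyweight orderId(long value) {
        buffer.putLong(offset + ORDER_ID_OFFSET, value);
        return this;
    }

    long orderId() {
        return buffer.getLong(offset + ORDER_ID_OFFSET);
    }

    OrderFlyweight quantity(int value) {
        buffer.putInt(offset + QUANTITY_OFFSET, value);
        return this;
    }

    int quantity() {
        return buffer.getInt(offset + QUANTITY_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer wall = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        OrderFlyweight order = new OrderFlyweight();

        // The same flyweight instance encodes two messages at different offsets.
        order.wrap(wall, 0).orderId(72).quantity(100);
        order.wrap(wall, BLOCK_LENGTH).orderId(73).quantity(250);

        System.out.println(order.wrap(wall, 0).orderId());             // 72
        System.out.println(order.wrap(wall, BLOCK_LENGTH).quantity()); // 250
    }
}
```

The flyweight holds only a buffer reference and an offset, so re-wrapping it costs two field assignments. That mutable state is also why a single instance must not be shared between threads.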
How to decode an array?
When you decode a field of an array type, no new array is allocated and handed back to you.
Instead, you provide your own buffer (which you can reuse on your side) and the flyweight copies the data into it.
Again, this allows your system to avoid allocating.
Why limit or prevent allocation? To limit or eliminate GCs, which slow down your encoding and decoding operations.
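As a rough sketch of that copy-into-your-own-buffer style, the hypothetical decoder below reads a fixed-length 6-byte symbol field. The class name `SymbolDecoder`, the field, and its length are assumptions; generated SBE decoders expose similar accessors that take a destination array and offset.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical decoder for a message that starts with a fixed-length 6-byte symbol. */
final class SymbolDecoder {
    static final int SYMBOL_OFFSET = 0;  // assumed position of the field
    static final int SYMBOL_LENGTH = 6;  // assumed fixed length

    private byte[] buffer;
    private int offset;

    SymbolDecoder wrap(byte[] buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
        return this;
    }

    /** Copies the symbol into dst and returns the number of bytes copied; allocates nothing. */
    int getSymbol(byte[] dst, int dstOffset) {
        System.arraycopy(buffer, offset + SYMBOL_OFFSET, dst, dstOffset, SYMBOL_LENGTH);
        return SYMBOL_LENGTH;
    }

    public static void main(String[] args) {
        // Two messages laid out back to back in one buffer.
        byte[] messages = "EURUSDGBPUSD".getBytes(StandardCharsets.US_ASCII);
        byte[] scratch = new byte[SYMBOL_LENGTH]; // allocated once by the caller, then reused
        SymbolDecoder decoder = new SymbolDecoder();

        decoder.wrap(messages, 0).getSymbol(scratch, 0);
        // The String is created only to print the result in this demo.
        System.out.println(new String(scratch, StandardCharsets.US_ASCII));

        decoder.wrap(messages, SYMBOL_LENGTH).getSymbol(scratch, 0);
        System.out.println(new String(scratch, StandardCharsets.US_ASCII));
    }
}
```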
Google Protocol Buffers vs SBE
SBE is significantly faster, but there is a more subtle aspect:
GPB allocates, so it triggers GCs and slows down the overall system. This is another big advantage for SBE.
Fast array access
Reading integers of different sizes from a byte array in C++ is simple: apply an offset to your byte pointer, cast the pointer to the type you need, and dereference. Job done.
Plain Java has no direct equivalent of that pointer cast on a byte array. To work around this performance limitation, Java code uses the Unsafe class, which performs pointer operations under the hood and gets inlined by the JIT (resulting in the same assembly code as the C++ version).
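For comparison, here is a sketch of multi-width reads from a byte array in Java. It deliberately swaps in `MethodHandles.byteArrayViewVarHandle` (Java 9+) as a supported stand-in for Unsafe; it is not the mechanism SBE itself uses. The class and method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

/** Hypothetical helper: reads integers of different widths directly from a byte[]. */
final class ByteArrayReads {
    // View handles that reinterpret bytes at an arbitrary index as a short or an int.
    private static final VarHandle SHORT_LE =
        MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static int readUInt8(byte[] bytes, int offset) {
        return bytes[offset] & 0xFF;
    }

    static int readUInt16(byte[] bytes, int offset) {
        return ((short) SHORT_LE.get(bytes, offset)) & 0xFFFF;
    }

    static int readInt32(byte[] bytes, int offset) {
        return (int) INT_LE.get(bytes, offset);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
        int offset = 0;

        System.out.println("8-bit value: " + readUInt8(bytes, offset));
        offset += Byte.BYTES;

        System.out.println("16-bit value: " + readUInt16(bytes, offset)); // bytes 02 03 -> 0x0302
        offset += Short.BYTES;

        System.out.println("32-bit value: " + readInt32(bytes, offset));  // bytes 04..07 -> 0x07060504
    }
}
```

Plain `get` on a byte-array view handle is allowed at unaligned indices, which is what makes the reads at offsets 1 and 3 legal here.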
How to access integers of different sizes in C++
#include <cstdint>
#include <iostream>

int main() {
    // Example byte array (could be filled with data from a file, network, etc.)
    uint8_t byteArray[] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
    // Pointer to the beginning of the array
    uint8_t* ptr = byteArray;

    // Note: the casts below rely on the target tolerating unaligned loads (x86 does).
    // Strictly portable code would copy the bytes out with std::memcpy instead.

    // Read an 8-bit integer (1 byte)
    uint8_t value8 = *reinterpret_cast<uint8_t*>(ptr);
    std::cout << "8-bit value: " << static_cast<uint32_t>(value8) << std::endl;
    // Advance the pointer by 1 byte to read the next value
    ptr += sizeof(uint8_t);

    // Read a 16-bit integer (2 bytes)
    uint16_t value16 = *reinterpret_cast<uint16_t*>(ptr);
    std::cout << "16-bit value: " << value16 << std::endl;
    // Advance the pointer by 2 bytes to read the next value
    ptr += sizeof(uint16_t);

    // Read a 32-bit integer (4 bytes)
    uint32_t value32 = *reinterpret_cast<uint32_t*>(ptr);
    std::cout << "32-bit value: " << value32 << std::endl;
    return 0;
}
Endianness
Endianness specifies the order in which the bytes of a given primitive type are stored. Most hardware uses little endian, while networks have historically used big endian.
In little-endian format, the least significant byte (LSB) is stored at the smallest memory address and the most significant byte (MSB) at the largest.
C++ uses a macro to apply endianness, which compiles to a single x86 instruction, bswap.
The bswap instruction on x86 architectures is a highly efficient way to swap the byte order of a 32-bit or 64-bit integer, converting it between little-endian and big-endian formats.
Java uses Integer.reverseBytes, which the JIT likewise compiles down to bswap.
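A small JDK-only illustration of the two byte orders and of `Integer.reverseBytes`. The class name `EndiannessDemo` and the helper `firstByte` are hypothetical, introduced only for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndiannessDemo {
    /** Returns the byte stored at the lowest address when value is written with the given order. */
    static int firstByte(int value, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES).order(order);
        buffer.putInt(0, value);
        return buffer.get(0) & 0xFF;
    }

    public static void main(String[] args) {
        int value = 0x01020304;

        // Little endian: least significant byte (0x04) comes first.
        System.out.printf("little endian first byte: 0x%02x%n",
            firstByte(value, ByteOrder.LITTLE_ENDIAN));

        // Big endian: most significant byte (0x01) comes first.
        System.out.printf("big endian first byte:    0x%02x%n",
            firstByte(value, ByteOrder.BIG_ENDIAN));

        // Swapping byte order converts between the two representations.
        System.out.printf("reverseBytes:             0x%08x%n", Integer.reverseBytes(value));
    }
}
```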
SBE Encoding
Header
Block Fields
Repeating Group
Var length fields (sub-field of repeating group)
Var length fields (root fields)
When Encoded Length Isn’t Encoded Length
final int encodedLength = nosEncoder.encodedLength();
final byte[] bytes = new byte[encodedLength];
buffer.getBytes(0, bytes); // why do we need this?
final DirectBuffer readBuffer = new UnsafeBuffer(bytes);
wrapDecoder(headerDecoder, nosDecoder, readBuffer, 0);
System.out.println(nosDecoder);
We get an exception because encoder.encodedLength() excludes the header length. The whole byte array is required to decode the message, not just the body.
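The arithmetic behind that failure can be sketched without the SBE library. The code below hand-encodes an 8-byte header (four uint16 values: blockLength, templateId, schemaId, version) followed by a body; the class `EncodedLengthSketch`, the single-field body, and the uint16 length prefix for the variable-length field are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical hand-rolled encoding that mimics "header + body" sizing. */
final class EncodedLengthSketch {
    static final int HEADER_LENGTH = 8; // blockLength, templateId, schemaId, version (4 x uint16)
    static final int BLOCK_LENGTH = 8;  // assumed body block: a single int64 orderId

    /** Encodes header and body at offset 0 and returns the body length only. */
    static int encode(ByteBuffer buffer, long orderId, String traderDescription) {
        buffer.putShort(0, (short) BLOCK_LENGTH);
        buffer.putShort(2, (short) 1); // templateId (arbitrary)
        buffer.putShort(4, (short) 1); // schemaId (arbitrary)
        buffer.putShort(6, (short) 0); // version (arbitrary)

        int limit = HEADER_LENGTH;
        buffer.putLong(limit, orderId);
        limit += BLOCK_LENGTH;

        byte[] text = traderDescription.getBytes(StandardCharsets.US_ASCII);
        buffer.putShort(limit, (short) text.length); // assumed uint16 length prefix
        limit += Short.BYTES;
        for (int i = 0; i < text.length; i++) {
            buffer.put(limit + i, text[i]);
        }
        limit += text.length;

        return limit - HEADER_LENGTH; // the header is NOT included
    }

    /** Copies the whole message: header length plus body length. */
    static byte[] copyWholeMessage(ByteBuffer buffer, int bodyLength) {
        byte[] bytes = new byte[HEADER_LENGTH + bodyLength];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buffer.get(i);
        }
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(128).order(ByteOrder.LITTLE_ENDIAN);
        int bodyLength = encode(buffer, 72L, "TRADER-1");
        byte[] whole = copyWholeMessage(buffer, bodyLength);
        System.out.println("body length:  " + bodyLength);   // 8 + 2 + 8
        System.out.println("total length: " + whole.length); // header + body
    }
}
```

Copying only `bodyLength` bytes from offset 0 would cut off the tail of the message, which is the mistake in the snippet above.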
How to determine the encoded length if we only have the encoded buffer?
Unfortunately, the decoder has to traverse to the end of the message to get the encoded length.
Another way is to remember the encoded length at the time the message is encoded and pass it along with the encoded buffer as a method parameter.
Sample code to find encodedLength
skipGroup(nosDecoder.allocations(), allocDec -> {
    skipGroup(allocDec.nestedParties(), partyDec -> {
        partyDec.nestedPartyDescription();
    });
    allocDec.allocDescription();
});
nosDecoder.traderDescription();
nosDecoder.orderDescription();
// decoder encoded length at end of message = actual encoded length
encodedLengthFromDecoder = headerDecoder.encodedLength() + nosDecoder.encodedLength();
The Moving Repeating Group
Don't invoke the same method multiple times expecting the same result: the flyweight is a buffer reader, and each call moves its pointer.
Unless the field is a fixed-length field, every field after the mutated field needs to be encoded again.
Remember the limit just before encoding the field, then use that limit to backtrack later.
// I want to change the trader description later, so remember the limit here
final int limit = nosEncoder.limit();
nosEncoder.traderDescription("TRADER-1");
nosEncoder.orderDescription("ORDER DESC");
nosEncoder.limit(limit);
nosEncoder.traderDescription("TRADER-00001");
// Everything subsequent to the above needs to be encoded again
nosEncoder.orderDescription("ORDER DESC");
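To see why later fields must be re-encoded, here is a JDK-only sketch of a cursor that writes length-prefixed values at a moving limit. `VarDataCursor` is a hypothetical stand-in for the generated encoder and decoder, and the uint16 length prefix is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical cursor over variable-length fields; each call advances the limit. */
final class VarDataCursor {
    private final ByteBuffer buffer;
    private int limit;

    VarDataCursor(ByteBuffer buffer, int limit) {
        this.buffer = buffer;
        this.limit = limit;
    }

    int limit() {
        return limit;
    }

    void limit(int limit) {
        this.limit = limit;
    }

    /** Writes a uint16 length prefix plus ASCII bytes at the limit, then advances the limit. */
    VarDataCursor put(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        buffer.putShort(limit, (short) bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            buffer.put(limit + Short.BYTES + i, bytes[i]);
        }
        limit += Short.BYTES + bytes.length;
        return this;
    }

    /** Reads the value at the limit, then advances the limit (so a second call reads the next field). */
    String get() {
        int length = buffer.getShort(limit) & 0xFFFF;
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = buffer.get(limit + Short.BYTES + i);
        }
        limit += Short.BYTES + length;
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(128).order(ByteOrder.LITTLE_ENDIAN);
        VarDataCursor encoder = new VarDataCursor(buffer, 0);

        int mark = encoder.limit();                    // remember where the first field starts
        encoder.put("TRADER-1").put("ORDER DESC");

        encoder.limit(mark);                           // backtrack
        encoder.put("TRADER-00001");                   // longer value overwrites part of the next field
        encoder.put("ORDER DESC");                     // so the next field must be encoded again

        VarDataCursor decoder = new VarDataCursor(buffer, 0);
        System.out.println(decoder.get());             // TRADER-00001
        System.out.println(decoder.get());             // ORDER DESC
    }
}
```

The two `decoder.get()` calls return different values even though they are the same method, which is the "moving pointer" behaviour this section warns about.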
The Semi-Forbidden Schema Evolution
Code that uses SBE also tends to reuse the buffers to reduce allocations.
Even though we don’t care about the last field, the buffer may contain bytes from a previous message that encroach on the new field when we encode the new message.
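One way to picture the problem is the sketch below, which reuses a buffer and skips a newly added field. Everything in it is an assumption for illustration: the class `StaleBytesSketch`, the layout (an int64 orderId followed by a new int32 field), and the remedy of zeroing the block before encoding. Writing the field's null value explicitly would be another option.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical layout: int64 orderId, then an int32 field added in a later schema version. */
final class StaleBytesSketch {
    static final int ORDER_ID_OFFSET = 0;
    static final int NEW_FIELD_OFFSET = 8;
    static final int BLOCK_LENGTH = 12;

    /** Encodes only the field the caller cares about; the new field is left untouched. */
    static void encodeWithoutNewField(ByteBuffer buffer, long orderId) {
        buffer.putLong(ORDER_ID_OFFSET, orderId);
    }

    /** Zeroes the fixed block so leftovers from an earlier message cannot leak through. */
    static void clearBlock(ByteBuffer buffer) {
        for (int i = 0; i < BLOCK_LENGTH; i++) {
            buffer.put(i, (byte) 0);
        }
    }

    static int decodeNewField(ByteBuffer buffer) {
        return buffer.getInt(NEW_FIELD_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer reused = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);

        // Simulate bytes left behind by a previous, unrelated message.
        reused.putInt(NEW_FIELD_OFFSET, 0xDEADBEEF);

        encodeWithoutNewField(reused, 72L);
        System.out.printf("without clearing: 0x%08x%n", decodeNewField(reused)); // stale bytes

        clearBlock(reused);
        encodeWithoutNewField(reused, 72L);
        System.out.printf("after clearing:   0x%08x%n", decodeNewField(reused)); // zero
    }
}
```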