-
Notifications
You must be signed in to change notification settings - Fork 21
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Loading status checks…
Add docs
Sergiu Marin
authored and
Sergiu Marin
committed
May 30, 2024
1 parent
019d280
commit 06285ee
Showing
1 changed file
with
116 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
Sonic follows the proactor model of asynchronous completion. For any asynchronous operation, such as a read/write from/to a socket, the user provides a callback which will be invoked at some point in the future, when the operation completes, either successfully, or through an error. In some cases, the completion handler will be invoked immediately as the operation can complete within the asynchronous call. | ||
|
||
For example: | ||
```go | ||
b := make([]byte, 4096) // buffer to read into | ||
conn.AsyncRead(b, func(err error, n int) { | ||
// if err == nil, then n is the number of bytes read | ||
// if err != nil, then the read failed and n >= 0 | ||
}) | ||
``` | ||
|
||
Note that the user can invoke another asynchronous operation in the provided callback: | ||
```go | ||
var onRead func(error, int) | ||
onRead = func(err error, n int) { | ||
conn.AsyncRead(b, onRead) | ||
} | ||
conn.AsyncRead(b, onRead) // the first read | ||
``` | ||
|
||
Since the first read can complete immediately, `onRead` can get called before the first `AsyncRead` returns. This also holds for the following `AsyncRead`s done in the callback. In other words, if multiple reads can complete immediately, we keep growing the call stack as we never get the change to return from an `AsyncRead`. The solution is to limit the number of consecutive immediate reads - if we hit the limit, we asynchronously schedule the next read, hence popping the call frames. For example, see `conn.go`. | ||
|
||
# The central IO component | ||
|
||
The `IO struct` (also called the IO context) is the main entity which provides users with the ability to schedule asynchronous operations. There must be only one `IO` per goroutine. Multiple `IO`s can exist within a program. | ||
|
||
Any construct which must provide asynchronous operations, such as `conn.go`, takes an instance of an IO context. | ||
|
||
Since the IO context is the central pillar of Sonic, every program's main function will end with a call to run the IO. For example: | ||
|
||
```go | ||
ioc.Run() // runs until the program is terminated, yielding the CPU between async operations | ||
|
||
for { ioc.PollOne() } // runs until the program is terminated, polling for IO, thus never yielding the CPU between IO operations | ||
``` | ||
|
||
Under the hood, the IO context uses a platform specific event notification system to get informed when an IO operation can be completed (`epoll` for Linux, `kqueue` for BSD/macOS). In short, this event notification systems allow us to register a socket for read/write events. Calling `epoll` after registering a socket might return an event on that socket, telling us whether it's ready to be read. | ||
|
||
# Networking constructs | ||
|
||
## Transports | ||
|
||
- TCP client/server: `conn.go`, `listen_conn.go` | ||
- connection oriented UDP peer: `packet.go` and `listen_packet.go` | ||
- UDP multicast: see the `multicast` package, which supports IPv4 and source IP filtering. | ||
|
||
## Codecs | ||
|
||
Codecs are meant to sit on top of stream based protocols. Currently, the only stream-based transport in Sonic is a TCP connection. | ||
|
||
Stream transports do not give any guarantee on the number of bytes read. A single socket read might end up with 0 or 100 bytes. This is in contrast with packet based transports, like UDP(/multicast), in which every read returns a complete packet. If your buffer is big enough, you'll read the whole packet. If your packet is smaller than a packet, you'll read as much as possible from the packet and the rest will be discarded. | ||
|
||
To help with writing application layer protocols on top of stream based transports, such as WebSocket, Sonic provides users with the notion of a `codec` and `codec stream`. | ||
|
||
A `codec` is simply an interface with two functions: | ||
- `Encode(Item, *ByteBuffer) error`: encodes the given `Item` into the `ByteBuffer` | ||
- `Decode(*ByteBuffer) (Item, error)`: decodes and returns the `Item` from the `ByteBuffer`, or, if there are not enough bytes, returns `nil, sonicerrors.ErrNeedMore`. | ||
|
||
A `codec stream` takes a user defined `codec` and implements read functions that return an `Item` and write functions that write an `Item`. A `codec stream` automatically handles short reads as notified by the `codec` through `sonicerrors.ErrNeedMore`. | ||
|
||
See `codec/frame.go` for a sample codec. | ||
|
||
## Buffers | ||
|
||
Sonic offers three buffer types: | ||
- `ByteBuffer`: a FIFO like buffer on which `codec`s are based. Not extremely efficient as data is copied on some operations needed by `codec`s | ||
- `MirroredBuffer`: a zero-copy `ByteBuffer` - this was recently added and needs a bit more code to fully replace the `ByteBuffer` | ||
- `BipBuffer`: a zero-copy FIFO buffer best suited for packet based communication | ||
|
||
### BipBuffer in Packet Transports | ||
|
||
The purpose of the `BipBuffer` is to offer an efficient way to store packets in the event of loss, while the missing packets are replayed. | ||
|
||
Say each packet is 1 byte and carries a sequence number. Packet loss occurs when the next received sequence number does not follow the previous one. For example, if we receive `1 2 4` then we miss packet `3`. We must buffer `4` and everything that follows it until we replay packet `3` (through a TCP feed, for example). | ||
|
||
To store packets until the missing ones are replayed, we use a `BipBuffer`. Say | ||
`| | | | |` is a bip buffer which can hold 4 bytes. Reading `1 2` goes as follows: | ||
``` | ||
|1| | | | | ||
|2| | | | | ||
``` | ||
|
||
Reading `4 5 6` then results in buffering each packet | ||
``` | ||
|4|5|6|x| --> the next read is done in x | ||
``` | ||
|
||
If we somehow get 3 again, we can then iterate over the `BipBuffer` to process `4 5 6`. All subsequent reads will be done in the place of `x` as long as they're in sequence. If `x` is out of sequence, then it's stored there and we read the next packet in the first byte slot. | ||
|
||
If the `BipBuffer` is not full, then it acts like a circular buffer: if the next packet does not fit at the end of the buffer and there's enough space at the beginning, then it will be put there. If not, then an error will be returned - the user can then decide what to do in the event of a full buffer. | ||
|
||
Packet loss is the most common reliability issue in exchange communications. Packet re-ordering happens rarely, if ever. As such, we can use the zero-copy nature of a `BipBuffer` to efficiently store out of sequence packets until we replay the missing ones. | ||
|
||
#### Efficiently replaying packets | ||
|
||
Say we have the packets `1 3 5 7` - we miss `2 4 6`. In a program, we will process `1` and then queue `3 5 7` until `2` is received - when that happens, we are left with `5 7` at which point we wait for `4` etc. | ||
|
||
An efficient program will issue replay requests as gaps happen, and not when they're encountered. That means that in our example above, we issue a replay request for `2` after `3`, for `4` after `5` and for `6` after `7`, as they're read from the network. The innefficient alternative is to issue replay requests as the packets are read from the `BipBuffer`. | ||
|
||
This can be achieved by keeping track of two sequence numbers: | ||
- `present`: the last valid sequence number read from the network. In the case of `1 3 5 7`, this will be `1`. | ||
- `future`: the sequence number assuming all missing packets are delivered. In the case of `1 3 5 7`, this will be `7` | ||
|
||
The flow is as follows: | ||
``` | ||
read 1: present=1 future=1 | ||
read 3: present=1 future=3, replay 2 | ||
read 5: present=1 future=5, replay 4 | ||
read 7: present=1 future=7, replay 6 | ||
2 is replayed: present=3 future=7 | ||
4 is replayed: present=5 future=7 | ||
6 is replayed: present=7 future=7 | ||
``` | ||
|
||
We know we don't miss anything if `present == future`. We know we miss some packets if `present < future`. We can never have `present > future`. We know whether we need to replay something after reading a packet by comparing its sequence number with `future`. Say we read `seq`. We then request packets from `future + 1` to `seq - 1` if `seq > future`. |