Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add notes describing Synapse's streams #16015

Merged
merged 9 commits into from
Aug 7, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/16015.doc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add a internal documentation page describing the ["streams" used within Synapse](https://matrix-org.github.io/synapse/v1.90/development/synapse_architecture/streams.html).
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,7 @@
- [Cancellation](development/synapse_architecture/cancellation.md)
- [Log Contexts](log_contexts.md)
- [Replication](replication.md)
- [Streams](development/synapse_architecture/streams.md)
- [TCP Replication](tcp_replication.md)
- [Faster remote joins](development/synapse_architecture/faster_joins.md)
- [Internal Documentation](development/internal_documentation/README.md)
Expand Down
126 changes: 126 additions & 0 deletions docs/development/synapse_architecture/streams.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
## Streams

Synapse has a concept of "streams", which are roughly described in [`id_generators.py`](
https://github.com/matrix-org/synapse/blob/develop/synapse/storage/util/id_generators.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me wonders if this should just be a massive docstring at the top of that file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very possibly. (And I didn't even think about the Sphinx thing we have setup...)

).
It is important to understand this, so let's describe them formally.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it important to understand these? Maybe include some examples of how they're used or something?

We paraphrase from the docstring of [`AbstractStreamIdGenerator`](
https://github.com/matrix-org/synapse/blob/a719b703d9bd0dade2565ddcad0e2f3a7a9d4c37/synapse/storage/util/id_generators.py#L96
).

A stream is an append-only log `T1, T2, ..., Tn, ...` of facts which grows over time.
Only "writers" can add facts to a stream, and there may be multiple writers.
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved

Each fact has an ID, called its "stream ID".
Readers should only process facts in ascending stream ID order.

Roughly speaking, each stream is backed by a database table.
It should have a `stream_id` column holding stream IDs, plus additional columns
as necessary to describe the fact.
(Note that it may take multiple rows (with the same `stream_id`) to describe that fact.)
Stream IDs are globally unique (enforced by Postgres sequences).
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved

> _Aside_. Less roughly speaking, here are some additional notes on streams' backing tables.
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved
>
> 1. Rich would like to [ditch the backing tables](https://github.com/matrix-org/synapse/issues/13456).
> 2. The backing tables may have other uses.
> For example, the events table serves backs the events stream, and is read when processing new events.
> But old rows are read from the table all the time, whenever Synapse needs to lookup some facts about an event.
> 3. Rich suspects that sometimes the stream is backed by multiple tables, so the stream proper is the union of those tables.

Stream writers can "reserve" a stream ID, and then later mark it as having being completed.
Stream writers need to track the completion of each stream fact.
In the happy case, completion means a fact has been written to the stream table.
But unhappy cases (e.g. transaction rollback due to an error) also count as completion.
Once completed, the rows written with that stream ID are fixed, and no new rows
will be inserted with that ID.

### Current stream ID

We may define a per-writer notion of the "current" stream ID:

> The current stream ID _for a writer W_ is the largest stream ID such that
> all transactions added by W with equal or smaller ID have completed.
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved

Similarly, there is a global notion of current stream ID:

> The current stream ID is the largest stream ID such that
> all facts (added by any writer) with equal or smaller ID have completed.

NB. This means that if a writer opens a transaction that never completes, the current stream ID will never advance beyond that writer's last written stream ID.

For single-writer streams, the per-writer current ID and the global current ID
are the same.
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved
Both senses of current ID are monotonic, but they may "skip" or jump over IDs
because facts complete out of order.
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved

_Example_.
Consider a single-writer stream which is initially at ID 1.

| Action | Current stream ID | Notes |
|------------|-------------------|-------------------------------------------------|
| | 1 | |
| Reserve 2 | 1 | |
| Reserve 3 | 1 | |
| Complete 3 | 1 | current ID unchanged, waiting for 2 to complete |
| Complete 2 | 3 | current ID jumps from 1 -> 3 |
| Reserve 4 | 3 | |
| Reserve 5 | 3 | |
| Reserve 6 | 3 | |
| Complete 5 | 3 | |
| Complete 4 | 5 | current ID jumps 3->5, even though 6 is pending |
| Complete 6 | 6 | |


### Multi-writer streams

There are two ways to view a multi-writer stream.

1. Treat it as a collection of distinct single-writer streams, one
for each writer.
2. Treat it as a single stream.
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved

The single stream (option 2) is conceptually simpler, and easier to represent (a single stream id).
However, it requires each reader to know about the entire set of writers, to ensures that readers don't erroneously advance their current stream position too early and miss a fact from an unknown writer.
In contrast, multiple parallel streams (option 1) are more complex, requiring more state to represent (map from writer to stream id).
The payoff for doing so is that readers can "peek" ahead to facts that completed on one writer no matter the state of the others, reducing latency.

Note that a single multi-writer stream can be viewed in both ways.
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved
For example, the events stream is treated as multiple single-writer streams (option 1) by the sync handler, so that events are sent to clients as soon as possible.
But the background process that works through events treats them as a single linear stream.

Another useful example is the cache invalidation stream.
The facts this stream holds are instructions to "you should now invalidate these cache entries".
We only ever treat this as a multiple single-writer streams as there is no important ordering between cache invalidations.
(Invalidations are self-contained facts; and the invalidations commute/are idempotent).

### Subscribing to streams

We have described streams as a data structure, but not how to listen to them for changes.

Writers track their current position.
At startup, they can find this by querying the database (which suggests that facts need to be written to the database atomically, in a transaction).

Readers need to track the current position of every writer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this probably needs a section of what this implies -- which is that readers only track writers they know about?

At startup, they can find this by contacting each writer with a `REPLICATE` message,
requesting that all writers reply describing their current position in their streams.
This is done with a `POSITION` message.
This communication used to happen directly with the writers [over TCP](../../tcp_replication.md);
nowadays it's done via Redis's Pubsub.
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved

> _Aside._
> We also use Redis as an external, non-persistent key-value store.

The only thing remaining is how to persist facts and advance streams.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a bit more of how things are persisted, I think?

Writers need to track the facts currently awaiting completion.
When writing a fact has completed and no earlier fact is awaiting completion, the writer can advance its current position in that stream.
Upon doing to it emits an `RDATA` message, once for every fact between the old and the new stream ID.

Readers listen for `RDATA` messages and process them to respond to the new fact.
The `RDATA` itself is not a self-contained representation of the fact;
readers will have to query the stream tables for the full details.
Readers must also advance their record of the writer's current position for that stream.

# Summary

In a nutshell: we have an append-only log with a "buffer/scratchpad" at the end where we have to wait for the sequence to be linear and contiguous.