This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Add notes describing Synapse's streams #16015
Merged
Changes from 5 commits (9 commits in total):

- `5dc63fc` Add notes describing Synapse's streams
- `3092694` Include new doc in TOC
- `f2e2036` Changelog
- `58cb43a` typo fix notation -> notion
- `04b5eba` Ditch crappy `[*]` notation
- `4dbe770` less "less roughly speaking"
- `90cf41f` single multi -> single
- `a81a4d8` Another feedback pass
- `a16ba91` Another tweak pass

`@@ -0,0 +1 @@`

Add an internal documentation page describing the ["streams" used within Synapse](https://matrix-org.github.io/synapse/v1.90/development/synapse_architecture/streams.html).

`@@ -0,0 +1,126 @@`

## Streams

Synapse has a concept of "streams", which are roughly described in [`id_generators.py`](https://github.com/matrix-org/synapse/blob/develop/synapse/storage/util/id_generators.py).
It is important to understand this, so let's describe them formally.

> **Review comment:** Why is it important to understand these? Maybe include some examples of how they're used or something?

We paraphrase from the docstring of [`AbstractStreamIdGenerator`](https://github.com/matrix-org/synapse/blob/a719b703d9bd0dade2565ddcad0e2f3a7a9d4c37/synapse/storage/util/id_generators.py#L96).

A stream is an append-only log `T1, T2, ..., Tn, ...` of facts which grows over time.
Only "writers" can add facts to a stream, and there may be multiple writers.

Each fact has an ID, called its "stream ID".
Readers should only process facts in ascending stream ID order.

Roughly speaking, each stream is backed by a database table.
It should have a `stream_id` column holding stream IDs, plus additional columns
as necessary to describe the fact.
(Note that it may take multiple rows (with the same `stream_id`) to describe that fact.)
Stream IDs are globally unique (enforced by Postgres sequences).

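To make that concrete, here is a small, self-contained sketch of what a backing table for a hypothetical `foo` stream might look like. The table and column names (other than `stream_id`) are invented for this example, and SQLite stands in for Postgres, so the sequence-based global uniqueness is not modelled here.

```python
import sqlite3

# Hypothetical backing table for a toy "foo" stream.  Only the stream_id
# column is prescribed by the convention described above; the rest is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo_stream (stream_id INTEGER NOT NULL, data TEXT NOT NULL)"
)

# A single fact may take multiple rows sharing the same stream_id.
conn.executemany(
    "INSERT INTO foo_stream (stream_id, data) VALUES (?, ?)",
    [(1, "first fact"), (2, "second fact, part a"), (2, "second fact, part b")],
)

# Readers process facts in ascending stream ID order.
for stream_id, data in conn.execute(
    "SELECT stream_id, data FROM foo_stream ORDER BY stream_id"
):
    print(stream_id, data)
```
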
> _Aside_. Less roughly speaking, here are some additional notes on streams' backing tables.
>
> 1. Rich would like to [ditch the backing tables](https://github.com/matrix-org/synapse/issues/13456).
> 2. The backing tables may have other uses.
>    For example, the events table backs the events stream, and is read when processing new events.
>    But old rows are read from the table all the time, whenever Synapse needs to look up some facts about an event.
> 3. Rich suspects that sometimes the stream is backed by multiple tables, so the stream proper is the union of those tables.

Stream writers can "reserve" a stream ID, and then later mark it as having been completed.
Stream writers need to track the completion of each stream fact.
In the happy case, completion means a fact has been written to the stream table.
But unhappy cases (e.g. transaction rollback due to an error) also count as completion.
Once completed, the rows written with that stream ID are fixed, and no new rows
will be inserted with that ID.

### Current stream ID

We may define a per-writer notion of the "current" stream ID:

> The current stream ID _for a writer W_ is the largest stream ID such that
> all transactions added by W with equal or smaller ID have completed.

Similarly, there is a global notion of current stream ID:

> The current stream ID is the largest stream ID such that
> all facts (added by any writer) with equal or smaller ID have completed.

NB. This means that if a writer opens a transaction that never completes, the current stream ID will never advance beyond that writer's last written stream ID.

For single-writer streams, the per-writer current ID and the global current ID
are the same.

Both senses of current ID are monotonic, but they may "skip" or jump over IDs
because facts complete out of order.

_Example_.
Consider a single-writer stream which is initially at ID 1.

| Action     | Current stream ID | Notes                                           |
|------------|-------------------|-------------------------------------------------|
|            | 1                 |                                                 |
| Reserve 2  | 1                 |                                                 |
| Reserve 3  | 1                 |                                                 |
| Complete 3 | 1                 | current ID unchanged, waiting for 2 to complete |
| Complete 2 | 3                 | current ID jumps from 1 -> 3                    |
| Reserve 4  | 3                 |                                                 |
| Reserve 5  | 3                 |                                                 |
| Reserve 6  | 3                 |                                                 |
| Complete 5 | 3                 |                                                 |
| Complete 4 | 5                 | current ID jumps 3->5, even though 6 is pending |
| Complete 6 | 6                 |                                                 |

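The bookkeeping behind this example can be sketched in a few lines of Python. This is a toy model written for this page, not Synapse's actual `StreamIdGenerator`:

```python
class ToySingleWriterStream:
    """Toy model of reserving and completing stream IDs for a single writer."""

    def __init__(self, start: int = 1) -> None:
        self._current = start            # largest ID with nothing earlier pending
        self._next = start + 1           # next ID to hand out
        self._pending: set[int] = set()  # reserved but not yet completed

    def reserve(self) -> int:
        stream_id = self._next
        self._next += 1
        self._pending.add(stream_id)
        return stream_id

    def complete(self, stream_id: int) -> None:
        self._pending.discard(stream_id)
        # Advance past every completed ID, stopping at the first one still pending.
        while self._current + 1 < self._next and self._current + 1 not in self._pending:
            self._current += 1

    def current(self) -> int:
        return self._current


stream = ToySingleWriterStream(start=1)
two, three = stream.reserve(), stream.reserve()
stream.complete(three)
assert stream.current() == 1  # still waiting for 2 to complete
stream.complete(two)
assert stream.current() == 3  # jumps from 1 to 3, as in the table above
```
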
### Multi-writer streams

There are two ways to view a multi-writer stream.

1. Treat it as a collection of distinct single-writer streams, one
   for each writer.
2. Treat it as a single stream.

The single stream (option 2) is conceptually simpler, and easier to represent (a single stream ID).
However, it requires each reader to know about the entire set of writers, to ensure that readers don't erroneously advance their current stream position too early and miss a fact from an unknown writer.
In contrast, multiple parallel streams (option 1) are more complex, requiring more state to represent (a map from writer to stream ID).
The payoff for doing so is that readers can "peek" ahead to facts that have completed on one writer regardless of the state of the others, reducing latency.

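As a rough sketch of the trade-off (the writer names and positions below are invented for illustration):

```python
# A reader's view of a multi-writer stream: a map from writer to the latest
# position that reader has observed for that writer.
positions = {"writer_a": 107, "writer_b": 103}

# Option 1: per-writer streams.  The reader may act immediately on everything
# writer_a has completed up to 107, even though writer_b is further behind.
assert positions["writer_a"] == 107

# Option 2: a single stream.  The reader can only safely advance to the
# smallest position across all the writers it knows about; an unknown writer
# would make this value wrong, hence the requirement to know the full set.
single_stream_position = min(positions.values())
assert single_stream_position == 103
```
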
Note that a single multi-writer stream can be viewed in both ways.

For example, the events stream is treated as multiple single-writer streams (option 1) by the sync handler, so that events are sent to clients as soon as possible.
But the background process that works through events treats them as a single linear stream.

Another useful example is the cache invalidation stream.
The facts this stream holds are instructions to "you should now invalidate these cache entries".
We only ever treat this as multiple single-writer streams, as there is no important ordering between cache invalidations.
(Invalidations are self-contained facts, and they commute / are idempotent.)

### Subscribing to streams

We have described streams as a data structure, but not how to listen to them for changes.

Writers track their current position.
At startup, they can find this by querying the database (which suggests that facts need to be written to the database atomically, in a transaction).

Readers need to track the current position of every writer.

> **Review comment:** I think this probably needs a section on what this implies, which is that readers only track writers they know about?

At startup, they can find this by contacting each writer with a `REPLICATE` message,
requesting that all writers reply describing their current position in their streams.
This is done with a `POSITION` message.
This communication used to happen directly with the writers [over TCP](../../tcp_replication.md);
nowadays it's done via Redis's Pubsub.

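Schematically, the startup handshake looks something like the sketch below. The message classes and fields here are simplified stand-ins invented for illustration; the real commands and their wire format are described in the TCP replication documentation linked above.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Simplified stand-in for a POSITION message: one writer advertising its
    current position on one stream."""
    stream_name: str
    writer: str
    stream_id: int


def on_replicate(writer: str, current_positions: dict[str, int]) -> list[Position]:
    """A writer answering REPLICATE: report its position on each stream it writes."""
    return [
        Position(stream_name, writer, stream_id)
        for stream_name, stream_id in current_positions.items()
    ]


def on_position(reader_state: dict[tuple[str, str], int], msg: Position) -> None:
    """A reader records the advertised position for each (stream, writer) pair."""
    reader_state[(msg.stream_name, msg.writer)] = msg.stream_id


# A reader asks one writer to REPLICATE and records the replies.
reader_state: dict[tuple[str, str], int] = {}
for msg in on_replicate("writer_a", {"events": 107, "caches": 5}):
    on_position(reader_state, msg)
assert reader_state[("events", "writer_a")] == 107
```
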
> _Aside._
> We also use Redis as an external, non-persistent key-value store.

The only thing remaining is how to persist facts and advance streams.

> **Review comment:** This could use a bit more on how things are persisted, I think?

Writers need to track the facts currently awaiting completion.
When writing a fact has completed and no earlier fact is awaiting completion, the writer can advance its current position in that stream.
Upon doing so, it emits an `RDATA` message, once for every fact between the old and the new stream ID.

Readers listen for `RDATA` messages and process them to respond to the new fact.
The `RDATA` itself is not a self-contained representation of the fact;
readers will have to query the stream tables for the full details.
Readers must also advance their record of the writer's current position for that stream.

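A toy sketch of that advance-and-notify step follows. The `emit_rdata` callback is a stand-in invented for this example, not Synapse's real replication API:

```python
from typing import Callable


def advance_and_emit(
    old_position: int,
    completed: set[int],
    emit_rdata: Callable[[int], None],
) -> int:
    """Advance a writer's position past contiguously completed stream IDs,
    emitting one RDATA per fact between the old and the new position."""
    position = old_position
    while position + 1 in completed:
        position += 1
        emit_rdata(position)  # readers will query the stream tables for details
    return position


emitted: list[int] = []
new_position = advance_and_emit(1, {2, 3, 5}, emitted.append)
assert new_position == 3 and emitted == [2, 3]  # 5 must wait until 4 completes
```
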
### Summary

In a nutshell: we have an append-only log with a "buffer/scratchpad" at the end where we have to wait for the sequence to be linear and contiguous.

> **Review comment:** Part of me wonders if this should just be a massive docstring at the top of that file.
>
> **Reply:** Very possibly. (And I didn't even think about the Sphinx thing we have set up...)