-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove stream_positions
table
#17047
Remove stream_positions
table
#17047
Conversation
…t's near the beginning of the init process. Just set it to the 1 as the default.
…th new SQL looking up the data directly in the associated tables for a given stream writer
…sts will fail at this point
…add some explanation around the rationale
…ike lines on my scrollbar in pycharm)
…xisting_stream() Without a `stream_positions` table to lean on, this only needed a few small tweaks to wording and actual representative positions
@@ -451,7 +443,8 @@ def __init__( | |||
self._current_positions.values(), default=1 | |||
) | |||
|
|||
# For the case where `stream_positions` is not up to date, | |||
# TODO: this needs updating or verifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RE: TODO comment
Actually, I believe this block can be removed. _max_seen_allocated_stream_id
is already set just above and will already equal what _persisted_upto_position
is because of what happens inside of _load_current_ids()
.
Either it already is set to the max()
(see new line 519)
or it is already set to the max()
through iteration of _add_persisted_positions()
(see new line 561)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I see my ability to show the spot correctly has not improved with time)
# to be persisted or will default to 1. | ||
# TODO: is this next comment still valid? | ||
# (This can be a problem for e.g. backfill streams where the server has | ||
# never backfilled). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RE: TODO comment
I believe this is resolved by the setting of max_stream_id
to 1 at the top of the function. If I'm following the juggling of negatives and positives, this one is always positive?
# If we add back the old "first" then we shouldn't see the persisted up | ||
# to position revert back to 3. | ||
# If we add back the old "first" then we should see the persisted up | ||
# to position revert back to 3, as this writer hasn't written anything since. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I postulate that it actually should be a 3 and not a 6, as "first" hasn't actually written anything since it's initial 3 reflecting it's minimum persisted value.
I assert that the first instance created in this test is proof, since it's
instance name is neither of the writers named and is therefore a 'reader' and not a 'writer', just as this 5th instance is.
This table got added in matrix-org/synapse#8374, does the rationale no longer make sense? |
Thank you for pointing at this, I was not aware of the 'why' behind it. I will have to process what all of that means for a while, I think. Your question is worth a good response. It does appear that the majority of the tables do not have indexes on However, I believe eliminating the excess of dead tuples in Postgres(not to mention the wear on hard drives) in place of a table scan on startup may be worth the effort. I don't suppose you recall how long it was taking for the table scan, as whatever evidence my server has would not be particularly useful next to a larger instance such as matrix.org?(Just as a curiosity) |
I agree that it would be good to find a way of reducing the DB load due to it, just not quite sure how atm 🙂 |
I think I have a different angle to try on this, so I will close it |
The
stream_positions
table is used to track what sequence number(as a stream id) has been last persisted into the database, and only for Postgres. Updating this table seems to be done often, but is only read from during startup. As all the data that is in that table can be discovered in each stream's actualtables
, just use that at startup and don't bother with this extra table.After start-up, this data is maintained in memory the same way it is today.
The motivator for this change:
Let's just...not do this 🤷♂️
Pull Request Checklist
(run the linters)