This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Remove old rows from the
cache_invalidation_stream_by_instance
table automatically. (This table is not used when Synapse is configured to use SQLite.) #15868Remove old rows from the
cache_invalidation_stream_by_instance
table automatically. (This table is not used when Synapse is configured to use SQLite.) #15868Changes from all commits
6f3b76d
f50c800
19418ec
383b7c9
3ca3d1e
fe7c6a9
eb28389
53e7030
2bef4ef
8ece6c0
71fcc0e
ca52419
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, but I probably would have queried for something like
i.e. to see if the complement of rows covered by the previous query is nonempty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I genuinely do only want to look at the first row after our current marker: adding more conditions like you suggest is either going to cause more work for the DB as it'd have to scan all the remaining rows until finding one (I think?) or in the case of the
OR
condition, it's going to not be able to use any index for that part of the condition.Have to be honest I don't quite follow your suggestion's query, at first I was thinking you wanted
WHERE stream_id > ? AND invalidation_ts <= ?
but altogether not sure. Ignoring that I don't fully grok what you're saying, I thinkOR invalidation_ts > ?
is potentially problematic as we don't have an index oninvalidation_ts
— I can't see how the query planner would make a generally efficient plan for the query :).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I thought you were deleting things based on both stream_id and invalidation_ts. If so, it seemed odd to not take both into account when working out if had deleted everything that you wanted to.
I see now that you're only deleting things based on stream_id, using the invalidation_ts only to select a sensible upper limit on stream_id.
Paranoid note: it is not necessarily the case that x.stream_id > y.stream_id implies x.invalidation_ts > y.invalidation_ts (consider e.g. one worker is slow to complete transactions or has a laggy clock). I wouldn't expect this to have any meaingful effect in practice because any drifts should be small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could easily determine the outcome of the deletion.
There is this to see how many rows were deleted if we're interested in that boilerplate.
(not saying we have to use it as the current way is simple, just cumbersome).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered checking the count of deleted rows but these streams can technically have gaps so you don't actually know how many rows are to be deleted, only the upper bound.
DELETE ... RETURNING ...
was an option too but it felt less obvious to me and I wanted it to be 'obviously correct', whereas I wouldn't have as much confidence writing something that way