Add back fast path for non-gappy syncs #17064
PR #16942 removed an invalid optimisation that avoided pulling out state for non-gappy syncs. This causes a large increase in DB usage. c.f. #16941 for why that optimisation was wrong. However, we can still optimise in the simple case where the events in the timeline are a linear chain without any branching/merging of the DAG.
synapse/handlers/sync.py
is_linear_timeline = all(len(e.prev_event_ids()) <= 1 for e in batch.events)
if is_linear_timeline and not batch.limited:
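For readers unfamiliar with the check under review, here is a minimal, self-contained sketch of what it tests. `FakeEvent` is a hypothetical stand-in for Synapse's event class; only the `prev_event_ids()` method name is taken from the snippet above, and everything else is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a Synapse event; not the real class."""

    event_id: str
    prev_ids: List[str] = field(default_factory=list)

    def prev_event_ids(self) -> List[str]:
        return self.prev_ids


def is_linear_timeline(events: Sequence[FakeEvent]) -> bool:
    # Mirrors the per-event check in the diff: no event in the batch
    # merges two or more branches of the DAG.
    return all(len(e.prev_event_ids()) <= 1 for e in events)


chain = [FakeEvent("A"), FakeEvent("B", ["A"]), FakeEvent("C", ["B"])]
merge = [FakeEvent("X", ["P"]), FakeEvent("M", ["X", "Y"])]

print(is_linear_timeline(chain))  # True
print(is_linear_timeline(merge))  # False: "M" has two prev events
print(is_linear_timeline([]))     # True: all() of an empty iterable
```

Note that the check only looks at each event in isolation; it says nothing about whether the events connect to one another.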
have you considered how this behaves in longer-lived forks:
      E1
    ↗    ↖
   |      S2
   |      ↑
   E3     |
   ↑      |
 --|------|---- <- prev sync
   |      |
   E4     E5
   ↑      |
 --|------|---- <- this sync
   |      |
    ↖    /
      E6 (the distant future)
E4 and E5 both have single prev events, but I'm not convinced it is safe to drop the state delta between the forks here?
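The concern can be made concrete with a toy model of the diagram. The event IDs come from the diagram; the dictionary representation and the two checks are assumptions for illustration, not Synapse code.

```python
# Each event maps to its list of prev events, as drawn in the diagram.
prev_events = {
    "E1": [],
    "S2": ["E1"],
    "E3": ["E1"],
    "E4": ["E3"],
    "E5": ["S2"],
    "E6": ["E4", "E5"],
}

# The timeline returned by "this sync".
batch = ["E4", "E5"]

# The per-event check from the diff passes ...
per_event_linear = all(len(prev_events[e]) <= 1 for e in batch)

# ... yet E5 does not build on E4, so the batch spans two forks.
forms_one_chain = all(
    prev_events[later] == [earlier] for earlier, later in zip(batch, batch[1:])
)

print(per_event_linear)  # True
print(forms_one_chain)   # False
```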
The state returned is currently `(timeline_start | timeline_end) - previous_timeline_end - timeline_contains`. If we have a linear chain with no gaps, my assumption is that `timeline_start == previous_timeline_end` and `timeline_end == timeline_start + timeline_contains`, which then all cancels out.
But bleurghghghghghg you're right that we need to actually check that all the events are in the same chain and point to the previous timeline end. BLEURGH.
(Though does the current code sensibly work in this case?)
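The cancellation argument in this comment can be checked with plain Python sets. This is a sketch that models each term as a set of state event IDs, which is an assumption about the representation; all event IDs are made up, and `replaced` is a hypothetical helper set naming the state that the timeline overwrites.

```python
# Sets of state event IDs; all values here are made-up examples.
previous_timeline_end = {"$create", "$alice_join", "$topic_v1"}

# No gap: state at the start of this timeline is where the last one ended.
timeline_start = set(previous_timeline_end)

# One state event in the timeline replaces the topic.
timeline_contains = {"$topic_v2"}
replaced = {"$topic_v1"}

# "timeline_start + timeline_contains": apply the timeline's state changes.
timeline_end = (timeline_start - replaced) | timeline_contains

state_to_return = (
    (timeline_start | timeline_end) - previous_timeline_end - timeline_contains
)
print(state_to_return)  # set()
```

The result is empty only because `timeline_start == previous_timeline_end` holds here; if the two differ, as they can across forks, the subtraction no longer cancels.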
> (Though does the current code sensibly work in this case?)
WHO KNOWS
More constructively: it would be good to add a test for this case.
Having thought about this over lunch:
> If we have a linear chain with no gaps, my assumption is that `timeline_start == previous_timeline_end`

I think with long-lived forks this may or may not be true, depending on whether the end and start are part of the same fork. In the present code, we'll come to the wrong answer if that is the case, and in the new code it will come to the wrong answer either way.
Given that, I'm somewhat tempted to accept that and fix the performance regression. The other option is to do an extra DB hit to check if the event ID corresponding to `previous_timeline_end` matches the prev event of the start of the timeline.
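As a rough sketch of what that stricter check could look like: the function name, its parameters, and the in-memory `prev_events` map are all hypothetical. In Synapse, obtaining the event ID for the previous timeline end would require the extra DB lookup mentioned above, which this toy version sidesteps.

```python
from typing import Dict, List, Optional


def continues_previous_timeline(
    batch: List[str],
    prev_events: Dict[str, List[str]],
    previous_end_event_id: Optional[str],
) -> bool:
    """Return True if `batch` is one chain hanging off the previous timeline end."""
    if not batch:
        # Nothing new, so nothing can have forked.
        return True
    expected_prev = previous_end_event_id
    for event_id in batch:
        # Each event must have exactly one prev event: the one before it.
        if prev_events[event_id] != [expected_prev]:
            return False
        expected_prev = event_id
    return True


# The DAG from the diagram earlier in this thread.
dag = {
    "E1": [],
    "S2": ["E1"],
    "E3": ["E1"],
    "E4": ["E3"],
    "E5": ["S2"],
    "E6": ["E4", "E5"],
}

print(continues_previous_timeline(["E4"], dag, "E3"))        # True
print(continues_previous_timeline(["E4", "E5"], dag, "E3"))  # False
print(continues_previous_timeline(["E5"], dag, "E3"))        # False
```

If the previous end is unknown (`None`), the sketch conservatively returns False for any non-empty batch, so the slow path would be taken.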
I think if we want to remove all these edge cases we'll need to change all this to try and use the current state (based on the extremities at the time), though that's quite a big change (but perhaps we can do it for sliding sync?)
Looks to optimise for the linear case.
PR element-hq#16942 removed an invalid optimisation that avoided pulling out state for non-gappy syncs. This causes a large increase in DB usage. c.f. element-hq#16941 for why that optimisation was wrong. However, we can still optimise in the simple case where the events in the timeline are a linear chain without any branching/merging of the DAG. cc. @richvdh
Forget a line, and an empty batch is trivially linear. c.f. element-hq#17064
No significant changes since 1.105.0rc1.

- Stabilize support for [MSC4010](matrix-org/matrix-spec-proposals#4010) which clarifies the interaction of push rules and account data. Contributed by @clokep. ([\#17022](element-hq/synapse#17022))
- Stabilize support for [MSC3981](matrix-org/matrix-spec-proposals#3981): `/relations` recursion. Contributed by @clokep. ([\#17023](element-hq/synapse#17023))
- Add support for moving `/pushrules` off of main process. ([\#17037](element-hq/synapse#17037), [\#17038](element-hq/synapse#17038))
- Fix various long-standing bugs which could cause incorrect state to be returned from `/sync` in certain situations. ([\#16930](element-hq/synapse#16930), [\#16932](element-hq/synapse#16932), [\#16942](element-hq/synapse#16942), [\#17064](element-hq/synapse#17064), [\#17065](element-hq/synapse#17065), [\#17066](element-hq/synapse#17066))
- Fix server notice rooms not always being created as unencrypted rooms, even when `encryption_enabled_by_default_for_room_type` is in use (server notices are always unencrypted). ([\#17033](element-hq/synapse#17033))
- Fix the `.m.rule.encrypted_room_one_to_one` and `.m.rule.room_one_to_one` default underride push rules being in the wrong order. Contributed by @Sumpy1. ([\#17043](element-hq/synapse#17043))
- Refactor auth chain fetching to reduce duplication. ([\#17044](element-hq/synapse#17044))
- Improve database performance by adding a missing index to `access_tokens.refresh_token_id`. ([\#17045](element-hq/synapse#17045), [\#17054](element-hq/synapse#17054))
- Improve database performance by reducing number of receipts fetched when sending push notifications. ([\#17049](element-hq/synapse#17049))

* Bump packaging from 23.2 to 24.0. ([\#17027](element-hq/synapse#17027))
* Bump regex from 1.10.3 to 1.10.4. ([\#17028](element-hq/synapse#17028))
* Bump ruff from 0.3.2 to 0.3.5. ([\#17060](element-hq/synapse#17060))
* Bump serde_json from 1.0.114 to 1.0.115. ([\#17041](element-hq/synapse#17041))
* Bump types-pillow from 10.2.0.20240125 to 10.2.0.20240406. ([\#17061](element-hq/synapse#17061))
* Bump types-requests from 2.31.0.20240125 to 2.31.0.20240406. ([\#17063](element-hq/synapse#17063))
* Bump typing-extensions from 4.9.0 to 4.11.0. ([\#17062](element-hq/synapse#17062))