-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
De-flake MSC3030 backfill test with Synapse worker replication #492
De-flake MSC3030 backfill test with Synapse worker replication #492
Conversation
Spawning from matrix-org/synapse#14028 (comment) The reason they failed before was because we're fighting against stale caches across workers racing while waiting for invalidation, see matrix-org/synapse#13185 (comment) > Here is what happens: > > 1. `serverB` has `event1` stored as an `outlier` from previous requests (specifically from MSC3030 jump to date pulling in a missing `prev_event` after backfilling) > 1. Client on `serverB` calls `/messages?dir=b` > 1. `serverB:client_reader1` accepts the request and drives things > 1. `serverB:client_reader1` has some backward extremities in range and requests `/backfill` from `serverA` > 1. `serverB:client_reader1` processes the events from backfill including `event1` and puts them in the `_event_persist_queue` > 1. `serverB:master` picks up the events from the `_event_persist_queue` and persists them to the database, de-outliers `event1` and invalidates its own cache and sends them over replication > 1. `serverB:client_reader1` starts assembling the `/messages` response and gets `event1` out of the stale cache still as an `outlier` > 1. `serverB:client_reader1` responds to the `/messages` request without `event1` because `outliers` are filtered out > 1. `serverB:client_reader1` finally gets the replication data and invalidates its own cache for `event1` (too late, we already got the events from the stale cache and responded)
Fixes matrix-org/synapse#13944? |
@@ -203,6 +203,16 @@ func TestJumpToDateEndpoint(t *testing.T) { | |||
contextResResBody := client.ParseJSON(t, contextRes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes matrix-org/synapse#13944?
@squahtx I don't think this fixes that issue. The error there is:
msc3030_test.go:200: CSAPI.MustDoFunc response return non-2xx code: 403 Forbidden - body: {"errcode":"M_FORBIDDEN","error":"You don't have permission to access that event."}
which indicates we also need to retry the /context
request above until it works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in a separate PR: matrix-org/synapse#14096
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the fact that we have to do this a bug in Synapse? From the description it sounds like this is all happening within a single request to the client_reader.
Are there any spec guidelines on how /messages is supposed to behave WRT backfill?
I think this is more a discussion to be had on matrix-org/synapse#13185. The current Synapse behavior doesn't feel quite right but since there is no spec/guidelines, this could be assumed to be correct as well.
I don't think so. The spec just says this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks for the review @squahtx and @DMRobertson 🐦 |
De-flake MSC3030 backfill test with Synapse worker replication
Spawning from matrix-org/synapse#14028 (comment)
CI failure: https://github.com/matrix-org/synapse/actions/runs/3182998161/jobs/5189731097#step:6:15343 (from discussion)
Why did this test fail with workers before?
The reason they failed before was because we're fighting against stale caches across workers racing while waiting for invalidation, see matrix-org/synapse#14211
Dev notes
Complement test run failure with relevant debug logs