Ensure that pending to-device events are sent over federation at startup #16925
When picking destination servers that we need to wake up for a retry, we need to be mindful of destinations that we have *never* successfully sent to. This can manifest either as a null `last_successful_stream_ordering`, or even no row in `destinations` at all. Hence, we need to left-join on `destinations` rather than inner-joining, and we need to treat a null `last_successful_stream_ordering` the same as 0.
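A minimal sketch of the query shape described above, using an in-memory SQLite database. Only `destinations` and `last_successful_stream_ordering` come from the text. The `destination_rooms` table and both simplified schemas are assumptions made for illustration, so this is not Synapse's actual query. It contrasts the inner-join shape with the left-join plus `COALESCE` shape.

```python
import sqlite3
from typing import List

# Hypothetical, simplified schema: one row per destination we have talked to,
# plus one row per (destination, room) holding the latest event's stream ordering.
SCHEMA = """
CREATE TABLE destinations (
    destination TEXT PRIMARY KEY,
    last_successful_stream_ordering INTEGER
);
CREATE TABLE destination_rooms (
    destination TEXT NOT NULL,
    room_id TEXT NOT NULL,
    stream_ordering INTEGER NOT NULL
);
INSERT INTO destinations VALUES
    ('caught-up.example', 10),
    ('lagging.example', 4),
    ('never-sent.example', NULL);
INSERT INTO destination_rooms VALUES
    ('caught-up.example', '!a:example', 10),
    ('lagging.example', '!a:example', 5),
    ('never-sent.example', '!b:example', 3),
    ('no-row.example', '!c:example', 7);
"""

# Buggy shape: the INNER JOIN drops destinations with no `destinations` row,
# and `x > NULL` evaluates to NULL (not true), so never-sent ones are skipped too.
INNER_JOIN_SQL = """
SELECT DISTINCT dr.destination
FROM destination_rooms AS dr
INNER JOIN destinations AS d ON d.destination = dr.destination
WHERE dr.stream_ordering > d.last_successful_stream_ordering
ORDER BY dr.destination
"""

# Fixed shape: LEFT JOIN keeps every destination_rooms row, and COALESCE
# treats a missing or NULL last_successful_stream_ordering as 0.
LEFT_JOIN_SQL = """
SELECT DISTINCT dr.destination
FROM destination_rooms AS dr
LEFT JOIN destinations AS d ON d.destination = dr.destination
WHERE dr.stream_ordering > COALESCE(d.last_successful_stream_ordering, 0)
ORDER BY dr.destination
"""


def query(conn: sqlite3.Connection, sql: str) -> List[str]:
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(query(conn, INNER_JOIN_SQL))  # ['lagging.example']
print(query(conn, LEFT_JOIN_SQL))
# ['lagging.example', 'never-sent.example', 'no-row.example']
```

Retry backoff (whether a destination is currently allowed to be retried) is deliberately left out so the join behaviour stays visible.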
When considering which destinations need waking up for a retry, also look for those that have outstanding to-device messages.
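One way to express "also look for destinations with outstanding to-device messages" is to `UNION` the room-based candidates with every destination that has rows in a to-device outbox. This is a self-contained SQLite sketch. The `device_federation_outbox` and `destination_rooms` tables and their columns are simplified, hypothetical stand-ins, and this is not the query the PR actually uses.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE destinations (
    destination TEXT PRIMARY KEY,
    last_successful_stream_ordering INTEGER
);
CREATE TABLE destination_rooms (
    destination TEXT NOT NULL,
    stream_ordering INTEGER NOT NULL
);
-- Hypothetical outbox: one row per to-device message queued for a destination.
CREATE TABLE device_federation_outbox (
    destination TEXT NOT NULL,
    stream_id INTEGER NOT NULL
);
INSERT INTO destinations VALUES ('idle.example', 10), ('to-device-only.example', 10);
INSERT INTO destination_rooms VALUES ('idle.example', 10), ('behind.example', 3);
INSERT INTO device_federation_outbox VALUES ('to-device-only.example', 1);
"""

# Destinations that are behind on room events, UNIONed with destinations that
# have any queued to-device messages. UNION also removes duplicates.
CATCHUP_SQL = """
SELECT dr.destination AS destination
FROM destination_rooms AS dr
LEFT JOIN destinations AS d ON d.destination = dr.destination
WHERE dr.stream_ordering > COALESCE(d.last_successful_stream_ordering, 0)
UNION
SELECT destination FROM device_federation_outbox
ORDER BY destination
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
to_wake: List[str] = [row[0] for row in conn.execute(CATCHUP_SQL)]
print(to_wake)  # ['behind.example', 'to-device-only.example']
```

`to-device-only.example` is fully caught up on room events, so the to-device branch of the `UNION` is the only reason it gets woken.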
We don't really want people to have to wait 60 seconds for their to-device messages.
otherwise lgtm
```python
run_as_background_process(
    "wake_destinations_needing_catchup",
    self._wake_destinations_needing_catchup,
)
```
I think we should add a `now` param to `looping_call` (as the underlying Twisted thing supports it); that way, if the initial run takes a long time, a second run won't get started by the looping call.
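To illustrate the overlap concern: Twisted's `LoopingCall.start(interval, now=True)` runs the function immediately, and when the function returns a Deferred, the next run is not scheduled until that Deferred fires. The sketch below models that behaviour with `asyncio` as a stand-in for Twisted. `fake_looping_call`, `Job` and the timings are all hypothetical. The sketch compares two approaches: kicking off the first run separately and then starting a looping call, as in the snippet above, versus letting the looping call make the first run itself.

```python
import asyncio
from typing import Awaitable, Callable


class Job:
    """A slow job that records how many copies of itself run at once."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.running = 0
        self.max_running = 0
        self.runs = 0

    async def __call__(self) -> None:
        self.running += 1
        self.max_running = max(self.max_running, self.running)
        try:
            await asyncio.sleep(self.duration)
        finally:
            self.running -= 1
            self.runs += 1


async def fake_looping_call(
    f: Callable[[], Awaitable[None]], interval: float, iterations: int, now: bool
) -> None:
    """Hypothetical stand-in for LoopingCall: the next run is only scheduled
    once the previous one has finished, so runs started here never overlap."""
    if not now:
        await asyncio.sleep(interval)
    for _ in range(iterations):
        await f()
        await asyncio.sleep(interval)


async def separate_kickoff(job: Job, interval: float) -> None:
    # First run started independently, then a looping call without `now`:
    # the looping call's first run starts while the kickoff is still going.
    kickoff = asyncio.create_task(job())
    await fake_looping_call(job, interval, iterations=1, now=False)
    await kickoff


async def with_now(job: Job, interval: float) -> None:
    # The looping call makes the first run itself, so runs never overlap.
    await fake_looping_call(job, interval, iterations=2, now=True)


a, b = Job(duration=0.05), Job(duration=0.05)
asyncio.run(separate_kickoff(a, interval=0.01))
asyncio.run(with_now(b, interval=0.01))
print(a.max_running, b.max_running)  # 2 1
```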
done
Hum. Turns out this screws up the type checker.

In order to add the new parameter without breaking existing code, it needs to be a named parameter (because excess positional parameters are passed to the wrapped function). However, it turns out that `ParamSpec` is incompatible with additional keyword args (https://peps.python.org/pep-0612/#id2 actually mentions this: "Note that this also why we have to reject signatures of the form `(*args: P.args, s: str, **kwargs: P.kwargs)`".)
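The collision is also visible at runtime. In this hypothetical sketch, `looping_call_kw` and `on_tick` are made-up names and nothing is actually scheduled. A keyword-only `now` placed between `*args` and `**kwargs` silently captures a `now=` argument that may have been meant for the wrapped function. That kind of ambiguity is what makes such signatures awkward to express with `ParamSpec`.

```python
from typing import Any, Callable, Dict, List, Tuple

calls: List[Tuple[Tuple[Any, ...], Dict[str, Any]]] = []


def on_tick(*args: Any, **kwargs: Any) -> None:
    """Hypothetical wrapped function that happens to accept its own `now` kwarg."""
    calls.append((args, kwargs))


def looping_call_kw(
    f: Callable[..., Any], msec: float, *args: Any, now: bool = False, **kwargs: Any
) -> bool:
    """Hypothetical looping_call with a keyword-only `now`.

    Calls `f` once (standing in for scheduling it) and returns the value it
    took for `now`, so we can see which function the argument went to.
    """
    f(*args, **kwargs)
    return now


took_now = looping_call_kw(on_tick, 1000, "x", now=True)
print(took_now, calls)  # True [(('x',), {})]
```

`on_tick` never sees `now=True`, because the wrapper consumed it.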
So, three options:

1. Add a positional parameter to `looping_call`, and update all existing calls to `looping_call` (55 of them, according to my IDE).
2. Add a new method `looping_call2` (or something) which takes a positional `now` parameter. Use it here, and deprecate `looping_call`.
3. Add a new method `looping_call_now`, which is exactly the same as `looping_call` except for the obvious. (The implementation of this will probably involve a private equivalent to `looping_call2`, shared between `looping_call_now` and `looping_call`.)
My instinct is number 3, but happy with whatever you think.
Well, I took an executive decision.
The tests are now failing due to db connection pool shenanigans. #17017 should fix it.
# Synapse 1.104.0 (2024-04-02)

### Bugfixes

- Fix regression when using OIDC provider. Introduced in v1.104.0rc1. ([\#17031](element-hq/synapse#17031))

# Synapse 1.104.0rc1 (2024-03-26)

### Features

- Add an OIDC config to specify extra parameters for the authorization grant URL. It can be useful to pass an ACR value, for example. ([\#16971](element-hq/synapse#16971))
- Add support for OIDC provider returning JWT. ([\#16972](element-hq/synapse#16972), [\#17031](element-hq/synapse#17031))

### Bugfixes

- Fix a bug which meant that, under certain circumstances, we might never retry sending events or to-device messages over federation after a failure. ([\#16925](element-hq/synapse#16925))
- Fix various long-standing bugs which could cause incorrect state to be returned from `/sync` in certain situations. ([\#16949](element-hq/synapse#16949))
- Fix case in which `m.fully_read` marker would not get updated. Contributed by @SpiritCroc. ([\#16990](element-hq/synapse#16990))
- Fix bug which did not retract a user's pending knocks at rooms when their account was deactivated. Contributed by @hanadi92. ([\#17010](element-hq/synapse#17010))

### Updates to the Docker image

- Updated `start.py` to generate config using the correct user ID when running as root (fixes [\#16824](element-hq/synapse#16824), [\#15202](element-hq/synapse#15202)). ([\#16978](element-hq/synapse#16978))

### Improved Documentation

- Add a query to force a refresh of a remote user's device list to the "Useful SQL for Admins" documentation page. ([\#16892](element-hq/synapse#16892))
- Minor grammatical corrections to the upgrade documentation. ([\#16965](element-hq/synapse#16965))
- Fix the sort order for the documentation version picker, so that newer releases appear above older ones. ([\#16966](element-hq/synapse#16966))
- Remove recommendation for a specific poetry version from contributing guide. ([\#17002](element-hq/synapse#17002))

### Internal Changes

- Improve lock performance when a lot of locks are all waiting for a single lock to be released. ([\#16840](element-hq/synapse#16840))
- Update power level default for public rooms. ([\#16907](element-hq/synapse#16907))
- Improve event validation. ([\#16908](element-hq/synapse#16908))
- Multi-worker-docker-container: disable log buffering. ([\#16919](element-hq/synapse#16919))
- Refactor state delta calculation in `/sync` handler. ([\#16929](element-hq/synapse#16929))
- Clarify docs for some room state functions. ([\#16950](element-hq/synapse#16950))
- Specify IP subnets in canonical form. ([\#16953](element-hq/synapse#16953))
- As done for SAML mapping provider, let's pass the module API to the OIDC one so the mapper can do more logic in its code. ([\#16974](element-hq/synapse#16974))
- Allow containers building on top of Synapse's Complement container to use the included PostgreSQL cluster. ([\#16985](element-hq/synapse#16985))
- Raise poetry-core version cap to 1.9.0. ([\#16986](element-hq/synapse#16986))
- Patch the db conn pool sooner in tests. ([\#17017](element-hq/synapse#17017))

### Updates to locked dependencies

* Bump anyhow from 1.0.80 to 1.0.81. ([\#17009](element-hq/synapse#17009))
* Bump black from 23.10.1 to 24.2.0. ([\#16936](element-hq/synapse#16936))
* Bump cryptography from 41.0.7 to 42.0.5. ([\#16958](element-hq/synapse#16958))
* Bump dawidd6/action-download-artifact from 3.1.1 to 3.1.2. ([\#16960](element-hq/synapse#16960))
* Bump dawidd6/action-download-artifact from 3.1.2 to 3.1.4. ([\#17008](element-hq/synapse#17008))
* Bump jinja2 from 3.1.2 to 3.1.3. ([\#17005](element-hq/synapse#17005))
* Bump log from 0.4.20 to 0.4.21. ([\#16977](element-hq/synapse#16977))
* Bump mypy from 1.5.1 to 1.8.0. ([\#16901](element-hq/synapse#16901))
* Bump netaddr from 0.9.0 to 1.2.1. ([\#17006](element-hq/synapse#17006))
* Bump pydantic from 2.6.0 to 2.6.4. ([\#17004](element-hq/synapse#17004))
* Bump pyo3 from 0.20.2 to 0.20.3. ([\#16962](element-hq/synapse#16962))
* Bump ruff from 0.1.14 to 0.3.2. ([\#16994](element-hq/synapse#16994))
* Bump serde from 1.0.196 to 1.0.197. ([\#16963](element-hq/synapse#16963))
* Bump serde_json from 1.0.113 to 1.0.114. ([\#16961](element-hq/synapse#16961))
* Bump types-jsonschema from 4.21.0.20240118 to 4.21.0.20240311. ([\#17007](element-hq/synapse#17007))
* Bump types-psycopg2 from 2.9.21.16 to 2.9.21.20240311. ([\#16995](element-hq/synapse#16995))
* Bump types-pyopenssl from 23.3.0.0 to 24.0.0.20240311. ([\#17003](element-hq/synapse#17003))

# Synapse 1.103.0 (2024-03-19)

No significant changes since 1.103.0rc1.

# Synapse 1.103.0rc1 (2024-03-12)

### Features

- Add a new [List Accounts v3](https://element-hq.github.io/synapse/v1.103/admin_api/user_admin_api.html#list-accounts-v3) Admin API with improved deactivated user filtering capabilities. ([\#16874](element-hq/synapse#16874))
- Include `Retry-After` header by default per [MSC4041](matrix-org/matrix-spec-proposals#4041). Contributed by @clokep. ([\#16947](element-hq/synapse#16947))

### Bugfixes

- Fix joining remote rooms when a module uses the `on_new_event` callback. This callback may now pass partial state events instead of the full state for remote rooms. Introduced in v1.76.0. ([\#16973](element-hq/synapse#16973))
- Fix performance issue when joining very large rooms that can cause the server to lock up. Introduced in v1.100.0. Contributed by @ggogel. ([\#16968](element-hq/synapse#16968))

### Improved Documentation

- Add HAProxy example for single port operation to reverse proxy documentation. Contributed by Georg Pfuetzenreuter (@tacerus). ([\#16768](element-hq/synapse#16768))
- Improve the documentation around running Complement tests with new configuration parameters. ([\#16946](element-hq/synapse#16946))
- Add docs on upgrading from a very old version. ([\#16951](element-hq/synapse#16951))

### Updates to locked dependencies

* Bump JasonEtco/create-an-issue from 2.9.1 to 2.9.2. ([\#16934](element-hq/synapse#16934))
* Bump anyhow from 1.0.79 to 1.0.80. ([\#16935](element-hq/synapse#16935))
* Bump dawidd6/action-download-artifact from 3.0.0 to 3.1.1. ([\#16933](element-hq/synapse#16933))
* Bump furo from 2023.9.10 to 2024.1.29. ([\#16939](element-hq/synapse#16939))
* Bump pyopenssl from 23.3.0 to 24.0.0. ([\#16937](element-hq/synapse#16937))
* Bump types-netaddr from 0.10.0.20240106 to 1.2.0.20240219. ([\#16938](element-hq/synapse#16938))
Fixes #16680, as well as a related bug where servers to which we had never successfully sent an event would not be retried.

In order to fix the case of pending to-device messages, we hook into the existing `wake_destinations_needing_catchup` process, by extending it to look for destinations that have pending to-device messages. The federation transmission loop then attempts to send the pending to-device messages as normal.

Suggest review commit-by-commit.
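A wake-up pass like `wake_destinations_needing_catchup` can be sketched as a paginated loop. It fetches a batch of destination names greater than the last one processed, wakes each one, and repeats until a batch comes back empty. Everything here (`wake_all`, `fetch_batch`, `wake`, the batch size) is hypothetical. The sketch illustrates keyset pagination and is not Synapse's actual code.

```python
from typing import Callable, List, Optional


def wake_all(
    fetch_batch: Callable[[Optional[str], int], List[str]],
    wake: Callable[[str], None],
    batch_size: int = 25,
) -> int:
    """Wake every destination returned by `fetch_batch`, one page at a time.

    `fetch_batch(after, limit)` must return up to `limit` destination names,
    sorted ascending and strictly greater than `after` (or from the start when
    `after` is None). Returns how many destinations were woken.
    """
    last_processed: Optional[str] = None
    woken = 0
    while True:
        batch = fetch_batch(last_processed, batch_size)
        if not batch:
            return woken
        # Remember where this page ended so the next query resumes after it.
        last_processed = batch[-1]
        for destination in batch:
            wake(destination)  # e.g. kick that destination's transmission loop
            woken += 1


# In-memory stand-in for the database query.
pending = sorted(["a.example", "b.example", "c.example", "d.example", "e.example"])


def fetch_batch(after: Optional[str], limit: int) -> List[str]:
    remaining = [d for d in pending if after is None or d > after]
    return remaining[:limit]


woken_destinations: List[str] = []
count = wake_all(fetch_batch, woken_destinations.append, batch_size=2)
print(count, woken_destinations)
```

Paging on `destination > last_processed` rather than on an offset means destinations that stop needing catch-up mid-pass do not shift later pages.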