Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It's not explicitly clear that Devices and (Live) Sync Tokens are meant to exist on a 1-to-1 basis #1416

Closed
ShadowJonathan opened this issue Jan 26, 2023 · 7 comments
Labels
clarification An area where the expected behaviour is understood, but the spec could do with being more explicit

Comments

@ShadowJonathan
Copy link
Contributor

ShadowJonathan commented Jan 26, 2023

Context

This came up while discussing MSC3391 in #synapse-dev, context here.

Synapse stores a table of user devices that have seen a particular delete of account data, to keep track which devices have yet outstanding notices, and to delete the account-data table entry when the last one had received it.

I mused about a possible edge-case of multiple sync-tokens per device not receiving this information, and proposed tracking the account-data stream id in the sync token itself. (Which had other problems, but is irrelevant to contextualising this).

I got told that this wasn't necessary, as devices were expected to only hold one sync-token live, which got underscored with a pointer to to-device message handling, where old to-devices are removed from the database once a device calls sync on the next_batch sync-token.

In my eyes, this above would make it unworkable (or at the very least, very undesirable) to use multiple sync tokens per device, or that a sync token is used more than once, as it effectively causes undefined or undesirable behaviour.

Issue

Currently, it seems, that matrix expects devices to only regard a single sync token usable at any time, replaced by the next_batch sync token when it is received in a sync response.

This is only weakly suggested in the spec, and is not really enforced by homeservers. I myself have made the mistake of sometimes (ab)using multiple sync tokens (per device) in complement (which means I probably need to give it a once-over and abstract these multi-synctoken usecases with devices), and I suspect some libraries or other are using this as well.

Especially to-device, and maybe soon the behaviour from MSC3391 (deleting account data, and getting notified of it), rely on a client never re-using a sync token (after already continuing to use its next_batch successor), and calling sync on two sync tokens linked to the same device, at the same time.

This could be a potential source for bugs and confusion. If the spec would be extra strict, and encourage homeserver implementations to add more-strict checks on sync tokens (i.e. "one sync token is alive at a time, per device"), it could remove a class of desynchronisation bugs (where things such as to-device, presence, or other "falls between the cracks"), and could possibly help make syncing easier to manage.

@ShadowJonathan ShadowJonathan added the clarification An area where the expected behaviour is understood, but the spec could do with being more explicit label Jan 26, 2023
@anoadragon453
Copy link
Member

I myself have made the mistake of sometimes (ab)using this in complement (which means I probably need to give it a once-over and abstract these multi-synctoken usecases with devices), and I suspect some libraries or other are using this as well.

I added a helper method to Complement to help managing multiple devices for a single user easier: matrix-org/complement#576

@richvdh
Copy link
Member

richvdh commented Jan 30, 2023

This came up while discussing MSC3391 in #synapse-dev, context here, where I mused about a possible edgecase of sync state being tracker per-device instead of being encoded in the sync token itself, and being told that this is not the case, as To-Device messages already clear themselves after a next-batch synctoken is used, effectively making it not possible or workable that a synctoken is used more than once, or that multiple synctokens per device are "live" at the same time.

I'm afraid I'm struggling to parse this four-line sentence. Please can you update the description with a clearer summary of what the issue is. Including context of where it was discussed is fine, but don't make me read through it all before even explaining what the problem is.

Devices and Sync tokens are not meant to exist on a 1:1 basis. A single device will see many sync tokens over the course of its lifetime.

@ShadowJonathan
Copy link
Contributor Author

@richvdh i've reworded that paragraph, and my reasoning.

Devices and Sync tokens are not meant to exist on a 1:1 basis. A single device will see many sync tokens over the course of its lifetime.

Alive sync tokens, maybe I should punctate that in some points.

@ShadowJonathan ShadowJonathan changed the title It's not explicitly clear that Devices and Sync Tokens are meant to exist on a 1-to-1 basis It's not explicitly clear that Devices and (Live) Sync Tokens are meant to exist on a 1-to-1 basis Jan 30, 2023
@richvdh
Copy link
Member

richvdh commented Jan 30, 2023

Thanks, the issue is much clearer now. However:

matrix expects devices to only regard a single sync token usable at any time,

I don't think this is correct.

  • Firstly and most simply: there must be at least two sync tokens usable at any time, since a server cannot be sure that a client has successfully received the response to /sync. Hence the server does not know if the client will repeat the previous /sync request with the old sync token, or make a new one with the new sync token.

  • Secondly, sync tokens can be passed as pagination tokens to endpoints other than /sync (most commonly, /messages) and there is no rule that says this must be the most recent /sync token. (There is no firmly-specified "maximum age" that the server must support, nor, as far as I can recall, is the expected behaviour if a token is "too old".)

  • Thirdly, servers must also support clients supplying older tokens to /sync. For example, every few minutes, element-web persists its state, including the latest sync token, to non-volatile storage. Obviously, in general, that means that when the client is restarted, it will start from a point that is not the most recent sync token.

    As you mention, to-device messages are a complication here, and yes clients have to make sure they are processed (and their results, or the messages themselves persisted to non-volatile storage) before making another /sync request with the returned next_batch.

@ShadowJonathan
Copy link
Contributor Author

ShadowJonathan commented Jan 30, 2023

Argh, that is indeed not as simple as I saw it first, but this gives me a distinct impression that the platonic ideal (for sync tokens) feels in conflict with itself, if its used for both KOTH (King-of-the-hill) style one-token-lives functionality (e.g. to-device), but also as a more-universal pagination or resumption token (e.g. /messages, non-most-recent sync tokens in clients).

I feel like this is good feedback for sliding sync, or at least its E2EE extension, as right now I don't see an easy way out of this kind-of messiness, and instead this should inform future API design to not cause these kinds of complications.

Hence the server does not know if the client will repeat the previous /sync request with the old sync token, or make a new one with the new sync token.

I tried to clarify, but my definition of a "live" token in this case is "whatever token the client will be using for its next sync request, only replaced by the next_batch token in the server's sync response", i.e. the next "generation" of that token.

An additional caveat would be - in my opinion - that a device cant "backtrack" (to prevent something like to-device messages getting dropped server-side) several "generations" of tokens, and that it can only use the current or next generation of the sync token, progressing those in lock-step with the server.

@richvdh
Copy link
Member

richvdh commented Jan 30, 2023

Argh, that is indeed not as simple as I saw it first, but this gives me a distinct impression that the platonic ideal (for sync tokens) feels in conflict with itself, if its used for both KOTH (King-of-the-hill) style one-token-lives functionality (e.g. to-device), but also as a more-universal pagination or resumption token (e.g. /messages, non-most-recent sync tokens in clients).

<shrug> It works fine, even if it's not a platonic ideal. (And yes, I believe sliding sync is changing this.)

I'm now unclear if this issue is complaining that the spec is not clear about how the endpoint works (in which case, contributions welcome), or that you'd have designed the API differently (in which case, maybe wait for sliding sync)?

@ShadowJonathan
Copy link
Contributor Author

I think it's a bit of both, but I'll wait for sliding sync, as my problem with the spec is mostly together with worry that its continued mis-interpretation would result in new implementations making these mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification An area where the expected behaviour is understood, but the spec could do with being more explicit
Projects
None yet
Development

No branches or pull requests

3 participants