Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC2190: Allow appservice bots to use /sync #2190

Open
wants to merge 2 commits into
base: old_master
Choose a base branch
from

Conversation

tulir
Copy link
Member

@tulir tulir commented Jul 24, 2019

Rendered

Signed-off-by: Tulir Asokan <[email protected]>

Copy link
Member

@turt2live turt2live left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a common sense approach to me

Copy link
Contributor

@Half-Shot Half-Shot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I think this proposal probably needs someone to weight in on why the feature was disabled in the first place, from the synapse team.

@ara4n
Copy link
Member

ara4n commented Jul 24, 2019

afaik it was disabled only because nobody used it and it bitrotted, so rather than fix something nobody used, it was removed, especially as it made /sync even more complicated. i am well up for restoring it though if someone will use it (it was my idea in the first place, iirc)

@turt2live
Copy link
Member

There's lots of green:

@mscbot fcp merge

@mscbot
Copy link
Collaborator

mscbot commented Aug 12, 2019

This FCP proposal has been cancelled by #2190 (comment).

Team member @mscbot has proposed to merge this. The next step is review by the rest of the tagged people:

Concerns:

Once at least 75% of reviewers approve (and there are no outstanding concerns), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for information about what commands tagged team members can give me.

@mscbot mscbot added proposed-final-comment-period Currently awaiting signoff of a majority of team members in order to enter the final comment period. disposition-merge labels Aug 12, 2019
@erikjohnston
Copy link
Member

erikjohnston commented Aug 12, 2019

I think this was disabled because it had a tendency to take out synapse, especially when the freenode bridge randomly did it. It may be less of an issue if we only fetch events for the AS virtual user, but even then I would imagine that if e.g. the freenode bridge did this then it'd take out one of our synchrotrons. This originally manifested as the fact that bridges would occasionally and randomly decide to call /sync and take out the homeserver, (which was doubly annoying as we didn't know that those calls were so heavy weight until they happened).

To flip this on its head: why do we want this? Would this be rendered useless if sometime in the future we mandated limits to sync (e.g. forcing lazy loading, or only returning the most recent ten rooms, etc)? Should we extend the AS APIs instead? Do we need to better document how to use the existing helper APIs?

In general I'd prefer when we created new APIs that they were designed to be "limited" from the get go, i.e. that they can be implemented such that they will always return at most N bytes of data, thus allowing us to avoid edge case "killer requests" that easily takes out the service. That clearly hasn't happened with /sync, but I'd rather not have ASes relying on being able to get all data at once.

That's not to say this shouldn't happen, but if we do implement /sync then I'd either a) want good reason to or b) make it clear that AS authors shouldn't necessarily rely on being able to get all their state at once via /sync

@KitsuneRal
Copy link
Member

AIUI, the proposal boils down to /sync and /events resolving each bot's as_token to the respective "@sender_localpart:example.org" by default instead of forcing to pass user_id=@sender_localpart:example.org. Unless I'm mistaking the freenode bridge was taking homeserver out because of timeline calculation for a huge number of users matching the namespace. The proposal explicitly rules this out, proposing to keep the same /sync semantics for users (authenticated by access_token) and bots (authenticated by as_token).

@Half-Shot
Copy link
Contributor

@KitsuneRal tbh even if you only matched sender_localpart, the freenode bridge would match something like 10k rooms. Not that I think it's a huge concern, there is no reason for bridges to call /sync during normal operation. The only purpose I can see behind it is for debugging (and EDUs, perhaps. Though that's being solved separately)

@anoadragon453
Copy link
Member

It's worth mentioning that even though Synapse's old functionality was to return /sync data of all appservice users when the appservice bot requested /sync, the Freenode bot is still in every single bridged Freenode room ever created, which would definitely be a lot of data.

While it would be nice to just extend the AS API to help here, we have to be careful to not forget about it as we continue to put more and more features down /sync and nowhere else.

@turt2live
Copy link
Member

honestly if the freenode bridge hits /sync, we've done something terribly wrong. Can we stop optimizing the proposal for a bridge that won't use sync?

@erikjohnston
Copy link
Member

honestly if the freenode bridge hits /sync, we've done something terribly wrong. Can we stop optimizing the proposal for a bridge that won't use sync?

There's nothing in this MSC that says that bridges shalt not use this endpoint? If bridges are meant to use it then we need to worry about big bridges, if bridges aren't meant to use it then we should figure out what it is meant to be used for.

This feature was removed because it was causing major service outages, I think for this to be re-added we need to have some guarantees that it won't cause similar issues.

@turt2live
Copy link
Member

There shouldn't be anything in the MSC/spec that says things should(n't) use a particular endpoint. I don't think it's realistic for us to solve operational problems in the spec. From an ops perspective, I wouldn't recommend that a server owner deploy a popular bridge which uses /sync for performance reasons (unless their server implementation is optimized for it, somehow). Similarly, I wouldn't expect that a bridge even touch /sync if it knows it is the wrong choice for what they're doing.

It's a strange restriction on appservices: the sender_localpart can't sync, but a user in an equal number of rooms which lives in the users namespace can? Do we just need another MSC for an error code on /sync to say "nah, your account is too large"?

@erikjohnston
Copy link
Member

Operational issues are caused by the fact the spec says we must support unbounded data fetching requests, and so the spec can fix those operational issues by not mandating we support those APIs. I don't think it is unreasonable to take into account implementation and operational concerns when considering what to put in the spec?

Again, I would quite like to hear use cases for /sync. It seems unhelpful to have APIs that only work if the appservice are "small" enough? Does that mean if a bridge/bot you're hosting becomes popular it suddenly stops working because it was relying on /sync?

@turt2live
Copy link
Member

One use case would be go-neb: it doesn't use appservice transactions but fits really nicely for reserving a namespace of users. Because the spec says that the sender_localpart is required and should not be in the namespace, the registration should be reserving some arbitrary ID and some namespace of users for it to use. Although it doesn't make use of it now, it doesn't seem unreasonable that the sender_localpart user become a management interface for the bots running in the namespace. For ease of development, that would be a great time to use sync as it is currently used for everything else (the management bot becomes a plug-and-play bot like the rest of them).

Admittedly, this is not a strong use case. If we're adamant that appservices should not be syncing then we should disable sync for all the users in the namespaces, not just the sender_localpart. It's a silly restriction to block one user but not the others which can easily be in just as many rooms.

@erikjohnston
Copy link
Member

One use case would be go-neb: it doesn't use appservice transactions but fits really nicely for reserving a namespace of users. Because the spec says that the sender_localpart is required and should not be in the namespace, the registration should be reserving some arbitrary ID and some namespace of users for it to use. Although it doesn't make use of it now, it doesn't seem unreasonable that the sender_localpart user become a management interface for the bots running in the namespace. For ease of development, that would be a great time to use sync as it is currently used for everything else (the management bot becomes a plug-and-play bot like the rest of them).

I don't really follow why the management user ID would want to call sync in that case?

Admittedly, this is not a strong use case. If we're adamant that appservices should not be syncing then we should disable sync for all the users in the namespaces, not just the sender_localpart. It's a silly restriction to block one user but not the others which can easily be in just as many rooms.

To a certain extent I agree, though I think its more likely that the sender_localpart will be in a lot of rooms compared with their ghosts.

@tulir
Copy link
Member Author

tulir commented Aug 28, 2019

If bridges are meant to use it then we need to worry about big bridges, if bridges aren't meant to use it then we should figure out what it is meant to be used for.

Bridges don't need to be big. This is meant as a first step for user-hostable appservices

@erikjohnston
Copy link
Member

Bridges don't need to be big. This is meant as a first step for user-hostable appservices

Absolutely true, but they do often have a tendency to grow big after a while.

What do you mean by user-hostable appservices?

@tulir
Copy link
Member Author

tulir commented Aug 28, 2019

What do you mean by user-hostable appservices?

Appservices hosted on the user's machine instead of on the server. When this is combined with future things like e2ee and some kind of dynamic appservice registration, users could have their own bridges without trusting the server. Even without e2ee, it could be used to avoid being banned from a hostile platform due to many users using the same bridge.

@ericmigi
Copy link

any chance we could get this merged @ara4n?

@ara4n
Copy link
Member

ara4n commented Mar 29, 2020

@ericmigi sorry - missed this. as per our chat, it sounds like you're not blocked on it currently, but it's still on our radar as part of upcoming catchups on our MSC backlog.

@tulir
Copy link
Member Author

tulir commented Mar 29, 2020

The proxy I made for dynamic appservice allocation also removes the immediate need for this change, but it would still be nice to have so other people could use end-to-bridge encryption without having to use the proxy or other hacks.

@erikjohnston
Copy link
Member

@mscbot concern whether this is something we should be encouraging ─ #2190 (comment)

@mscbot mscbot added the unresolved-concerns This proposal has at least one outstanding concern label Mar 23, 2021
@turt2live
Copy link
Member

so we've discussed this as a spec core team and have come to the conclusion that Matrix isn't quite ready for this change just yet. Appservices tend to have extremely large accounts for the sender_localpart user, and regardless of server implementation that means /sync is potentially harmful to a server's survival.

While the spec doesn't typically concern itself with the performance of endpoints to drive whether or not they are allowed, in this case it's clear that the appservice spec is trying to give a different route for appservices to use. We think that the immediate use case would be solved by other MSCs (#2409 for example), and that /sync needs more work before we can consider large appservice accounts being able to sync. There are workarounds to this spec limitation currently possible, which allows appservices to sync if they really want to.

For sync improvements, we're looking at something like paginated sync (https://github.com/matrix-org/matrix-doc/issues/3076 - I thought we had an issue for this already, but apparently not) where we can reduce the impact of large accounts syncing. We're less concerned with users in an appservice namespace syncing because they are typically in fewer rooms, though we acknowledge the inconsistency here.

Process-wise, this means we're changing tracks to postpone this MSC until after "sync improvements" are made, where we can re-evaluate this MSC's purpose in that world.

@mscbot fcp cancel

@mscbot mscbot added proposal-in-review and removed disposition-merge proposed-final-comment-period Currently awaiting signoff of a majority of team members in order to enter the final comment period. labels Mar 25, 2021
@turt2live
Copy link
Member

@mscbot fcp postpone

@mscbot
Copy link
Collaborator

mscbot commented Mar 25, 2021

Team member @turt2live has proposed to postpone this. The next step is review by the rest of the tagged people:

Once at least 75% of reviewers approve (and there are no outstanding concerns), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for information about what commands tagged team members can give me.

@mscbot mscbot added disposition-postpone An MSC that has been postponed proposed-final-comment-period Currently awaiting signoff of a majority of team members in order to enter the final comment period. and removed proposal-in-review labels Mar 25, 2021
@turt2live turt2live removed the unresolved-concerns This proposal has at least one outstanding concern label Mar 25, 2021
@erikjohnston erikjohnston removed their request for review March 25, 2021 15:44
@uhoreg uhoreg removed their request for review March 25, 2021 15:49
@Half-Shot Half-Shot self-requested a review March 28, 2021 00:06
@mscbot
Copy link
Collaborator

mscbot commented Apr 6, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@mscbot mscbot added final-comment-period This MSC has entered a final comment period in interest to approval, postpone, or delete in 5 days. and removed proposed-final-comment-period Currently awaiting signoff of a majority of team members in order to enter the final comment period. labels Apr 6, 2021
@mscbot
Copy link
Collaborator

mscbot commented Apr 11, 2021

The final comment period, with a disposition to postpone, as per the review above, is now complete.

@mscbot mscbot added finished-final-comment-period and removed disposition-postpone An MSC that has been postponed final-comment-period This MSC has entered a final comment period in interest to approval, postpone, or delete in 5 days. labels Apr 11, 2021
@turt2live turt2live added the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Jun 8, 2021
@Half-Shot Half-Shot removed their request for review July 26, 2022 23:53
@dbkr dbkr removed their request for review November 3, 2022 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:feature MSC for not-core and not-maintenance stuff needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal proposal-postponed
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

10 participants