This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Deleting devices is *incredibly* resource-hungry #2704

Open
richvdh opened this issue Nov 23, 2017 · 6 comments
Labels
T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@richvdh
Member

richvdh commented Nov 23, 2017

Last night a user tried to delete a few hundred devices. This caused a significant increase in CPU usage, both on the main synapse and on the synchrotrons. Replication essentially stopped working and the whole system had to be restarted.

@richvdh
Member Author

richvdh commented Nov 23, 2017

At 23:52:16,357 or so, synchrotron 1's replication stream gets closed. When it reconnects, it is soon closed again with

2017-11-22 23:52:23,631 - synapse.replication.tcp.protocol - 290 - ERROR -  - [anon-UGuxs] Remote reported error: u'Failed to keep up'

It then goes into a period of 100% cpu usage, and never really recovers after that.

@michaelkaye
Contributor

This is still an issue: sending a device list notification to a few hundred devices led to the main synapse pausing stream replication for ~2-3 minutes. I think it's handled better than in Nov 2017, in that the system as a whole recovered from the issue and didn't sit spinning CPU, but we shouldn't be blocking replication for this length of time.
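One general way to avoid a single long stall is to split the work into bounded batches and hand control back to the event loop between them. The sketch below is only an illustration of that idea, not Synapse's actual code: `notify_in_batches`, `send_batch` and the batch size of 100 are hypothetical names and values, and it uses plain `asyncio` rather than whatever Synapse uses internally.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Sequence


def chunked(items: Sequence[str], size: int) -> Iterable[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def notify_in_batches(
    device_ids: Sequence[str],
    send_batch: Callable[[Sequence[str]], Awaitable[None]],  # hypothetical sender
    batch_size: int = 100,  # assumed value, not taken from the issue
) -> int:
    """Send notifications batch by batch and return the number of batches."""
    batches = 0
    for batch in chunked(device_ids, batch_size):
        await send_batch(batch)
        batches += 1
        # Yield to the event loop so other tasks (for example, replication
        # traffic) get a chance to run between batches.
        await asyncio.sleep(0)
    return batches
```

Whether this would help in practice depends on where the time is actually being spent, which the thread does not establish.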

@richvdh
Member Author

richvdh commented Nov 21, 2020

A user took out matrix.org this evening, apparently by deleting 34000 devices. First of all it chomped the database for two hours with lots of this sort of thing:

2020-11-21 02:25:48,160 - synapse.storage.txn - 517 - WARNING - POST-45498929 - [TXN OPERROR] {user_delete_access_tokens-93dad0f} could not serialize access due to concurrent update
2020-11-21 02:25:48,457 - synapse.storage.txn - 517 - WARNING - POST-45498929 - [TXN OPERROR] {user_delete_access_tokens-93dad19} could not serialize access due to concurrent update
2020-11-21 02:25:48,698 - synapse.storage.txn - 517 - WARNING - POST-45498929 - [TXN OPERROR] {user_delete_access_tokens-93dad3d} could not serialize access due to concurrent update
2020-11-21 02:25:48,837 - synapse.storage.txn - 517 - WARNING - POST-45498929 - [TXN OPERROR] {user_delete_access_tokens-93dad7e} could not serialize access due to concurrent update
2020-11-21 02:25:49,007 - synapse.storage.txn - 517 - WARNING - POST-45498929 - [TXN OPERROR] {user_delete_access_tokens-93dad9d} could not serialize access due to concurrent update

(there were 8 concurrent requests, all attempting to do the same thing)
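For illustration, one way to stop identical requests for the same user from racing each other is to serialize them behind a per-user lock, so only one deletion transaction for that user is in flight at a time. This is a minimal sketch under assumptions: `PerUserSerializer` and its `run` method are hypothetical names, it uses plain `asyncio`, and it only covers a single process (it would not coordinate across workers).

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, TypeVar

T = TypeVar("T")


class PerUserSerializer:
    """Run operations for the same user one at a time (hypothetical helper)."""

    def __init__(self) -> None:
        # Locks are created lazily, the first time a user ID is seen.
        self._locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run(self, user_id: str, operation: Callable[[], Awaitable[T]]) -> T:
        # Concurrent callers for the same user queue up here instead of
        # issuing conflicting database transactions in parallel.
        async with self._locks[user_id]:
            return await operation()
```

With this in place, eight concurrent calls to `run` for the same user would execute their operations strictly one after another, while requests for different users would still proceed in parallel.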

image

... and then the whole process fell over, presumably when it stopped deleting access tokens and started telling other people in the room about the updates, or something:

image

@richvdh
Member Author

richvdh commented Nov 21, 2020

Part of the problem here is that we let users accumulate tens of thousands of devices in the first place (#8263), but it feels like our handling here is suboptimal.

@clokep clokep added the A-Performance Performance, both client-facing and admin-facing label Nov 23, 2020
@matrix-org matrix-org deleted a comment from richvdh Jan 13, 2022
@H-Shay H-Shay added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. and removed A-Performance Performance, both client-facing and admin-facing labels Jan 13, 2022
@djmaze

djmaze commented Feb 19, 2022

Same here. The removal of about a thousand devices bogged down my server for about an hour and led to recurring client timeouts.

To me this sounds like a viable DoS attack vector, especially for servers with open registration. I really think this should be fixed (probably by setting limits as proposed in #8263).
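As a rough illustration of what a per-user limit could look like, the sketch below rejects new devices once a user reaches a cap. Everything here is hypothetical: `DeviceRegistry`, `DeviceLimitExceeded` and the in-memory storage are made up for the example, and the actual proposal in #8263 may differ.

```python
from typing import Dict, Set


class DeviceLimitExceeded(Exception):
    """Raised when a user already holds the maximum number of devices."""


class DeviceRegistry:
    """In-memory stand-in for a device store (hypothetical, for illustration)."""

    def __init__(self, max_devices_per_user: int) -> None:
        self._max = max_devices_per_user
        self._devices: Dict[str, Set[str]] = {}

    def add_device(self, user_id: str, device_id: str) -> None:
        devices = self._devices.setdefault(user_id, set())
        # Re-registering a known device is allowed; only new devices count
        # against the cap.
        if device_id not in devices and len(devices) >= self._max:
            raise DeviceLimitExceeded(
                f"{user_id} already has {len(devices)} devices (limit {self._max})"
            )
        devices.add(device_id)

    def count(self, user_id: str) -> int:
        return len(self._devices.get(user_id, ()))
```

A cap like this bounds how much work a later bulk deletion can trigger, but it does not by itself make deleting devices any cheaper.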

@babolivier
Contributor

Related: #7721


6 participants