Deleting devices is *incredibly* resource-hungry #2704
At 23:52:16,357 or so, synchrotron 1's replication stream gets closed. When it reconnects, it is soon closed again with
It then goes into a period of 100% CPU usage, and never really recovers after that.
This is still an issue: sending a device list notification to a few hundred devices led to the main synapse pausing stream replication for ~2-3 minutes. I think it's handled better than in Nov 2017, in that the system as a whole recovered from the issue and didn't sit spinning CPU, but we shouldn't be blocking replication for this length of time.
A user took out matrix.org this evening by apparently deleting 34000 devices. First of all it chomped the database for two hours with lots of this sort of thing:
(there were 8 concurrent requests, all attempting to do the same thing) ... and then the whole process fell over, presumably when it stopped deleting access tokens and started telling other people in the room about the updates, or something:
Part of the problem here is that we let users accumulate tens of thousands of devices in the first place (#8263), but it feels like our handling here is suboptimal.
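One generic way to keep a bulk deletion from monopolising a process is to work through the device IDs in bounded batches and yield to the event loop between them. The sketch below only illustrates that idea with plain `asyncio`; it is not Synapse code (Synapse is built on Twisted), and `chunked`, `delete_devices_in_batches`, the `delete_batch` callback and the default batch size are all hypothetical names and values chosen for the example.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


async def delete_devices_in_batches(
    device_ids: Sequence[str],
    # Hypothetical callback standing in for whatever actually deletes devices.
    delete_batch: Callable[[List[str]], Awaitable[None]],
    batch_size: int = 100,  # assumed value, not taken from the issue
) -> int:
    """Delete devices one batch at a time and return how many were processed."""
    deleted = 0
    for batch in chunked(device_ids, batch_size):
        await delete_batch(batch)
        deleted += len(batch)
        # Yield control so other tasks are not starved between batches.
        await asyncio.sleep(0)
    return deleted
```

Whether batching alone would have helped in the incidents described here is not established by this thread; a cap on devices per user, as proposed in #8263, addresses the input size rather than the processing strategy.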
Same here. The removal of about a thousand devices bogged down my server for about an hour and led to recurring client timeouts. To me this sounds like a viable DoS attack vector, especially for servers with open registration. I really think this should be fixed (probably by setting limits as proposed in #8263).
Related: #7721
Last night a user tried to delete a few hundred devices. This caused a significant increase in CPU usage, both on the main synapse and on the synchrotrons. Replication essentially stopped working and the whole system had to be restarted.