Deleting devices is incredibly resource-hungry #2704

richvdh · 2017-11-23T10:49:38Z

Last night a user tried to delete a few hundred devices. This caused a significant increase in CPU usage on both the main synapse and the synchrotrons. Replication essentially stopped working and the whole system had to be restarted.

richvdh · 2017-11-23T11:20:36Z

At 23:52:16,357 or so, synchrotron 1's replication stream gets closed. When it reconnects, it is soon closed again with:

2017-11-22 23:52:23,631 - synapse.replication.tcp.protocol - 290 - ERROR -  - [anon-UGuxs] Remote reported error: u'Failed to keep up'

It then goes into a period of 100% CPU usage, and never really recovers after that.

michaelkaye · 2019-05-17T14:33:58Z

This is still an issue: sending device list notifications to a few hundred devices led to the main synapse pausing stream replication for ~2-3 minutes. I think it's handled better than in Nov 2017, in that the system as a whole recovered from the issue and didn't sit spinning CPU, but we shouldn't be blocking replication for this length of time.

richvdh · 2020-11-21T06:29:40Z

A user took out matrix.org this evening by apparently deleting 34000 devices. First of all, it chomped the database for two hours with lots of this sort of thing:

2020-11-21 02:25:48,160 - synapse.storage.txn - 517 - WARNING - POST-45498929 - [TXN OPERROR] {user_delete_access_tokens-93dad0f} could not serialize access due to concurrent update
2020-11-21 02:25:48,457 - synapse.storage.txn - 517 - WARNING - POST-45498929 - [TXN OPERROR] {user_delete_access_tokens-93dad19} could not serialize access due to concurrent update
2020-11-21 02:25:48,698 - synapse.storage.txn - 517 - WARNING - POST-45498929 - [TXN OPERROR] {user_delete_access_tokens-93dad3d} could not serialize access due to concurrent update
2020-11-21 02:25:48,837 - synapse.storage.txn - 517 - WARNING - POST-45498929 - [TXN OPERROR] {user_delete_access_tokens-93dad7e} could not serialize access due to concurrent update
2020-11-21 02:25:49,007 - synapse.storage.txn - 517 - WARNING - POST-45498929 - [TXN OPERROR] {user_delete_access_tokens-93dad9d} could not serialize access due to concurrent update

(there were 8 concurrent requests, all attempting to do the same thing)
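For anyone triaging a similar incident, here is a minimal sketch of how the warnings above could be tallied per request and transaction name. The regular expression is inferred only from the five log lines quoted in this comment, and `count_operrors` is a hypothetical helper, not part of Synapse; other versions may format these lines differently.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted above (an assumption, not a
# documented format): "<request id> - [TXN OPERROR] {<txn name>-<hex id>} <message>"
OPERROR_RE = re.compile(
    r" - (?P<request>\S+) - \[TXN OPERROR\] "
    r"\{(?P<txn>.+)-(?P<txn_id>[0-9a-f]+)\} (?P<message>.*)$"
)


def count_operrors(lines):
    """Count TXN OPERROR warnings per (request id, transaction name)."""
    counts = Counter()
    for line in lines:
        match = OPERROR_RE.search(line)
        if match:
            counts[(match["request"], match["txn"])] += 1
    return counts
```

Feeding it an open log file (`count_operrors(open(path))`, with `path` supplied by the reader) gives a rough picture of which requests are spending their time retrying the same transaction.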

... and then the whole process fell over, presumably when it stopped deleting access tokens and started telling other people in the room about the updates, or something:

richvdh · 2020-11-21T06:32:02Z

Part of the problem here is that we let users accumulate tens of thousands of devices in the first place (#8263), but it feels like our handling here is suboptimal.

djmaze · 2022-02-19T21:34:40Z

Same here. The removal of about a thousand devices bogged down my server for about an hour and led to recurring client timeouts.

To me this sounds like a viable DoS attack vector, especially for servers with open registration. I really think this should be fixed (probably by setting limits as proposed in #8263).
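As a rough illustration of what such a limit might look like, here is a minimal in-memory sketch of a per-user device cap. Everything in it is hypothetical (`DeviceRegistry`, `DeviceLimitExceeded`, and the default of 50 are made up for the example); it is not Synapse code and not the design discussed in #8263.

```python
from collections import defaultdict

# Hypothetical default; this thread does not name a specific figure.
MAX_DEVICES_PER_USER = 50


class DeviceLimitExceeded(Exception):
    """Raised when a user tries to register more devices than allowed."""


class DeviceRegistry:
    """Toy in-memory registry that refuses new devices past a per-user cap."""

    def __init__(self, max_devices=MAX_DEVICES_PER_USER):
        self._max_devices = max_devices
        self._devices = defaultdict(set)

    def add_device(self, user_id, device_id):
        devices = self._devices[user_id]
        # Re-registering a known device is always allowed; only new ones count.
        if device_id not in devices and len(devices) >= self._max_devices:
            raise DeviceLimitExceeded(
                f"{user_id} already has {len(devices)} devices"
            )
        devices.add(device_id)

    def device_count(self, user_id):
        return len(self._devices.get(user_id, ()))
```

With a cap in place, the number of devices any one account can ask the server to delete at once is bounded by the cap rather than by how many times the account has logged in.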

babolivier · 2022-02-21T12:17:11Z

Related: #7721

richvdh mentioned this issue Jul 9, 2019

repeated calls to /login kills performance #5647

Open

clokep added the A-Performance label (Performance, both client-facing and admin-facing) Nov 23, 2020

matrix-org deleted a comment from richvdh Jan 13, 2022

H-Shay added the T-Defect label (Bugs, crashes, hangs, security vulnerabilities, or other reported issues) and removed the A-Performance label (Performance, both client-facing and admin-facing) Jan 13, 2022

matrixbot mentioned this issue Dec 21, 2023

Deleting devices is *incredibly* resource-hungry element-hq/synapse#2704

Open
