Synapse won't start after upgrade attempt from 1.59.1 to 1.62: slow database migration means we get timed out by systemd #13193
Comments
How are you installing synapse? Docker image? Matrix.org debian package? |
Matrix.org debian package : matrix-synapse-py3 package |
I feel like the timeout happens before some work running on the database finishes. If I guessed wrong and you're thinking about another problem and have a solution to share, I'm eager to hear it. |
If I set DEBUG instead of INFO in /etc/matrix-synapse/log.yaml, here is the loop I get:
should I stop synapse and run these three SQL commands myself |
or would it be possible to increase the time given to synapse to start before timeout? |
Yes. Try running
You could edit the systemd config, see e.g. https://unix.stackexchange.com/a/276785/190092. But I'm not sure how that interacts with the matrix.org debian packages---on future upgrades, you might be asked to resolve a conflict between the packaged systemd unit file and the edited one. I'd try to update the DB first. |
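For anyone who wants to try the timeout route instead: a systemd drop-in file is one way to raise the start timeout without editing the packaged unit file itself. The following is only a minimal sketch in Python that renders the drop-in text. The helper name `render_timeout_dropin`, the 600-second value, and the drop-in path in the comment are illustrative assumptions, not part of the Synapse packaging.

```python
def render_timeout_dropin(start_timeout_seconds: int) -> str:
    """Return the text of a systemd drop-in that raises the start timeout."""
    if start_timeout_seconds <= 0:
        raise ValueError("start_timeout_seconds must be positive")
    # TimeoutStartSec= is the systemd [Service] option for the start-up time limit.
    return f"[Service]\nTimeoutStartSec={start_timeout_seconds}\n"


if __name__ == "__main__":
    # Assumed location for the drop-in (unit name taken from the journal message
    # in this issue): /etc/systemd/system/matrix-synapse.service.d/override.conf
    # After writing the file, `systemctl daemon-reload` makes systemd pick it up.
    print(render_timeout_dropin(600), end="")
```

This only buys the migration more time; updating the database first, as suggested above, is still the more direct fix.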
@Thatoo please let me know if you manage to update your database and start synapse afterwards. |
We decided to restore a backup and go back to v1.59.1, and it works |
In case it helps anyone else, I think I had the same problem on update to 1.63.0 (not sure the previous version, perhaps 1.61), some of the log:
Running |
Oops, thanks. I've edited #13193 (comment) to fix this. |
This happened to me too, but in my case it was because a |
Another report of a similar situation in synapse admins today. I could have sworn we thought that we tell systemd that we're READY before starting the migrations... but maybe I misremember. I wonder if we can use EXTEND_TIMEOUT_USEC during a migration to communicate that we're still alive and making forward progress with the migration. |
definitely not: we make sure the database is up to date before starting the http listeners, and we want to start the http listeners before we tell systemd we're ready.
possibly, but note that the problem here is a single big migration (specifically, The postgres equivalent makes a lot of effort to make sure that the migration happens quickly (by pushing most of the hard work to the background), but I didn't really bother for sqlite, because if you've got a big enough sqlite database to notice the delay, you're doing it wrong. So really I think this is yet another case of: we need to do more to discourage the use of sqlite in production (cf #2317, #2917) |
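To make the EXTEND_TIMEOUT_USEC idea from the discussion above concrete, here is a rough Python sketch of the systemd notification protocol: a datagram written to the Unix socket named by the `NOTIFY_SOCKET` environment variable. This is not Synapse's implementation. The function names `sd_notify` and `extend_start_timeout` are hypothetical, and the sketch assumes the service runs with `Type=notify` so that systemd sets `NOTIFY_SOCKET`.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset (i.e. not started by systemd
    with notification enabled), True once the datagram has been sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket, which is
    # addressed with a leading NUL byte instead.
    if path.startswith("@"):
        addr = b"\0" + path[1:].encode("utf-8")
    else:
        addr = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), addr)
    return True


def extend_start_timeout(seconds: int) -> bool:
    """Ask systemd to allow `seconds` more before the start timeout fires."""
    return sd_notify(f"EXTEND_TIMEOUT_USEC={seconds * 1_000_000}")
```

As noted in the reply above, the difficulty is a single long-running migration: a call like `extend_start_timeout(60)` would have to be repeated while that one statement is still executing, because each message only extends the deadline by the stated amount.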
Happened again for me and had to run the database update manually. I think this issue should be re-opened (with a more general title) until a proper fix is in place or documented workaround (or just output in synapse log if something is expected to take a while and may need manual intervention?). |
Description
Synapse keeps restarting.
journalctl -xe
indicates: The unit matrix-synapse.service has entered the 'failed' state with result 'timeout'.
tail -f -n 10 /var/log/matrix-synapse/homeserver.log
indicates as a loop
Steps to reproduce
After updating from 1.59.1 to 1.62, Synapse won't restart.
Homeserver
matrix.defis.info
Synapse Version
1.62
Installation Method
No response
Platform
debian 10 proxmox container behind nginx reverse proxy
Relevant log output
Anything else that would be useful to know?
No response