Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

after upgrade to debian 12 replication for workers is broken #16093

Closed
cremesk opened this issue Aug 9, 2023 · 8 comments
Closed

after upgrade to debian 12 replication for workers is broken #16093

cremesk opened this issue Aug 9, 2023 · 8 comments

Comments

@cremesk
Copy link
Contributor

cremesk commented Aug 9, 2023

Description

hi,

i have a strange Problem today.
I upgrade my matrix-synapse VM from Debian 11 to Debian 12.

Before i upgrade everything works fine!

But after the upgrade i see some strange logs like:

see below

My first guess was that the server time is not right. Since I checked but everything was OK and the time was synchronized correctly.

The second idea was that the Redis connection does not work. However, Redis works without errors and Synapse has also been diligently using the keyspace.

As a next step, I have now commented out the worker setup so that synapse works without workers. Everything seems to be working there too.

Homeserver

envs.net

Synapse Version

1.89.0 and 1.90.0rc1
(i already had the problem with 1.89.0 then I upgraded to 1.90.0rc1 and the logs continued to look like this.)

Installation Method

Debian packages from packages.matrix.org

Database

singe pgsql 13

Workers

Multiple workers

Platform

Debian 12 KVM

Configuration

here my complet config: https://git.envs.net/envs/matrix-conf/src/branch/master/etc/matrix-synapse
i double checked the instance_map, stream_writers and worker configs.

Relevant log output

2023-08-09 15:33:10,214 - synapse.replication.tcp.handler - 733 - WARNING - GET-7981 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:10,278 - synapse.replication.tcp.handler - 733 - WARNING - GET-7984 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:10,346 - synapse.replication.tcp.handler - 733 - WARNING - GET-7987 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:10,480 - synapse.replication.tcp.handler - 733 - WARNING - GET-7993 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:11,025 - synapse.replication.tcp.handler - 733 - WARNING - GET-8020 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:11,594 - synapse.replication.tcp.handler - 733 - WARNING - GET-8048 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:11,756 - synapse.replication.tcp.handler - 733 - WARNING - GET-8061 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:12,119 - synapse.replication.tcp.handler - 733 - WARNING - GET-8080 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:12,287 - synapse.replication.tcp.handler - 733 - WARNING - GET-8093 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:12,330 - synapse.replication.tcp.handler - 733 - WARNING - GET-8095 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:12,766 - synapse.replication.tcp.client - 331 - ERROR - POST-7878 - Timed out waiting for repl stream 'backfill' to reach 77624896 (generic_worker4); currently at: 77620991
2023-08-09 15:33:12,770 - synapse.replication.tcp.handler - 733 - WARNING - GET-8117 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:12,906 - synapse.replication.tcp.client - 331 - ERROR - POST-7885 - Timed out waiting for repl stream 'backfill' to reach 77624896 (generic_worker4); currently at: 77620991
2023-08-09 15:33:12,938 - synapse.replication.tcp.client - 331 - ERROR - POST-7887 - Timed out waiting for repl stream 'caches' to reach 49072379 (generic_worker5); currently at: 49072351
2023-08-09 15:33:13,164 - synapse.replication.tcp.client - 331 - ERROR - POST-7896 - Timed out waiting for repl stream 'caches' to reach 49072379 (generic_worker5); currently at: 49072351
2023-08-09 15:33:13,434 - synapse.replication.tcp.handler - 733 - WARNING - GET-8150 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:13,630 - synapse.replication.tcp.client - 331 - ERROR - POST-7915 - Timed out waiting for repl stream 'caches' to reach 49072379 (generic_worker5); currently at: 49072351
2023-08-09 15:33:14,042 - synapse.replication.tcp.handler - 733 - WARNING - GET-8182 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:14,234 - synapse.replication.tcp.client - 331 - ERROR - PUT-7433 - Timed out waiting for repl stream 'receipts' to reach 158430400 (additional1); currently at: 158428466
2023-08-09 15:33:14,290 - synapse.replication.tcp.client - 331 - ERROR - PUT-7236 - Timed out waiting for repl stream 'account_data' to reach 21971997 (additional1); currently at: 21971866
2023-08-09 15:33:14,293 - synapse.replication.tcp.client - 331 - ERROR - PUT-7236 - Timed out waiting for repl stream 'account_data' to reach 21971997 (additional1); currently at: 21971866
2023-08-09 15:33:14,836 - synapse.replication.tcp.client - 331 - ERROR - POST-7966 - Timed out waiting for repl stream 'backfill' to reach 77624896 (generic_worker4); currently at: 77620991
2023-08-09 15:33:15,208 - synapse.replication.tcp.handler - 733 - WARNING - GET-8245 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:16,312 - synapse.replication.tcp.client - 331 - ERROR - _process_incoming_pdus_in_room_inner-23-$3eVnPoZJ61dwXPvbmOD3QQ2vmv_t3nvcRM7lT0pnbRE - Timed out waiting for repl stream 'events' to reach 175128345 (generic_worker2); currently at: 175127539
2023-08-09 15:33:16,497 - synapse.replication.tcp.handler - 733 - WARNING - GET-8307 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:17,713 - synapse.replication.tcp.handler - 733 - WARNING - GET-8370 - Dropping command as not connected: 'USER_IP'
2023-08-09 15:33:17,716 - synapse.replication.tcp.client - 331 - ERROR - PUT-7496 - Timed out waiting for repl stream 'typing' to reach 338 (additional1); currently at: 0
2023-08-09 15:33:17,766 - synapse.replication.tcp.client - 331 - ERROR - POST-7878 - Timed out waiting for repl stream 'caches' to reach 49072368 (generic_worker4); currently at: 49072351
2023-08-09 15:33:17,906 - synapse.replication.tcp.client - 331 - ERROR - POST-7885 - Timed out waiting for repl stream 'caches' to reach 49072368 (generic_worker4); currently at: 49072351
2023-08-09 15:33:17,939 - synapse.replication.tcp.client - 331 - ERROR - POST-7887 - Timed out waiting for repl stream 'events' to reach 175128330 (generic_worker5); currently at: 175127514
2023-08-09 15:33:17,948 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-291 - Dropping command as not connected: 'LOCK_RELEASED'
2023-08-09 15:33:18,165 - synapse.replication.tcp.client - 331 - ERROR - POST-7896 - Timed out waiting for repl stream 'events' to reach 175128330 (generic_worker5); currently at: 175127514
2023-08-09 15:33:18,179 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-292 - Dropping command as not connected: 'LOCK_RELEASED'
2023-08-09 15:33:18,355 - synapse.replication.tcp.client - 331 - ERROR - PUT-7697 - Timed out waiting for repl stream 'typing' to reach 338 (additional1); currently at: 0
2023-08-09 15:33:18,452 - synapse.replication.tcp.client - 331 - ERROR - POST-8152 - Timed out waiting for repl stream 'backfill' to reach 77624896 (generic_worker4); currently at: 77620991
2023-08-09 15:33:18,631 - synapse.replication.tcp.client - 331 - ERROR - POST-7915 - Timed out waiting for repl stream 'events' to reach 175128330 (generic_worker5); currently at: 175127514
2023-08-09 15:33:18,642 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-293 - Dropping command as not connected: 'LOCK_RELEASED'
2023-08-09 15:33:19,234 - synapse.replication.tcp.client - 331 - ERROR - PUT-7433 - Timed out waiting for repl stream 'to_device' to reach 10441913 (additional1); currently at: 10441702
2023-08-09 15:33:19,837 - synapse.replication.tcp.client - 331 - ERROR - POST-7966 - Timed out waiting for repl stream 'caches' to reach 49072368 (generic_worker4); currently at: 49072351
2023-08-09 15:33:20,319 - synapse.replication.tcp.client - 331 - ERROR - POST-8252 - Timed out waiting for repl stream 'backfill' to reach 77624896 (generic_worker4); currently at: 77620991
2023-08-09 15:33:20,901 - synapse.replication.tcp.client - 331 - ERROR - PUT-7634 - Timed out waiting for repl stream 'typing' to reach 344 (additional1); currently at: 0
2023-08-09 15:33:21,273 - synapse.replication.tcp.client - 331 - ERROR - PUT-7821 - Timed out waiting for repl stream 'typing' to reach 346 (additional1); currently at: 0
2023-08-09 15:33:21,273 - synapse.replication.tcp.client - 331 - ERROR - PUT-7821 - Timed out waiting for repl stream 'typing' to reach 347 (additional1); currently at: 0
2023-08-09 15:33:21,278 - synapse.replication.tcp.client - 331 - ERROR - PUT-7821 - Timed out waiting for repl stream 'typing' to reach 347 (additional1); currently at: 0
2023-08-09 15:33:21,313 - synapse.replication.tcp.client - 331 - ERROR - _process_incoming_pdus_in_room_inner-23-$3eVnPoZJ61dwXPvbmOD3QQ2vmv_t3nvcRM7lT0pnbRE - Timed out waiting for repl stream 'caches' to reach 49072396 (generic_worker2); currently at: 49072351
2023-08-09 15:33:21,313 - synapse.replication.tcp.handler - 733 - WARNING - _process_incoming_pdus_in_room_inner-23-$3eVnPoZJ61dwXPvbmOD3QQ2vmv_t3nvcRM7lT0pnbRE - Dropping command as not connected: 'LOCK_RELEASED'
@cremesk
Copy link
Contributor Author

cremesk commented Aug 10, 2023

i check some more today..

i can see in redis some connection from synapse to keyspace 0:

$ redis-cli client list
id=96 addr=127.0.0.1:33322 laddr=127.0.0.1:6379 fd=37 name= age=125 idle=5 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=97 addr=127.0.0.1:33326 laddr=127.0.0.1:6379 fd=38 name= age=125 idle=5 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=86 addr=127.0.0.1:33200 laddr=127.0.0.1:6379 fd=27 name= age=125 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=87 addr=127.0.0.1:33204 laddr=127.0.0.1:6379 fd=28 name= age=125 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=88 addr=127.0.0.1:33206 laddr=127.0.0.1:6379 fd=29 name= age=125 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=89 addr=127.0.0.1:33212 laddr=127.0.0.1:6379 fd=30 name= age=125 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=90 addr=127.0.0.1:33216 laddr=127.0.0.1:6379 fd=31 name= age=125 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=91 addr=127.0.0.1:33218 laddr=127.0.0.1:6379 fd=32 name= age=125 idle=5 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=92 addr=127.0.0.1:33236 laddr=127.0.0.1:6379 fd=33 name= age=125 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=93 addr=127.0.0.1:33248 laddr=127.0.0.1:6379 fd=34 name= age=125 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=94 addr=127.0.0.1:33264 laddr=127.0.0.1:6379 fd=35 name= age=125 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=95 addr=127.0.0.1:33270 laddr=127.0.0.1:6379 fd=36 name= age=125 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=81 addr=127.0.0.1:33176 laddr=127.0.0.1:6379 fd=22 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=82 addr=127.0.0.1:33180 laddr=127.0.0.1:6379 fd=23 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=83 addr=127.0.0.1:33182 laddr=127.0.0.1:6379 fd=24 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=84 addr=127.0.0.1:33184 laddr=127.0.0.1:6379 fd=25 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=85 addr=127.0.0.1:33186 laddr=127.0.0.1:6379 fd=26 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=72 addr=127.0.0.1:60854 laddr=127.0.0.1:6379 fd=9 name= age=127 idle=7 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=73 addr=127.0.0.1:60856 laddr=127.0.0.1:6379 fd=14 name= age=127 idle=7 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=100 addr=127.0.0.1:39324 laddr=127.0.0.1:6379 fd=11 name= age=74 idle=68 flags=N db=15 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=set user=default redir=-1 resp=3
id=101 addr=127.0.0.1:39328 laddr=127.0.0.1:6379 fd=40 name= age=74 idle=66 flags=N db=15 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=set user=default redir=-1 resp=3
id=102 addr=127.0.0.1:39336 laddr=127.0.0.1:6379 fd=41 name= age=74 idle=1 flags=N db=15 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=2048 rbp=1024 obl=0 oll=0 omem=0 tot-mem=23296 events=r cmd=ping user=default redir=-1 resp=3
id=74 addr=127.0.0.1:33112 laddr=127.0.0.1:6379 fd=15 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=75 addr=127.0.0.1:33114 laddr=127.0.0.1:6379 fd=16 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=76 addr=127.0.0.1:33134 laddr=127.0.0.1:6379 fd=17 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=45 addr=127.0.0.1:49054 laddr=127.0.0.1:6379 fd=8 name= age=1924 idle=74 flags=N db=15 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=set user=default redir=-1 resp=3
id=40 addr=127.0.0.1:57108 laddr=127.0.0.1:6379 fd=7 name= age=7647 idle=74 flags=N db=15 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=set user=default redir=-1 resp=3
id=41 addr=127.0.0.1:57110 laddr=127.0.0.1:6379 fd=10 name= age=7647 idle=74 flags=N db=15 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=set user=default redir=-1 resp=3
id=43 addr=127.0.0.1:57116 laddr=127.0.0.1:6379 fd=12 name= age=7647 idle=68 flags=N db=15 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=set user=default redir=-1 resp=3
id=44 addr=127.0.0.1:57118 laddr=127.0.0.1:6379 fd=13 name= age=7647 idle=74 flags=N db=15 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=set user=default redir=-1 resp=3
id=77 addr=127.0.0.1:33140 laddr=127.0.0.1:6379 fd=18 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=78 addr=127.0.0.1:33164 laddr=127.0.0.1:6379 fd=19 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=79 addr=127.0.0.1:33168 laddr=127.0.0.1:6379 fd=20 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=80 addr=127.0.0.1:33170 laddr=127.0.0.1:6379 fd=21 name= age=126 idle=6 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1800 events=r cmd=ping user=default redir=-1 resp=2
id=104 addr=127.0.0.1:50000 laddr=127.0.0.1:6379 fd=39 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=26 qbuf-free=20448 argv-mem=10 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=37658 events=r cmd=client|list user=default redir=-1 resp=2

(keyspace 15 is the matrix-media-repo)

also the redis monitor shows something:

1691655260.962363 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655261.462137 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655261.961874 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655262.461745 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655262.961479 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655263.462241 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655263.961612 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655264.462413 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655264.962409 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655265.462231 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655265.658805 [15 127.0.0.1:39336] "get" "8f9c2ba670a529988be06731b1073b76750674c7d257afd9ce9be829c2e2fba6"                                                                                                                                                                    
1691655265.961470 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655266.462369 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655266.962188 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655267.462457 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655267.962277 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655268.381523 [0 127.0.0.1:60854] "SET" "cache_v1:event_to_prev_state_group:$AGyA_CHldnE4VRRrkwXtU7TjJsxT2h2SErwjdv_UkwU" "21650425" "PX" "3600000"                                                                                                      
1691655268.462082 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655268.961850 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655269.138676 [15 127.0.0.1:39336] "get" "0c4758baa765a38bf93bd63fd022015e9a58755b3347dc75954a100ff764d6a3"                                                                                                                                                                    
1691655269.462457 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655269.962305 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655270.462149 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655270.602054 [0 127.0.0.1:60854] "SET" "cache_v1:event_to_prev_state_group:$HuRWX7-yuVAeTNIawi5-WfRzGl6oOqce4RdPIVSYVI8" "21694533" "PX" "3600000"                                                                                                     
1691655270.961878 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655271.461621 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655271.962424 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655272.462238 [15 127.0.0.1:39336] "ping"
1691655272.962180 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655273.461933 [15 127.0.0.1:39336] "ping"
1691655273.961682 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655274.461456 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655274.578997 [0 127.0.0.1:60854] "SET" "cache_v1:event_to_prev_state_group:$tfILj7nap3c-1j7MDDPXCSmjP2CwhsSppoKIVNZWC9E" "20142916" "PX" "3600000"                                                                                                                            
1691655274.581211 [0 127.0.0.1:60854] "SET" "cache_v1:get_joined_hosts:20142916" "[\"matrix.org\",\"envs.net\"]" "PX" "3600000"                                                                                                                                                    
1691655274.962291 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655275.462071 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655275.962380 [15 127.0.0.1:39336] "ping"
1691655276.462158 [15 127.0.0.1:39336] "ping"                                                                                            
1691655276.961916 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655277.462449 [15 127.0.0.1:39336] "ping"
1691655277.962222 [15 127.0.0.1:39336] "ping"   
1691655278.462022 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655278.961761 [15 127.0.0.1:39336] "ping"
1691655279.461973 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655279.961777 [15 127.0.0.1:39336] "ping"                                                                                                                                                                                                                                      
1691655280.461551 [15 127.0.0.1:39336] "ping"

but it feels like i see no RDATA commands in redis. also the grafana graph shows no RDATA:
image

(i tried and tested several things.. therefore, quite a lot of restarts are also recorded.)

@cremesk cremesk changed the title after upgrade to debian 12 worker setup looks broken after upgrade to debian 12 replication for workers is broken Aug 10, 2023
@cremesk
Copy link
Contributor Author

cremesk commented Aug 10, 2023

the log part in debug mode looks something like:

2023-08-10 09:36:05,159 - synapse.storage.SQL - 454 - DEBUG - persist_events-26 - [SQL values] {persist_events-e5} [['$169166013542oXcta:t2l.io']]                                                                                                                                 
2023-08-10 09:36:05,160 - synapse.storage.SQL - 475 - DEBUG - persist_events-26 - [SQL time] {persist_events-e5} 0.000331 sec                                                                                                                                                      
2023-08-10 09:36:05,160 - synapse.storage.txn - 842 - DEBUG - persist_events-26 - [TXN END] {persist_events-e5} 0.005365 sec                                                                                                                                                       
2023-08-10 09:36:05,160 - synapse.storage.txn - 738 - DEBUG - persist_events-26 - [TXN START] {MultiWriterIdGenerator._update_table-e6}                                                                                                                                            
2023-08-10 09:36:05,161 - synapse.storage.SQL - 449 - DEBUG - persist_events-26 - [SQL] {MultiWriterIdGenerator._update_table-e6} INSERT INTO stream_positions (stream_name, instance_name, stream_id) VALUES (?, ?, ?) ON CONFLICT (stream_name, instance_name) DO UPDATE SET stre
am_id = GREATEST(stream_positions.stream_id, EXCLUDED.stream_id)                                                                                                                                                                                                                   
2023-08-10 09:36:05,161 - synapse.storage.SQL - 454 - DEBUG - persist_events-26 - [SQL values] {MultiWriterIdGenerator._update_table-e6} ('events', 'generic_worker1', (175268179,))                                                                                               
2023-08-10 09:36:05,161 - synapse.storage.SQL - 475 - DEBUG - persist_events-26 - [SQL time] {MultiWriterIdGenerator._update_table-e6} 0.000209 sec                                                                                                                                
2023-08-10 09:36:05,161 - synapse.storage.txn - 842 - DEBUG - persist_events-26 - [TXN END] {MultiWriterIdGenerator._update_table-e6} 0.000408 sec                                                                                                                                 
2023-08-10 09:36:05,161 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-26 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                 
2023-08-10 09:36:05,161 - synapse.storage.txn - 738 - DEBUG - persist_events-26 - [TXN START] {drop_lock-e7}                                                                                                                                                                       
2023-08-10 09:36:05,161 - synapse.storage.SQL - 449 - DEBUG - persist_events-26 - [SQL] {drop_lock-e7} DELETE FROM worker_read_write_locks WHERE lock_name = ? AND lock_key = ? AND token = ?                                                                                      
2023-08-10 09:36:05,161 - synapse.storage.SQL - 454 - DEBUG - persist_events-26 - [SQL values] {drop_lock-e7} ['delete_room_lock', '!zCLWAbYiBPnmFazMZQ:t2l.io', 'jhsCla']                                                                                                         
2023-08-10 09:36:05,162 - synapse.storage.SQL - 475 - DEBUG - persist_events-26 - [SQL time] {drop_lock-e7} 0.000425 sec                                                                                                                                                           
2023-08-10 09:36:05,162 - synapse.storage.txn - 842 - DEBUG - persist_events-26 - [TXN END] {drop_lock-e7} 0.000533 sec                                                                                                                                                            
2023-08-10 09:36:05,162 - synapse.util.metrics - 163 - DEBUG - POST-26 - Entering block on_new_event                                                                                                                                                                               
2023-08-10 09:36:05,162 - synapse.util.metrics - 176 - DEBUG - POST-26 - Exiting block on_new_event                                                                                                                                                                                
2023-08-10 09:36:05,163 - synapse.access.http.9094 - 465 - INFO - POST-26 - 127.0.0.1 - 9094 - {None} Processed request: 15.016sec/0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 104B 200 "POST /_synapse/replication/fed_send_events/mkSqieHCQM HTTP/1.1" "Synapse/1.90.0rc1
" [0 dbevts]                                                                                                                                                                                                                                                                       
2023-08-10 09:36:05,179 - synapse.metrics._gc - 118 - DEBUG - sentinel - Collecting gc 0                                                                                                                                                                                           
2023-08-10 09:36:05,296 - synapse.access.http.9094 - 420 - DEBUG - POST-31 - 127.0.0.1 - 9094 - Received request: POST /_synapse/replication/fed_send_events/dqNNyRPWCI                                                                                                            
2023-08-10 09:36:05,296 - synapse.util.metrics - 163 - DEBUG - POST-31 - Entering block repl.wait_for_stream_position                                                                                                                                                              
2023-08-10 09:36:05,296 - synapse.replication.tcp.client - 321 - INFO - POST-31 - Waiting for repl stream 'caches' to reach 49364133 (master); currently at: 49363495                                                                                                              
2023-08-10 09:36:05,776 - synapse.handlers.typing - 111 - DEBUG - typing._handle_timeouts-19 - Checking for typing timeouts                                                                                                                                                        
2023-08-10 09:36:05,983 - synapse.storage.TIME - 644 - DEBUG - sentinel - Total database time: 1.009% {persist_events(9): 0.608%, _get_events_which_are_prevs(9): 0.098%, _get_prevs_before_rejected(9): 0.082%}                                                                   
2023-08-10 09:36:06,166 - synapse.replication.tcp.client - 331 - ERROR - POST-28 - Timed out waiting for repl stream 'caches' to reach 49364133 (master); currently at: 49363495                                                                                                   
2023-08-10 09:36:06,167 - synapse.util.metrics - 176 - DEBUG - POST-28 - Exiting block repl.wait_for_stream_position                                                                                                                                                               
2023-08-10 09:36:06,167 - synapse.util.metrics - 163 - DEBUG - POST-28 - Entering block repl.wait_for_stream_position                                                                                                                                                              
2023-08-10 09:36:06,167 - synapse.replication.tcp.client - 321 - INFO - POST-28 - Waiting for repl stream 'device_lists' to reach 165020706 (master); currently at: 165020179                                                                                                      
2023-08-10 09:36:06,177 - synapse.replication.tcp.client - 331 - ERROR - POST-29 - Timed out waiting for repl stream 'caches' to reach 49364133 (master); currently at: 49363495                                                                                                   
2023-08-10 09:36:06,177 - synapse.util.metrics - 176 - DEBUG - POST-29 - Exiting block repl.wait_for_stream_position                                                                                                                                                               
2023-08-10 09:36:06,177 - synapse.util.metrics - 163 - DEBUG - POST-29 - Entering block repl.wait_for_stream_position                                                                                                                                                              
2023-08-10 09:36:06,177 - synapse.replication.tcp.client - 321 - INFO - POST-29 - Waiting for repl stream 'device_lists' to reach 165020706 (master); currently at: 165020179                                                                                                      
2023-08-10 09:36:06,206 - synapse.replication.tcp.client - 331 - ERROR - POST-30 - Timed out waiting for repl stream 'caches' to reach 49364133 (master); currently at: 49363495                                                                                                   
2023-08-10 09:36:06,206 - synapse.util.metrics - 176 - DEBUG - POST-30 - Exiting block repl.wait_for_stream_position                                                                                                                                                               
2023-08-10 09:36:06,206 - synapse.util.metrics - 163 - DEBUG - POST-30 - Entering block repl.wait_for_stream_position                                                                                                                                                              
2023-08-10 09:36:06,206 - synapse.replication.tcp.client - 321 - INFO - POST-30 - Waiting for repl stream 'device_lists' to reach 165020706 (master); currently at: 165020179                                                                                                      
2023-08-10 09:36:06,292 - synapse.access.http.9094 - 420 - DEBUG - POST-32 - 127.0.0.1 - 9094 - Received request: POST /_synapse/replication/fed_send_events/qkQyFKnlli                                                                                                            
2023-08-10 09:36:06,292 - synapse.util.metrics - 163 - DEBUG - POST-32 - Entering block repl.wait_for_stream_position                                                                                                                                                              
2023-08-10 09:36:06,292 - synapse.replication.tcp.client - 321 - INFO - POST-32 - Waiting for repl stream 'caches' to reach 49364133 (master); currently at: 49363495                                                                                                              
2023-08-10 09:36:06,809 - synapse.access.http.9094 - 420 - DEBUG - POST-33 - 127.0.0.1 - 9094 - Received request: POST /_synapse/replication/fed_send_events/DlEyqRXsIz                                                                                                            
2023-08-10 09:36:06,809 - synapse.util.metrics - 163 - DEBUG - POST-33 - Entering block repl.wait_for_stream_position                                                                                                                                                              
2023-08-10 09:36:06,809 - synapse.replication.tcp.client - 321 - INFO - POST-33 - Waiting for repl stream 'caches' to reach 49364133 (master); currently at: 49363495                                                                                                              
2023-08-10 09:36:06,895 - synapse.access.http.9094 - 420 - DEBUG - POST-34 - 127.0.0.1 - 9094 - Received request: POST /_synapse/replication/fed_send_events/cHzBibwbGX                                                                                                            
2023-08-10 09:36:06,895 - synapse.util.metrics - 163 - DEBUG - POST-34 - Entering block repl.wait_for_stream_position                                                                                                                                                              
2023-08-10 09:36:06,895 - synapse.replication.tcp.client - 321 - INFO - POST-34 - Waiting for repl stream 'caches' to reach 49364133 (master); currently at: 49363495                                                                                                              
2023-08-10 09:36:08,972 - synapse.replication.tcp.client - 331 - ERROR - POST-27 - Timed out waiting for repl stream 'pushers' to reach 930623 (master); currently at: 930619
2023-08-10 09:36:08,972 - synapse.util.metrics - 176 - DEBUG - POST-27 - Exiting block repl.wait_for_stream_position         
2023-08-10 09:36:08,972 - synapse.util.caches.response_cache - 242 - DEBUG - POST-27 - [repl.fed_send_events]: no cached result for [VDWYhyPdBX], calculating new one 
2023-08-10 09:36:08,972 - synapse.util.metrics - 163 - DEBUG - POST-27 - Entering block repl_fed_send_events_parse           
2023-08-10 09:36:08,972 - synapse.util.metrics - 176 - DEBUG - POST-27 - Exiting block repl_fed_send_events_parse                                                                                                                                                                  
2023-08-10 09:36:08,972 - synapse.replication.http.federation - 141 - INFO - POST-27 - Got 1 events from federation                                                                                                                                                                
2023-08-10 09:36:08,973 - synapse.storage.txn - 738 - DEBUG - persist_events-27 - [TXN START] {try_acquire_read_write_lock-e8}
2023-08-10 09:36:08,973 - synapse.storage.SQL - 449 - DEBUG - persist_events-27 - [SQL] {try_acquire_read_write_lock-e8} DELETE FROM worker_read_write_locks WHERE last_renewed_ts < ? AND lock_name = ? AND lock_key = ?; ; INSERT INTO worker_read_write_locks (lock_name, lock_k
ey, write_lock, instance_name, token, last_renewed_ts) VALUES (?, ?, ?, ?, ?, ?)                                             
2023-08-10 09:36:08,973 - synapse.storage.SQL - 454 - DEBUG - persist_events-27 - [SQL values] {try_acquire_read_write_lock-e8} (1691660048973, 'delete_room_lock', '!qIzswZsdZuYCxgZScS:matrix.org', 'delete_room_lock', '!qIzswZsdZuYCxgZScS:matrix.org', False, 'generic_worker1
', 'yXHfmG', 1691660168973)                                                                                                              
2023-08-10 09:36:08,973 - synapse.storage.SQL - 475 - DEBUG - persist_events-27 - [SQL time] {try_acquire_read_write_lock-e8} 0.000472 sec                                                                                                           
2023-08-10 09:36:08,974 - synapse.storage.txn - 842 - DEBUG - persist_events-27 - [TXN END] {try_acquire_read_write_lock-e8} 0.000768 sec                                                                                                                                          
2023-08-10 09:36:08,974 - synapse.util.metrics - 163 - DEBUG - persist_events-27 - Entering block _calculate_state_and_extrem                                                                                                                                                      
2023-08-10 09:36:08,974 - synapse.storage.txn - 738 - DEBUG - persist_events-27 - [TXN START] {get_latest_event_ids_in_room-e9}

and also:

2023-08-10 09:40:14,271 - synapse.http.client - 868 - DEBUG - _process_incoming_pdus_in_room_inner-209-$HaLllbr6yzaUdVjvPAM2RloDkJdIZC9K-aks9ZVq-v4 - Sending request POST synapse-replication://generic_worker4/_synapse/replication/fed_send_events/LpsiRPdiOx                   
2023-08-10 09:40:14,274 - synapse.replication.tcp.client - 331 - ERROR - _process_incoming_pdus_in_room_inner-158-$wBrkQKczitEodQrmfwpAvGuLrm5gR3wet2usi4Ti9Ig-$vzYiwcrmmyYmWuruMjXiQqSyIhlVVi9zGx8wbnIkOFI - Timed out waiting for repl stream 'events' to reach 175268541 (generi
c_worker2); currently at: 175268077                                                                                                                                                                                                                                                
2023-08-10 09:40:14,274 - synapse.replication.tcp.client - 321 - INFO - _process_incoming_pdus_in_room_inner-158-$wBrkQKczitEodQrmfwpAvGuLrm5gR3wet2usi4Ti9Ig-$vzYiwcrmmyYmWuruMjXiQqSyIhlVVi9zGx8wbnIkOFI - Waiting for repl stream 'backfill' to reach 77634014 (generic_worker2)
; currently at: 77634004                                                                                                                                                                                                                                                           
2023-08-10 09:40:14,288 - synapse.replication.tcp.client - 331 - ERROR - PUT-28156 - Timed out waiting for repl stream 'account_data' to reach 21990810 (additional1); currently at: 21990726                                                                                      
2023-08-10 09:40:14,330 - synapse.http.client - 924 - INFO - _process_incoming_pdus_in_room_inner-18-$169166015926312uMuEq:t2bot.io - Received response to POST synapse-replication://generic_worker5/_synapse/replication/fed_send_events/flVJafJjmQ: 200                         
2023-08-10 09:40:14,330 - synapse.replication.tcp.client - 321 - INFO - _process_incoming_pdus_in_room_inner-18-$169166015926312uMuEq:t2bot.io - Waiting for repl stream 'events' to reach 175268543 (generic_worker5); currently at: 175268077                                    
2023-08-10 09:40:14,330 - synapse.http.client - 924 - INFO - _process_incoming_pdus_in_room_inner-6-$qK01Zm-NCF_KFdhrFzjk5Bb_E-kXMjBmTX1QG6LhqCQ - Received response to POST synapse-replication://generic_worker5/_synapse/replication/fed_send_events/erwctNCKHK: 200            
2023-08-10 09:40:14,331 - synapse.replication.tcp.client - 321 - INFO - _process_incoming_pdus_in_room_inner-6-$qK01Zm-NCF_KFdhrFzjk5Bb_E-kXMjBmTX1QG6LhqCQ - Waiting for repl stream 'events' to reach 175268544 (generic_worker5); currently at: 175268077                       
2023-08-10 09:40:14,593 - synapse.replication.tcp.client - 331 - ERROR - PUT-28792 - Timed out waiting for repl stream 'to_device' to reach 10450939 (additional1); currently at: 10450931                                                                                         
2023-08-10 09:40:14,594 - synapse.replication.tcp.client - 321 - INFO - PUT-28792 - Waiting for repl stream 'account_data' to reach 21990811 (additional1); currently at: 21990726
2023-08-10 09:40:14,999 - synapse.http.client - 924 - INFO - PUT-30654 - Received response to POST synapse-replication://additional1/_synapse/replication/add_user_account_data/%40eleonora%3Aenvs.net/im.vector.setting.breadcrumbs: 200
2023-08-10 09:40:15,000 - synapse.replication.tcp.client - 321 - INFO - PUT-30654 - Waiting for repl stream 'typing' to reach 199 (additional1); currently at: 0
2023-08-10 09:40:15,012 - synapse.replication.tcp.client - 331 - ERROR - PUT-28844 - Timed out waiting for repl stream 'to_device' to reach 10450939 (additional1); currently at: 10450931
2023-08-10 09:40:15,013 - synapse.replication.tcp.client - 321 - INFO - PUT-28844 - Waiting for repl stream 'account_data' to reach 21990812 (additional1); currently at: 21990726
2023-08-10 09:40:14,210 - synapse.replication.tcp.client - 321 - INFO - POST-106 - Waiting for repl stream 'pushers' to reach 930634 (master); currently at: 930619
2023-08-10 09:40:15,044 - synapse.replication.tcp.client - 331 - ERROR - POST-107 - Timed out waiting for repl stream 'device_lists' to reach 165020981 (master); currently at: 165020179
2023-08-10 09:40:15,223 - synapse.http.client - 924 - INFO - PUT-30679 - Received response to POST synapse-replication://additional1/_synapse/replication/add_user_account_data/%40haunted%3Aenvs.net/im.vector.setting.breadcrumbs: 200
2023-08-10 09:40:15,223 - synapse.replication.tcp.client - 321 - INFO - PUT-30679 - Waiting for repl stream 'typing' to reach 199 (additional1); currently at: 0
2023-08-10 09:40:15,453 - synapse.replication.tcp.client - 331 - ERROR - PUT-28891 - Timed out waiting for repl stream 'to_device' to reach 10450939 (additional1); currently at: 10450931
2023-08-10 09:40:15,453 - synapse.replication.tcp.client - 321 - INFO - PUT-28891 - Waiting for repl stream 'account_data' to reach 21990812 (additional1); currently at: 21990726
2023-08-10 09:40:15,044 - synapse.replication.tcp.client - 321 - INFO - POST-107 - Waiting for repl stream 'pushers' to reach 930634 (master); currently at: 930619
2023-08-10 09:40:15,484 - synapse.replication.tcp.client - 331 - ERROR - POST-104 - Timed out waiting for repl stream 'pushers' to reach 930634 (master); currently at: 930619
2023-08-10 09:40:15,485 - synapse.replication.http.federation - 141 - INFO - POST-104 - Got 1 events from federation
2023-08-10 09:40:15,491 - synapse.replication.tcp.client - 331 - ERROR - PUT-28905 - Timed out waiting for repl stream 'to_device' to reach 10450939 (additional1); currently at: 10450931
2023-08-10 09:40:15,492 - synapse.replication.tcp.client - 321 - INFO - PUT-28905 - Waiting for repl stream 'account_data' to reach 21990812 (additional1); currently at: 21990726
2023-08-10 09:40:15,492 - synapse.replication.tcp.client - 331 - ERROR - PUT-28905 - Timed out waiting for repl stream 'to_device' to reach 10450939 (additional1); currently at: 10450931
2023-08-10 09:40:15,496 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-104 - Dropping command as not connected: 'LOCK_RELEASED'
2023-08-10 09:40:15,492 - synapse.replication.tcp.client - 321 - INFO - PUT-28905 - Waiting for repl stream 'account_data' to reach 21990812 (additional1); currently at: 21990726
2023-08-10 09:40:15,498 - synapse.http.client - 924 - INFO - _process_incoming_pdus_in_room_inner-176-$pnxEgwo-uRcFJKv79c6tu-BHP05jfsptkNlidbcTNz8 - Received response to POST synapse-replication://generic_worker1/_synapse/replication/fed_send_events/JoeKARVXnJ: 200
2023-08-10 09:40:15,498 - synapse.replication.tcp.client - 321 - INFO - _process_incoming_pdus_in_room_inner-176-$pnxEgwo-uRcFJKv79c6tu-BHP05jfsptkNlidbcTNz8 - Waiting for repl stream 'events' to reach 175268545 (generic_worker1); currently at: 175268077
2023-08-10 09:40:15,532 - synapse.http.client - 509 - DEBUG - PUT-32357 - HTTP POST b'{"_INT_STREAM_POS":{"instance_name":"master","streams":{"caches":49370903,"device_lists":165021005,"federation":0,"push_rules":35244,"pushers":930634,"un_partial_stated_event":6053,"un_part
ial_stated_room":5347}},"content":{"recent_rooms":["!sSTSpHlEXXdYLpKQGl:gazizova.net","!lkRpZbqCFaCHGgQOVi:gazizova.net","!rEKVwRaxvUrYFbSauI:envs.net","!NWKCeLAneBPHtqssOl:gazizova.net"]}}' -> synapse-replication://additional1/_synapse/replication/add_user_account_data/%40e
leonora%3Aenvs.net/im.vector.setting.breadcrumbs
2023-08-10 09:40:15,532 - synapse.http.client - 868 - DEBUG - PUT-32357 - Sending request POST synapse-replication://additional1/_synapse/replication/add_user_account_data/%40eleonora%3Aenvs.net/im.vector.setting.breadcrumbs
2023-08-10 09:40:15,578 - synapse.http.client - 509 - DEBUG - PUT-32367 - HTTP POST b'{"_INT_STREAM_POS":{"instance_name":"master","streams":{"caches":49370903,"device_lists":165021005,"federation":0,"push_rules":35244,"pushers":930634,"un_partial_stated_event":6053,"un_part
ial_stated_room":5347}},"content":{"recent_rooms":["!zMuVRxoqjyxyjSEBXc:matrix.org","!glLuldscRKMQxGWgjZ:matrix.org","!HxxSmgxPkysXjCBLeB:matrix.org"]}}' -> synapse-replication://additional1/_synapse/replication/add_user_account_data/%40yusgam%3Aenvs.net/im.vector.setting.br
eadcrumbs
2023-08-10 09:40:15,578 - synapse.http.client - 868 - DEBUG - PUT-32367 - Sending request POST synapse-replication://additional1/_synapse/replication/add_user_account_data/%40yusgam%3Aenvs.net/im.vector.setting.breadcrumbs
2023-08-10 09:40:15,498 - synapse.access.http.9094 - 465 - INFO - POST-104 - 127.0.0.1 - 9094 - {None} Processed request: 15.016sec/0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 104B 200 "POST /_synapse/replication/fed_send_events/JoeKARVXnJ HTTP/1.1" "Synapse/1.90.0rc
1" [0 dbevts]
2023-08-10 09:40:15,663 - synapse.replication.tcp.client - 331 - ERROR - POST-108 - Timed out waiting for repl stream 'device_lists' to reach 165020981 (master); currently at: 165020179
2023-08-10 09:40:15,837 - synapse.http.client - 924 - INFO - PUT-30744 - Received response to POST synapse-replication://additional1/_synapse/replication/fed_send_edu/m.receipt/OrjSkeKMBf: 200
2023-08-10 09:40:15,837 - synapse.replication.tcp.client - 321 - INFO - PUT-30744 - Waiting for repl stream 'typing' to reach 199 (additional1); currently at: 0
2023-08-10 09:40:15,967 - synapse.replication.tcp.handler - 733 - WARNING - GET-32406 - Dropping command as not connected: 'LOCK_RELEASED'
2023-08-10 09:40:15,977 - synapse.replication.tcp.client - 331 - ERROR - PUT-29623 - Timed out waiting for repl stream 'receipts' to reach 158588954 (additional1); currently at: 158588539
2023-08-10 09:40:15,978 - synapse.replication.tcp.client - 321 - INFO - PUT-29623 - Waiting for repl stream 'to_device' to reach 10450939 (additional1); currently at: 10450931
2023-08-10 09:40:16,033 - synapse.http.client - 509 - DEBUG - PUT-32421 - HTTP POST b'{"_INT_STREAM_POS":{"instance_name":"master","streams":{"caches":49370903,"device_lists":165021005,"federation":0,"push_rules":35244,"pushers":930634,"un_partial_stated_event":6053,"un_part
ial_stated_room":5347}},"content":{"recent_rooms":["!pPkkgeThAWnvwTTKwR:envs.net","!IZqTPFQTHUZcpbJxlq:envs.net","!QmKYtHHGxDwgRidksS:envs.net","!VpYwNHfZWlvnJKoQsC:envs.net","!FdYrAuOBaLmPONxjjl:envs.net"]}}' -> synapse-replication://additional1/_synapse/replication/add_use
r_account_data/%40titan%3Aenvs.net/im.vector.setting.breadcrumbs
2023-08-10 09:40:16,033 - synapse.http.client - 868 - DEBUG - PUT-32421 - Sending request POST synapse-replication://additional1/_synapse/replication/add_user_account_data/%40titan%3Aenvs.net/im.vector.setting.breadcrumbs
2023-08-10 09:40:16,224 - synapse.replication.tcp.client - 331 - ERROR - PUT-28335 - Timed out waiting for repl stream 'account_data' to reach 21990810 (additional1); currently at: 21990726
2023-08-10 09:40:16,225 - synapse.replication.tcp.client - 331 - ERROR - PUT-28335 - Timed out waiting for repl stream 'account_data' to reach 21990810 (additional1); currently at: 21990726

@cremesk
Copy link
Contributor Author

cremesk commented Aug 10, 2023

maybe it could be partly in relation with the missing patch of #16061 ?

@cremesk
Copy link
Contributor Author

cremesk commented Aug 11, 2023

with synapse without workers i also see sometimes a replication problem i think:

2023-08-11 08:47:30,001 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355449 - TX [levis.name] Received 404 response to transaction: 404: Not Found                                                                2023-08-11 08:47:30,006 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298228 - Dropping command as not connected: 'REMOTE_SERVER_UP'                                                                                                                    
2023-08-11 08:47:30,012 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355450 - {PUT-O-12947381} [conduit.pennarbed.eu] Request failed: PUT matrix-federation://conduit.pennarbed.eu/_matrix/federation/v1/send/1691665575066: H
ttpResponseException('404: Not Found')                                                                                                                                                                                                                                             2023-08-11 08:47:30,013 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355450 - TX [conduit.pennarbed.eu] Received 404 response to transaction: 404: Not Found                                                      
2023-08-11 08:47:30,014 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298229 - Dropping command as not connected: 'REMOTE_SERVER_UP'                                                                                                                    
2023-08-11 08:47:30,020 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355448 - {PUT-O-12947382} [koenner.duckdns.org] Request failed: PUT matrix-federation://koenner.duckdns.org/_matrix/federation/v1/send/1691665575067: HttpResponseException('404: Not Found')                                                                                                                                                                                                                                               
2023-08-11 08:47:30,020 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355448 - TX [koenner.duckdns.org] Received 404 response to transaction: 404: Not Found                                                       
2023-08-11 08:47:30,022 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298230 - Dropping command as not connected: 'REMOTE_SERVER_UP'                                                                                                                    2023-08-11 08:47:30,034 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355451 - {PUT-O-12947383} [levis.name] Request failed: PUT matrix-federation://levis.name/_matrix/federation/v1/send/1691665575068: HttpResponseException
('404: Not Found')                                                                                                                                                                                                                                                                 
2023-08-11 08:47:30,034 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355451 - TX [levis.name] Received 404 response to transaction: 404: Not Found                                                                2023-08-11 08:47:30,036 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298231 - Dropping command as not connected: 'REMOTE_SERVER_UP'                                                                                                                    
2023-08-11 08:47:30,052 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355452 - {PUT-O-12947384} [conduit.pennarbed.eu] Request failed: PUT matrix-federation://conduit.pennarbed.eu/_matrix/federation/v1/send/1691665575069: H
ttpResponseException('404: Not Found')                                                                                                                                                                                                                                             2023-08-11 08:47:30,052 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355452 - TX [conduit.pennarbed.eu] Received 404 response to transaction: 404: Not Found                                                      
2023-08-11 08:47:30,056 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298232 - Dropping command as not connected: 'REMOTE_SERVER_UP'                                                                                                                    
2023-08-11 08:47:30,064 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355454 - {PUT-O-12947385} [levis.name] Request failed: PUT matrix-federation://levis.name/_matrix/federation/v1/send/1691665575070: HttpResponseException('404: Not Found')                                                                                                                                                                                                                                                                 
2023-08-11 08:47:30,064 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355454 - TX [levis.name] Received 404 response to transaction: 404: Not Found                                                                
2023-08-11 08:47:30,065 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298233 - Dropping command as not connected: 'REMOTE_SERVER_UP'                                                                                                                    2023-08-11 08:47:30,068 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355453 - {PUT-O-12947386} [koenner.duckdns.org] Request failed: PUT matrix-federation://koenner.duckdns.org/_matrix/federation/v1/send/1691665575071: Htt
pResponseException('404: Not Found')                                                                                                                                                                                                                                               
2023-08-11 08:47:30,068 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355453 - TX [koenner.duckdns.org] Received 404 response to transaction: 404: Not Found                                                       2023-08-11 08:47:30,071 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298234 - Dropping command as not connected: 'REMOTE_SERVER_UP'                                                                                                                    
2023-08-11 08:47:30,093 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355455 - {PUT-O-12947387} [conduit.pennarbed.eu] Request failed: PUT matrix-federation://conduit.pennarbed.eu/_matrix/federation/v1/send/1691665575072: H
ttpResponseException('404: Not Found')                                                                                                                                                                                                                                             2023-08-11 08:47:30,093 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355455 - TX [conduit.pennarbed.eu] Received 404 response to transaction: 404: Not Found                                                      
2023-08-11 08:47:30,096 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355456 - {PUT-O-12947388} [levis.name] Request failed: PUT matrix-federation://levis.name/_matrix/federation/v1/send/1691665575073: HttpResponseException
('404: Not Found')                                                                                                                                                                                                                                                                 2023-08-11 08:47:30,096 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355456 - TX [levis.name] Received 404 response to transaction: 404: Not Found                                                                
2023-08-11 08:47:30,098 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298235 - Dropping command as not connected: 'REMOTE_SERVER_UP'                                                                                                                    
2023-08-11 08:47:30,100 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298236 - Dropping command as not connected: 'REMOTE_SERVER_UP'                                                                                                                    2023-08-11 08:47:30,121 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355457 - {PUT-O-12947391} [koenner.duckdns.org] Request failed: PUT matrix-federation://koenner.duckdns.org/_matrix/federation/v1/send/1691665575076: Htt
pResponseException('404: Not Found')                                                                                                                                                                                                                                               
2023-08-11 08:47:30,121 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355457 - TX [koenner.duckdns.org] Received 404 response to transaction: 404: Not Found                                                       2023-08-11 08:47:30,123 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298237 - Dropping command as not connected: 'REMOTE_SERVER_UP'                                                                                                                    
2023-08-11 08:47:30,131 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355459 - {PUT-O-12947390} [levis.name] Request failed: PUT matrix-federation://levis.name/_matrix/federation/v1/send/1691665575075: HttpResponseException
('404: Not Found')
2023-08-11 08:47:30,131 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355459 - TX [levis.name] Received 404 response to transaction: 404: Not Found
2023-08-11 08:47:30,133 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298238 - Dropping command as not connected: 'REMOTE_SERVER_UP'
2023-08-11 08:47:30,139 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355458 - {PUT-O-12947389} [conduit.pennarbed.eu] Request failed: PUT matrix-federation://conduit.pennarbed.eu/_matrix/federation/v1/send/1691665575074: H
ttpResponseException('404: Not Found')
2023-08-11 08:47:30,140 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355458 - TX [conduit.pennarbed.eu] Received 404 response to transaction: 404: Not Found
2023-08-11 08:47:30,141 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298239 - Dropping command as not connected: 'REMOTE_SERVER_UP'
2023-08-11 08:47:30,161 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355461 - {PUT-O-12947392} [levis.name] Request failed: PUT matrix-federation://levis.name/_matrix/federation/v1/send/1691665575077: HttpResponseException
('404: Not Found')
2023-08-11 08:47:30,161 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355461 - TX [levis.name] Received 404 response to transaction: 404: Not Found
2023-08-11 08:47:30,163 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298240 - Dropping command as not connected: 'REMOTE_SERVER_UP'
2023-08-11 08:47:30,170 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355460 - {PUT-O-12947394} [koenner.duckdns.org] Request failed: PUT matrix-federation://koenner.duckdns.org/_matrix/federation/v1/send/1691665575079: Htt
pResponseException('404: Not Found')
2023-08-11 08:47:30,171 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355460 - TX [koenner.duckdns.org] Received 404 response to transaction: 404: Not Found
2023-08-11 08:47:30,191 - synapse.replication.tcp.handler - 733 - WARNING - store_retry_timings-4298241 - Dropping command as not connected: 'REMOTE_SERVER_UP'
2023-08-11 08:47:30,192 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355462 - {PUT-O-12947393} [conduit.pennarbed.eu] Request failed: PUT matrix-federation://conduit.pennarbed.eu/_matrix/federation/v1/send/1691665575078: H
ttpResponseException('404: Not Found')
2023-08-11 08:47:30,192 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355462 - TX [conduit.pennarbed.eu] Received 404 response to transaction: 404: Not Found
2023-08-11 08:47:30,193 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355447 - {PUT-O-12947379} [matrix.leandr.in] Request failed: PUT matrix-federation://matrix.leandr.in/_matrix/federation/v1/send/1691665575064: HttpRespo
nseException('401: Unauthorized')
2023-08-11 08:47:30,193 - synapse.federation.sender.per_destination_queue - 399 - WARNING - federation_transaction_transmission_loop-5355447 - TX [matrix.leandr.in] Received 401 response to transaction: 401: Unauthorized
2023-08-11 08:47:30,217 - synapse.http.matrixfederationclient - 805 - WARNING - federation_transaction_transmission_loop-5355463 - {PUT-O-12947395} [levis.name] Request failed: PUT matrix-federation://levis.name/_matrix/federation/v1/send/1691665575080: HttpResponseException
('404: Not Found')

@cremesk
Copy link
Contributor Author

cremesk commented Aug 11, 2023

if I activate redis in single synpase mode (without workers), I see the following in the logs:

2023-08-11 08:57:14,666 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @2cccyotgpg:synapse.glester.com: 403: Forbidden                                                                                        
2023-08-11 08:57:15,019 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-187 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                
2023-08-11 08:57:15,020 - synapse.replication.tcp.handler - 733 - WARNING - PUT-4453 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                          
2023-08-11 08:57:15,150 - synapse.replication.tcp.handler - 733 - WARNING - GET-4469 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                          
2023-08-11 08:57:15,779 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4319} [peopleinbearsuits.com] Request failed: GET matrix-federation://peopleinbearsuits.com/_matrix/federation/v1/user/devices/%40amy%3Apeopleinbearsuits.com
: HttpResponseException('404: Not Found')                                                                                                                                                                                                                                          
2023-08-11 08:57:15,779 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @amy:peopleinbearsuits.com: 404: Not Found                                                                                             
2023-08-11 08:57:16,056 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-188 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                
2023-08-11 08:57:16,058 - synapse.replication.tcp.handler - 733 - WARNING - _process_incoming_pdus_in_room_inner-147-$HQFagbk1MwX6ymIl-c6BBwiSKTSNuqemkQE__JWh_ys - Dropping command as not connected: 'LOCK_RELEASED'                                                             
2023-08-11 08:57:16,193 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-189 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                
2023-08-11 08:57:16,196 - synapse.replication.tcp.handler - 733 - WARNING - PUT-4511 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                          
2023-08-11 08:57:16,423 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4393} [77737232.com] Request failed: GET matrix-federation://77737232.com/_matrix/federation/v1/user/devices/%40jacky%3A77737232.com: HttpResponseException('
403: Forbidden')                                                                                                                                                                                                                                                                   
2023-08-11 08:57:16,423 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @jacky:77737232.com: 403: Forbidden                                                                                                    
2023-08-11 08:57:16,443 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4460} [geofabrik.de] Request failed: GET matrix-federation://geofabrik.de/_matrix/federation/v1/user/devices/%40ja1azn585n%3Ageofabrik.de: HttpResponseExcept
ion('403: Forbidden')                                                                                                                                                                                                                                                              
2023-08-11 08:57:16,443 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @ja1azn585n:geofabrik.de: 403: Forbidden                                                                                               
2023-08-11 08:57:16,491 - synapse.replication.tcp.handler - 733 - WARNING - GET-4530 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                          
2023-08-11 08:57:16,612 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4473} [matrix.subfocal.net] Request failed: GET matrix-federation://matrix.subfocal.net/_matrix/federation/v1/user/devices/%408mdwbv1rdx%3Amatrix.subfocal.ne
t: HttpResponseException('403: Forbidden')                                                                                                                                                                                                                                         
2023-08-11 08:57:16,612 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @8mdwbv1rdx:matrix.subfocal.net: 403: Forbidden                                                                                        
2023-08-11 08:57:16,629 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4482} [matrix.bootlin.com] Request failed: GET matrix-federation://matrix.bootlin.com/_matrix/federation/v1/user/devices/%40kahlaudun%3Amatrix.bootlin.com: H
ttpResponseException('403: Forbidden')                                                                                                                                                                                                                                             
2023-08-11 08:57:16,629 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @kahlaudun:matrix.bootlin.com: 403: Forbidden                                                                                          
2023-08-11 08:57:16,649 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4510} [matrix.bootlin.com] Request failed: GET matrix-federation://matrix.bootlin.com/_matrix/federation/v1/user/devices/%40caribearce%3Amatrix.bootlin.com: 
HttpResponseException('403: Forbidden')                                                                                                                                                                                                                                            
2023-08-11 08:57:16,649 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @caribearce:matrix.bootlin.com: 403: Forbidden                                                                                         
2023-08-11 08:57:16,674 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-190 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                
2023-08-11 08:57:16,675 - synapse.replication.tcp.handler - 733 - WARNING - _process_incoming_pdus_in_room_inner-148-$1zqGp-8aJeclEUsNngngUoeESCAWwCraMsqzXSxQNoU - Dropping command as not connected: 'LOCK_RELEASED'                                                             
2023-08-11 08:57:16,821 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-191 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                
2023-08-11 08:57:16,823 - synapse.replication.tcp.handler - 733 - WARNING - _process_incoming_pdus_in_room_inner-149-$AOX79gFkQuzd4Szcwiw7G_OkGJXXb0ti7S2huSxw5gw - Dropping command as not connected: 'LOCK_RELEASED'                                                             
2023-08-11 08:57:17,056 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4520} [rthome.me] Request failed: GET matrix-federation://rthome.me/_matrix/federation/v1/user/devices/%40fs8zdkz2ex%3Arthome.me: HttpResponseException('403:
 Forbidden')                                                                                                                                                                                                                                                                       
2023-08-11 08:57:17,056 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @fs8zdkz2ex:rthome.me: 403: Forbidden                                                                                                  
2023-08-11 08:57:17,229 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4606} [matrix.subfocal.net] Request failed: GET matrix-federation://matrix.subfocal.net/_matrix/federation/v1/user/devices/%404wxed3c7k7%3Amatrix.subfocal.ne
t: HttpResponseException('403: Forbidden')                                                                                                                                                                                                                                         
2023-08-11 08:57:17,229 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @4wxed3c7k7:matrix.subfocal.net: 403: Forbidden                                                                                        
2023-08-11 08:57:17,262 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-193 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                
2023-08-11 08:57:17,265 - synapse.replication.tcp.handler - 733 - WARNING - _process_incoming_pdus_in_room_inner-151-$FkUA1RBbverkhSV_dQvOhFOxnXcTNq2xanYqLKdn_aA - Dropping command as not connected: 'LOCK_RELEASED'                                                             
2023-08-11 08:57:17,270 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-192 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                
2023-08-11 08:57:17,272 - synapse.replication.tcp.handler - 733 - WARNING - _process_incoming_pdus_in_room_inner-150-$nC2yyuj0ESpbAYXLbhW0_vhqpOshBqe0gTg8AxDvAbw - Dropping command as not connected: 'LOCK_RELEASED'                                                             
2023-08-11 08:57:17,281 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-194 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                
2023-08-11 08:57:17,284 - synapse.replication.tcp.handler - 733 - WARNING - PUT-4608 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                          
2023-08-11 08:57:17,285 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4641} [matrix.atvg-studios.com] Request failed: GET matrix-federation://matrix.atvg-studios.com/_matrix/federation/v1/user/devices/%40root%3Amatrix.atvg-stud
ios.com: HttpResponseException('403: Forbidden')                                                                                                                                                                                                                                   
2023-08-11 08:57:17,285 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @root:matrix.atvg-studios.com: 403: Forbidden                                                                                          
2023-08-11 08:57:17,474 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4664} [agrobio-bretagne.org] Request failed: GET matrix-federation://agrobio-bretagne.org/_matrix/federation/v1/user/devices/%40jesusiskingsxha0uit9k%3Aagrob
io-bretagne.org: HttpResponseException('403: Forbidden')                                                                                                                                                                                                                           
2023-08-11 08:57:17,474 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @jesusiskingsxha0uit9k:agrobio-bretagne.org: 403: Forbidden                                                                            
2023-08-11 08:57:17,516 - synapse.replication.tcp.handler - 733 - WARNING - GET-4628 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                          
2023-08-11 08:57:17,550 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-195 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                
2023-08-11 08:57:17,556 - synapse.replication.tcp.handler - 733 - WARNING - _process_incoming_pdus_in_room_inner-152-$BQv5N9f4eH2FRUSyf6x34wZKo_goqMHvLTK-NFuSgiU - Dropping command as not connected: 'LOCK_RELEASED'                                                             
2023-08-11 08:57:17,647 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4682} [matrix.subfocal.net] Request failed: GET matrix-federation://matrix.subfocal.net/_matrix/federation/v1/user/devices/%40o29dxfgosq%3Amatrix.subfocal.ne
t: HttpResponseException('403: Forbidden')                                                                                                                                                                                                                                         
2023-08-11 08:57:17,647 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @o29dxfgosq:matrix.subfocal.net: 403: Forbidden                                                                                        
2023-08-11 08:57:17,823 - synapse.http.matrixfederationclient - 805 - WARNING - _maybe_retry_device_resync-5 - {GET-O-4823} [matrix.subfocal.net] Request failed: GET matrix-federation://matrix.subfocal.net/_matrix/federation/v1/user/devices/%40gh7txl4krf%3Amatrix.subfocal.ne
t: HttpResponseException('403: Forbidden')                                                                                                                                                                                                                                         
2023-08-11 08:57:17,823 - synapse.handlers.device - 1304 - WARNING - _maybe_retry_device_resync-5 - Failed to handle device list update for @gh7txl4krf:matrix.subfocal.net: 403: Forbidden                                                                                        
2023-08-11 08:57:17,897 - synapse.replication.tcp.handler - 733 - WARNING - persist_events-196 - Dropping command as not connected: 'LOCK_RELEASED'                                                                                                                                
2023-08-11 08:57:17,899 - synapse.replication.tcp.handler - 733 - WARNING - _process_incoming_pdus_in_room_inner-153-$sS0bCTE_vcQ5l3AW_yIwwbHDDFWmYBPzN-1AhDPTiXk - Dropping command as not connected: 'LOCK_RELEASED'

@erikjohnston
Copy link
Member

Hi @cremesk, a quick looks suggest to me that for some reason the workers aren't sending/receiving redis pub/sub traffic. In particular:

Dropping command as not connected

Suggests that its failing to talk to the Redis. Also, the lack of PUBLISH commands in your monitor logs suggest its not doing any pub/sub. I am confused why Synapse seems to be able to send SET commands happily. Are there any logs after restart that reference redis?

@cremesk
Copy link
Contributor Author

cremesk commented Aug 14, 2023

hi @erikjohnston
That's what also surprises me. Somehow a connection seems to come about.
I don't see anything relevant in the logs at first, I would say.

i also try the txredisapi==1.4.10 in the synapse venv (#16107) unfortunately, the problem remains the same.

@cremesk
Copy link
Contributor Author

cremesk commented Aug 14, 2023

okay i look a bit deeper in my synapse logs and i think i found the issue...

  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/synapse/replication/tcp/redis.py", line 132, in _send_subscribe                                                                                                                                                     
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/txredisapi.py", line 534, in handle_reply                                                                                                                                                                           
txredisapi.ResponseError: NOPERM this user has no permissions to access one of the channels used as arguments                                                                                                                                                                      
txredisapi.ResponseError: NOPERM this user has no permissions to access one of the channels used as arguments                                                                                                                                                                      
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/txredisapi.py", line 534, in handle_reply                                                                                                                                                                           
txredisapi.ResponseError: NOPERM this user has no permissions to access one of the channels used as arguments                                                                                                                                                                      
2023-08-10 09:42:29,123 - synapse.replication.tcp.redis - 197 - INFO - sentinel - Lost connection to redis                                                                                                                                                                         
2023-08-10 09:42:29,123 - synapse.replication.tcp.redis - 299 - INFO - sentinel - Connection to redis server 127.0.0.1:6379 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.                                                                   
2023-08-10 09:42:29,123 - synapse.replication.tcp.redis - 299 - INFO - sentinel - Connection to redis server 127.0.0.1:6379 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.                                                                   
txredisapi.ResponseError: NOPERM this user has no permissions to access one of the channels used as arguments                                                                                                                                                                      
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/synapse/replication/tcp/redis.py", line 132, in _send_subscribe                                                                                                                                                     
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/txredisapi.py", line 534, in handle_reply                                                                                                                                                                           
txredisapi.ResponseError: NOPERM this user has no permissions to access one of the channels used as arguments                                                                                                                                                                      
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/synapse/replication/tcp/redis.py", line 132, in _send_subscribe                                                                                                                                                     
  File "/opt/venvs/matrix-synapse/lib/python3.11/site-packages/txredisapi.py", line 534, in handle_reply                                                                                                                                                                           
txredisapi.ResponseError: NOPERM this user has no permissions to access one of the channels used as arguments

it looks like the debian 12 upgrade add a acl line to the end of the redis config..

user default on nopass ~* +@all

i delete this line and now it looks fine!

@cremesk cremesk closed this as completed Aug 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants