Error doing background update 'chain_cover' on v1.26.0rc1 #9208
Comments
Thank you for the report! We're currently investigating. Aside from the logspam, things should continue working correctly. Even though the background update to create a chain cover index has failed, Synapse will fall back to the previous state resolution algorithm which does not depend on that index being present. |
I'm very confused how this is possible. Could you run the following patch and send over the logs (filtering by
diff --git a/synapse/storage/databases/main/events.py b/synapse/storage/databases/main/events.py
index 3216b3f3c..d72585303 100644
--- a/synapse/storage/databases/main/events.py
+++ b/synapse/storage/databases/main/events.py
@@ -587,6 +587,11 @@ class PersistEventsStore:
auth_id not in chain_map
and auth_id not in events_to_calc_chain_id_for
):
+ logger.info(
+ "Discarding event ID %s due to missing auth event %s",
+ event_id,
+ auth_id,
+ )
events_to_calc_chain_id_for.discard(event_id)
# If this is an event we're trying to persist we add it to
@@ -614,6 +619,8 @@ class PersistEventsStore:
if not events_to_calc_chain_id_for:
return
+ logger.info("Calculating chain IDs for: %s", events_to_calc_chain_id_for)
+
# We now calculate the chain IDs/sequence numbers for the events. We
# do this by looking at the chain ID and sequence number of any auth
# event with the same type/state_key and incrementing the sequence
@@ -628,9 +635,11 @@ class PersistEventsStore:
for event_id in sorted_topologically(
events_to_calc_chain_id_for, event_to_auth_chain
):
+ logger.info("Calculating chain ID for %s", event_id)
existing_chain_id = None
for auth_id in event_to_auth_chain.get(event_id, []):
if event_to_types.get(event_id) == event_to_types.get(auth_id):
+ logger.info("Basing chain ID on %s", auth_id)
existing_chain_id = chain_map[auth_id]
break
@@ -666,6 +675,7 @@ class PersistEventsStore:
chains_tuples_allocated.add(new_chain_tuple)
+ logger.info("Calculated chain ID for %s: %s", event_id, new_chain_tuple)
chain_map[event_id] = new_chain_tuple
new_chain_tuples[event_id] = new_chain_tuple
|
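For readers following along, the allocation loop that the patch above instruments can be sketched roughly as follows. This is a simplified illustration based only on the lines visible in the diff, not Synapse's actual implementation: the function name `allocate_chain_ids`, the in-memory counter for new chain IDs, and the sample event IDs are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

ChainTuple = Tuple[int, int]  # (chain_id, sequence_number)


def allocate_chain_ids(
    ordered_event_ids: List[str],
    event_to_types: Dict[str, Tuple[str, str]],
    event_to_auth_chain: Dict[str, List[str]],
    chain_map: Dict[str, ChainTuple],
) -> Dict[str, ChainTuple]:
    """Hypothetical helper: assign a (chain ID, sequence number) to each event.

    `ordered_event_ids` must already be topologically sorted so that every
    auth event appears before the events that reference it.
    """
    allocated: Set[ChainTuple] = set(chain_map.values())
    # Assumption: new chain IDs come from a simple in-memory counter.
    next_chain_id = max((c for c, _ in allocated), default=0) + 1
    new_chain_tuples: Dict[str, ChainTuple] = {}

    for event_id in ordered_event_ids:
        proposed: Optional[ChainTuple] = None
        for auth_id in event_to_auth_chain.get(event_id, []):
            # Reuse the chain of an auth event with the same type/state_key.
            if event_to_types.get(event_id) == event_to_types.get(auth_id):
                # Raises KeyError if auth_id has not been processed yet,
                # which is what a broken topological ordering would cause.
                chain_id, sequence_number = chain_map[auth_id]
                proposed = (chain_id, sequence_number + 1)
                break

        # No matching auth event, or the slot is taken: start a new chain.
        if proposed is None or proposed in allocated:
            proposed = (next_chain_id, 1)
            next_chain_id += 1

        allocated.add(proposed)
        chain_map[event_id] = proposed
        new_chain_tuples[event_id] = proposed

    return new_chain_tuples


# Made-up events: a create event and two membership events for one user.
types = {
    "$create": ("m.room.create", ""),
    "$join1": ("m.room.member", "@a:example.org"),
    "$join2": ("m.room.member", "@a:example.org"),
}
auth = {"$create": [], "$join1": ["$create"], "$join2": ["$create", "$join1"]}
result = allocate_chain_ids(["$create", "$join1", "$join2"], types, auth, {})
print(result)
# {'$create': (1, 1), '$join1': (2, 1), '$join2': (2, 2)}
```

Note how the lookup `chain_map[auth_id]` only works if auth events are handled before the events that depend on them, which is why the ordering matters here.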
@erikjohnston ~1 minute worth of logs (all workers): 2021-01-22-synapse-9208.log |
Huh, something odd is happening. It almost looks like we're not currently sorting topologically. I'd appreciate if you could try with some more logging:
diff --git a/synapse/storage/databases/main/events.py b/synapse/storage/databases/main/events.py
index 3216b3f3c..29ddf0831 100644
--- a/synapse/storage/databases/main/events.py
+++ b/synapse/storage/databases/main/events.py
@@ -579,9 +579,17 @@ class PersistEventsStore:
else:
chain_map[auth_id] = (chain_id, sequence_number)
+ logger.info("event_to_auth_chain: %s", event_to_auth_chain)
+
# Now we check if we have any events where we don't have auth chain,
# this should only be out of band memberships.
for event_id in sorted_topologically(event_to_auth_chain, event_to_auth_chain):
+ logger.info(
+ "Checking auth events for %s (in %s): %s",
+ event_id,
+ event_to_room_id.get(event_id),
+ event_to_auth_chain[event_id],
+ )
for auth_id in event_to_auth_chain[event_id]:
if (
auth_id not in chain_map |
Ah, can you see if the following patch fixes it?
diff --git a/synapse/util/iterutils.py b/synapse/util/iterutils.py
index 6ef2b008a..8d2411513 100644
--- a/synapse/util/iterutils.py
+++ b/synapse/util/iterutils.py
@@ -78,7 +78,7 @@ def sorted_topologically(
if node not in degree_map:
continue
- for edge in edges:
+ for edge in set(edges):
if edge in degree_map:
degree_map[node] += 1
|
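To illustrate why deduplicating the edges matters, here is a minimal, self-contained sketch of a Kahn-style topological sort. It is not the actual `synapse/util/iterutils.py` code: the `dedupe_edges` flag and the sample graph are assumptions added for the demonstration. When a node lists the same edge twice, its in-degree is counted twice but only decremented once, so it never reaches zero and the node is silently left out of the result.

```python
import heapq
from typing import Dict, Iterable, List, Set


def sorted_topologically(
    nodes: Iterable[str],
    graph: Dict[str, List[str]],
    dedupe_edges: bool = True,
) -> List[str]:
    """Return nodes ordered so that each node comes after its edges.

    `dedupe_edges` is a hypothetical switch used here to compare the
    behaviour with and without the `set(edges)` change.
    """
    degree_map: Dict[str, int] = {node: 0 for node in nodes}
    reverse_graph: Dict[str, Set[str]] = {}

    for node, edges in graph.items():
        if node not in degree_map:
            continue
        for edge in set(edges) if dedupe_edges else edges:
            if edge in degree_map:
                degree_map[node] += 1
            # A set, so a duplicated edge is still recorded only once here.
            reverse_graph.setdefault(edge, set()).add(node)

    zero_degree = [node for node, degree in degree_map.items() if degree == 0]
    heapq.heapify(zero_degree)

    ordered: List[str] = []
    while zero_degree:
        node = heapq.heappop(zero_degree)
        ordered.append(node)
        for dependent in reverse_graph.get(node, ()):
            degree_map[dependent] -= 1
            if degree_map[dependent] == 0:
                heapq.heappush(zero_degree, dependent)
    return ordered


# Made-up graph: "member" lists "create" twice among its auth events.
graph = {"create": [], "member": ["create", "create"]}

print(sorted_topologically(["create", "member"], graph, dedupe_edges=False))
# ['create']  ("member" is dropped: its degree goes from 2 to 1, never 0)

print(sorted_topologically(["create", "member"], graph, dedupe_edges=True))
# ['create', 'member']
```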
@erikjohnston yep, the exception spam is gone. Is this a sign of some latent issue with the intelfx.name database? |
Ish, they're probably old events from way back before we added sanity checks. They're not "wrong" per se, just that we don't see that situation in modern events and so our testing didn't pick the situation up :) |
Also, yay that that fixed it. |
Closing since #9210 didn't seem to close it automatically. 🤷 |
Description
After upgrading a v1.25.0 Synapse installation to v1.26.0rc1 and restarting the instance one more time, it started to continuously report this error:
Steps to reproduce
Version information
If not matrix.org:
Version:
{"server_version":"1.26.0rc1","python_version":"3.9.1"}
Install method: hand packaged
Platform: x86_64 Arch Linux