This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Federation catchup for a destination can block indefinitely on un-partial stating #15220

Closed
squahtx opened this issue Mar 7, 2023 · 3 comments · Fixed by #15248
Labels
A-Federated-Join joins over federation generally suck A-Federation O-Occasional Affects or can be seen by some users regularly or most users rarely S-Critical Blocks development, potential data loss, more than 25% of users possibly affected, no workarounds. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Release-Blocker Must be resolved before making a release

Comments

@squahtx
Contributor

squahtx commented Mar 7, 2023

What we think happened

  1. A user faster-joined a remote room.
  2. Some events were sent into the room, but were not successfully transmitted to the destination.
  3. The user left the room before it reached full state.
  4. The federation catchup loop tried to transmit events in the room, but blocked waiting for full state. This also blocked *all* catchup transmission for the destination.
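
To illustrate why step 4 stalls everything for the destination, here is a minimal sketch using `asyncio`. It is not Synapse's code (Synapse is built on Twisted); `catch_up`, `demo`, the room IDs and the `full_state` mapping are all hypothetical names chosen for the example. It only assumes that catchup walks rooms one after another and awaits full state for each.

```python
import asyncio


async def catch_up(rooms, full_state, sent):
    """Hypothetical sequential catchup loop for a single destination.

    rooms:      ordered list of room IDs with events still to send
    full_state: room ID -> asyncio.Event, set once the room has full state
    sent:       list that records the rooms we managed to process
    """
    for room_id in rooms:
        # If this room never reaches full state, we never get past this
        # line, so every later room for the destination is stuck as well.
        await full_state[room_id].wait()
        sent.append(room_id)


async def demo():
    full_state = {
        "!left:remote": asyncio.Event(),  # never set: the user left the room
        "!ok:remote": asyncio.Event(),
    }
    full_state["!ok:remote"].set()  # this room could be sent straight away

    sent = []
    try:
        # Bound the wait so the demonstration terminates.
        await asyncio.wait_for(
            catch_up(["!left:remote", "!ok:remote"], full_state, sent),
            timeout=0.1,
        )
    except asyncio.TimeoutError:
        pass
    return sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch `sent` stays empty: the second room is ready, but it is never reached because the loop is still waiting on the first one.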

Stack trace

  File "/home/synapse/src/synapse/metrics/background_process_metrics.py", line 242, in run
    return await func(*args, **kwargs)
  File "/home/synapse/src/synapse/federation/sender/per_destination_queue.py", line 321, in _transaction_transmission_loop
    await self._catch_up_transmission_loop()
  File "/home/synapse/src/synapse/federation/sender/per_destination_queue.py", line 544, in _catch_up_transmission_loop
    new_pdus = await filter_events_for_server(
  File "/home/synapse/src/synapse/visibility.py", line 650, in filter_events_for_server
    event_to_memberships = await _event_to_memberships(
  File "/home/synapse/src/synapse/visibility.py", line 746, in _event_to_memberships
    event_to_state_ids = await storage.state.get_state_ids_for_events(
  File "/home/synapse/src/synapse/logging/opentracing.py", line 914, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/home/synapse/src/synapse/logging/opentracing.py", line 914, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/home/synapse/src/synapse/storage/controllers/state.py", line 267, in get_state_ids_for_events
    event_to_groups = await self.get_state_group_for_events(
  File "/home/synapse/src/synapse/logging/opentracing.py", line 914, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/home/synapse/src/synapse/logging/opentracing.py", line 914, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/home/synapse/src/synapse/storage/controllers/state.py", line 374, in get_state_group_for_events
    await self._partial_state_events_tracker.await_full_state(event_ids)
  File "/home/synapse/src/synapse/logging/opentracing.py", line 914, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/home/synapse/src/synapse/storage/util/partial_state_events_tracker.py", line 82, in await_full_state
    logger.info(
@squahtx squahtx added A-Federation A-Federated-Join joins over federation generally suck S-Critical Blocks development, potential data loss, more than 25% of users possibly affected, no workarounds. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. O-Occasional Affects or can be seen by some users regularly or most users rarely labels Mar 7, 2023
@DMRobertson
Contributor

Worked around in ab629c1, but I think we ought to fix this properly for the next release.

@DMRobertson DMRobertson added the X-Release-Blocker Must be resolved before making a release label Mar 8, 2023
@DMRobertson
Contributor

Of note: we only reach that far down into the stack if the room's history visibility is restricted:

# for any with restricted vis, we also need the memberships
event_to_memberships = await _event_to_memberships(
    storage,
    [
        e
        for e in events
        if event_to_history_vis[e.event_id]
        not in (HistoryVisibility.SHARED, HistoryVisibility.WORLD_READABLE)
    ],
    target_server_name,
)

to_return = []
for e in events:
    erased = is_sender_erased(e, erased_senders)
    visible = check_event_is_visible(
        event_to_history_vis[e.event_id], event_to_memberships.get(e.event_id, {})
    )

    if e in partial_state_invisible_events:
        visible = False

    if visible and not erased:
        to_return.append(e)
    elif redact:
        to_return.append(prune_event(e))

return to_return
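
For illustration, the condition that decides which events need the membership lookup (and so reach the blocking state fetch) can be isolated in a standalone sketch. The `HistoryVisibility` enum below is a stand-in defined for the example, using the four `m.room.history_visibility` values from the Matrix spec, and `needs_membership_lookup` is a hypothetical helper, not a Synapse function.

```python
from enum import Enum


class HistoryVisibility(str, Enum):
    # Stand-in enum: the four history visibility values in the Matrix spec.
    INVITED = "invited"
    JOINED = "joined"
    SHARED = "shared"
    WORLD_READABLE = "world_readable"


def needs_membership_lookup(event_to_history_vis):
    """Return the event IDs whose visibility depends on room membership.

    Mirrors the list comprehension in the excerpt above: events with
    `shared` or `world_readable` visibility are skipped, the rest are kept.
    """
    unrestricted = (HistoryVisibility.SHARED, HistoryVisibility.WORLD_READABLE)
    return [
        event_id
        for event_id, vis in event_to_history_vis.items()
        if vis not in unrestricted
    ]


example = {
    "$a": HistoryVisibility.SHARED,
    "$b": HistoryVisibility.JOINED,
    "$c": HistoryVisibility.INVITED,
    "$d": HistoryVisibility.WORLD_READABLE,
}
restricted = needs_membership_lookup(example)
```

Under these assumptions only `$b` and `$c` would be passed on for membership checks, which matches the observation that rooms with `shared` or `world_readable` history never get that far down the stack.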

@DMRobertson
Contributor

Fixed by #15248.
