This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Faster joins: handle gappy /syncs #12645

Closed
richvdh opened this issue May 5, 2022 · 4 comments
Assignees
squahtx
Labels
A-Federated-Join (joins over federation generally suck) · A-Sync (defects related to /sync) · T-Task (refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks)

Comments

@richvdh
Member

richvdh commented May 5, 2022

We need to make sure that we're handling gappy /syncs correctly. It would be easy for us to think we have sent clients more state than we actually have; we'll need to check this and write tests.

To do this, we'll first have to understand the existing code for tracking which state we have sent to clients. It looks entirely magical.

@anoadragon453 added the T-Task label on May 5, 2022
@erikjohnston
Member

I think the code we care about is here:

elif batch.limited:
    if batch:
        state_at_timeline_start = (
            await self.state_store.get_state_ids_for_event(
                batch.events[0].event_id, state_filter=state_filter
            )
        )
    else:
        # We can get here if the user has ignored the senders of all
        # the recent events.
        state_at_timeline_start = await self.get_state_at(
            room_id, stream_position=now_token, state_filter=state_filter
        )

    # for now, we disable LL for gappy syncs - see
    # https://github.com/vector-im/riot-web/issues/7211#issuecomment-419976346
    # N.B. this slows down incr syncs as we are now processing way
    # more state in the server than if we were LLing.
    #
    # We still have to filter timeline_start to LL entries (above) in order
    # for _calculate_state's LL logic to work, as we have to include LL
    # members for timeline senders in case they weren't loaded in the initial
    # sync. We do this by (counterintuitively) by filtering timeline_start
    # members to just be ones which were timeline senders, which then ensures
    # all of the rest get included in the state block (if we need to know
    # about them).
    state_filter = StateFilter.all()

    # If this is an initial sync then full_state should be set, and
    # that case is handled above. We assert here to ensure that this
    # is indeed the case.
    assert since_token is not None
    state_at_previous_sync = await self.get_state_at(
        room_id, stream_position=since_token, state_filter=state_filter
    )

    if batch:
        current_state_ids = await self.state_store.get_state_ids_for_event(
            batch.events[-1].event_id, state_filter=state_filter
        )
    else:
        # Its not clear how we get here, but empirically we do
        # (#5407). Logging has been added elsewhere to try and
        # figure out where this state comes from.
        current_state_ids = await self.get_state_at(
            room_id, stream_position=now_token, state_filter=state_filter
        )

    state_ids = _calculate_state(
        timeline_contains=timeline_state,
        timeline_start=state_at_timeline_start,
        previous=state_at_previous_sync,
        current=current_state_ids,
        # we have to include LL members in case LL initial sync missed them
        lazy_load_members=lazy_load_members,
    )

which basically gets the state at the end of the last sync response and the state at the start of the current sync response, takes the delta, and removes any state events that will appear in the timeline anyway.
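
To make that concrete, here is a minimal sketch of the delta logic, assuming state is represented as dicts mapping (event type, state_key) to event IDs. The function name and the exact set arithmetic are illustrative, not a verbatim copy of _calculate_state:

import itertools

def sketch_calculate_state(timeline_contains, timeline_start, previous, current):
    # Map event_id back to its (type, state_key) so we can rebuild a dict at the end.
    event_id_to_key = {
        event_id: key
        for key, event_id in itertools.chain(
            timeline_contains.items(),
            timeline_start.items(),
            previous.items(),
            current.items(),
        )
    }

    c_ids = set(current.values())             # state at the end of the timeline
    ts_ids = set(timeline_start.values())     # state at the start of the timeline
    p_ids = set(previous.values())            # state at the previous sync
    tc_ids = set(timeline_contains.values())  # state events already in the timeline

    # Everything the client hasn't seen since the previous sync, minus anything
    # it will see in the timeline anyway.
    state_ids = ((c_ids | ts_ids) - p_ids) - tc_ids
    return {event_id_to_key[e]: e for e in state_ids}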

One way of fixing this is to somehow set the state_at_previous_sync to the empty set if we were still in the process of lazy joining at the time?
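
A minimal sketch of that idea, assuming a hypothetical helper is_partial_state_at() that reports whether the room still had partial state at the given stream position (no such helper exists today):

# Hypothetical sketch only: is_partial_state_at() is an assumed helper, not a
# real Synapse API.
if await self.store.is_partial_state_at(room_id, since_token):
    # The faster join hadn't finished at the previous sync, so we can't trust
    # what we think we sent; pretend we sent nothing.
    state_at_previous_sync = {}
else:
    state_at_previous_sync = await self.get_state_at(
        room_id, stream_position=since_token, state_filter=state_filter
    )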

@MadLittleMods added the A-Sync and A-Federated-Join labels on Jun 3, 2022
@squahtx self-assigned this on Jun 8, 2022
@squahtx
Contributor

squahtx commented Jun 8, 2022

I think this is fine. The only state events that we might think we have sent, but actually haven't, are m.room.member events.

When lazy_load_members is off, we want to pretend the room has not been joined yet: #12989
When lazy_load_members is on, we:

  1. first decide that we always want to return m.room.member events in the /sync response, by kicking them out of state-at-the-end-of-the-previous-sync:
    # If we are lazyloading room members, we explicitly add the membership events
    # for the senders in the timeline into the state block returned by /sync,
    # as we may not have sent them to the client before. We find these membership
    # events by filtering them out of timeline_start, which has already been filtered
    # to only include membership events for the senders in the timeline.
    # In practice, we can do this by removing them from the p_ids list,
    # which is the list of relevant state we know we have already sent to the client.
    # see https://github.com/matrix-org/synapse/pull/2970/files/efcdacad7d1b7f52f879179701c7e0d9b763511f#r204732809
    if lazy_load_members:
        p_ids.difference_update(
            e for t, e in timeline_start.items() if t[0] == EventTypes.Member
        )
  2. then filter out m.room.member events that we have already sent, by checking against an in-memory cache, which we update at the same time (see the stand-alone sketch after this list):
    if lazy_load_members and not include_redundant_members:
        cache_key = (sync_config.user.to_string(), sync_config.device_id)
        cache = self.get_lazy_loaded_members_cache(cache_key)

        # if it's a new sync sequence, then assume the client has had
        # amnesia and doesn't want any recent lazy-loaded members
        # de-duplicated.
        if since_token is None:
            logger.debug("clearing LruCache for %r", cache_key)
            cache.clear()
        else:
            # only send members which aren't in our LruCache (either
            # because they're new to this client or have been pushed out
            # of the cache)
            logger.debug("filtering state from %r...", state_ids)
            state_ids = {
                t: event_id
                for t, event_id in state_ids.items()
                if cache.get(t[1]) != event_id
            }
            logger.debug("...to %r", state_ids)

        # add any member IDs we are about to send into our LruCache
        for t, event_id in itertools.chain(
            state_ids.items(), timeline_state.items()
        ):
            if t[0] == EventTypes.Member:
                cache.set(t[1], event_id)
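
To make the de-duplication concrete, here is a small self-contained sketch of the same idea, using a plain dict in place of the per-(user, device) LruCache; the names and the member-only filtering are illustrative simplifications, not Synapse's actual code:

from typing import Dict, Tuple

# user_id -> event_id of the membership event we last sent to this client
sent_members: Dict[str, str] = {}

def filter_members_to_send(
    state_ids: Dict[Tuple[str, str], str],
    timeline_state: Dict[Tuple[str, str], str],
) -> Dict[Tuple[str, str], str]:
    """Drop member events the client has already seen, then record what we send."""
    filtered = {
        (etype, skey): event_id
        for (etype, skey), event_id in state_ids.items()
        if etype != "m.room.member" or sent_members.get(skey) != event_id
    }
    # Remember every membership event we are about to send, whether it goes in
    # the state block or appears in the timeline.
    for (etype, skey), event_id in {**filtered, **timeline_state}.items():
        if etype == "m.room.member":
            sent_members[skey] = event_id
    return filtered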

so in the following sequence, the gappy /sync will include the m.room.member event for Bob (sketched after the list below):

  1. Alice joins room
  2. /sync
  3. Synapse finishes syncing partial state
  4. Bob sends two messages
  5. gappy /sync with limit: 1
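
For illustration only, the relevant fragment of that gappy /sync response (step 5) would be expected to look roughly like this; the event IDs, user IDs and token are made up:

# Rough shape of rooms.join.<room_id> in the gappy /sync response (step 5).
expected_room_fragment = {
    "timeline": {
        "limited": True,  # gappy: only the most recent event is included
        "events": ["$bob_second_message"],
        "prev_batch": "some_token",
    },
    "state": {
        # Bob's membership must be delivered here: it was never sent before,
        # and his earlier message (whose sender it describes) fell outside the
        # limited timeline.
        "events": ["$bob_membership_event"],
    },
}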

@squahtx
Contributor

squahtx commented Jun 8, 2022

Complement test: matrix-org/complement#390

@squahtx
Contributor

squahtx commented Jun 10, 2022

The test has been merged. Please reopen if I've missed anything.

@squahtx closed this as completed on Jun 10, 2022