Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

look up cross-signing keys from the DB in bulk #6486

Merged
merged 16 commits into from
Dec 12, 2019

Conversation

uhoreg
Copy link
Member

@uhoreg uhoreg commented Dec 6, 2019

to improve performance of looking up 500 users' keys.

Also improve some docstrings.

@uhoreg uhoreg requested a review from a team December 6, 2019 04:46
Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It'd be good to also add a cache here, as e.g. if a room enables encryption then we'll get N users asking for the keys of the other N users, so adding a cache should help a lot. (c.f. @cachedList)

synapse/storage/data_stores/main/end_to_end_keys.py Outdated Show resolved Hide resolved
synapse/storage/data_stores/main/end_to_end_keys.py Outdated Show resolved Hide resolved
synapse/storage/data_stores/main/end_to_end_keys.py Outdated Show resolved Hide resolved
synapse/storage/data_stores/main/end_to_end_keys.py Outdated Show resolved Hide resolved
synapse/storage/data_stores/main/end_to_end_keys.py Outdated Show resolved Hide resolved
synapse/handlers/e2e_keys.py Outdated Show resolved Hide resolved
@uhoreg uhoreg requested a review from erikjohnston December 6, 2019 20:47
@uhoreg
Copy link
Member Author

uhoreg commented Dec 7, 2019

It'd be good to also add a cache here, as e.g. if a room enables encryption then we'll get N users asking for the keys of the other N users, so adding a cache should help a lot. (c.f. @cachedList)

Ah, I forgot to do this.

So the difficulty here is that the from_user_id will be different for each user, which would make the result different, so we can't just cache the whole thing. I think we'd have to split the first query out into a separate function, cache that, and then add in the signatures afterwards.

I think that it'll end up being a big change, so I'll take it out of review until it's done.

@uhoreg
Copy link
Member Author

uhoreg commented Dec 10, 2019

I've split out the first query into a separate function, and set it up to take a list of key types. However, I'm running into some issues adding the caching.

  1. I did the split in the function at https://github.com/matrix-org/synapse/pull/6486/files#diff-9f5f1ff9d5f3d4e7194db99c80985c81R340 which takes a txn parameter. Is it possible to make @cachedList ignore the txn parameter, or do I need to apply @cachedList to the non-txn version (which means the two queries will be in separate transactions)?
  2. If I allow it to take a list of key types to fetch, then that seems like it could cause the cache to explode, as we could have a cache entry for each user, for each possible subset of key types. This would also make cache invalidation hard. In practice, we currently only search for both master and self-signing keys with that function, but it seems a bit dangerous to rely on that always being true. Are there any suggestions for how to deal with this?

@erikjohnston
Copy link
Member

  1. I did the split in the function at https://github.com/matrix-org/synapse/pull/6486/files#diff-9f5f1ff9d5f3d4e7194db99c80985c81R340 which takes a txn parameter. Is it possible to make @cachedList ignore the txn parameter, or do I need to apply @cachedList to the non-txn version (which means the two queries will be in separate transactions)?

Yeah, I'm afraid they will need to be in separate transactions (and you can then cache both transactions). I don't think that'll cause too much overhead as its just one extra query per request, rather than an extra query per queried user?

  1. If I allow it to take a list of key types to fetch, then that seems like it could cause the cache to explode, as we could have a cache entry for each user, for each possible subset of key types. This would also make cache invalidation hard. In practice, we currently only search for both master and self-signing keys with that function, but it seems a bit dangerous to rely on that always being true. Are there any suggestions for how to deal with this?

I think including the key type is fine, the default cache sizes aren't that big so it won't use that much extra memory by default. You can then use invalidate_many to invalidate all keys for a user (or the standard invalidate to invalidatejust a particular user/key). c.f._get_linearized_receipts_for_room` for an example

The other option is to have the query return all keys for a user, filter the keys after the cache, and then invalidate all keys for a user if one of them changes. It's not perfect, but probably good enough.

@uhoreg uhoreg self-assigned this Dec 10, 2019
@uhoreg
Copy link
Member Author

uhoreg commented Dec 10, 2019

Tests were working fine until I merged in develop, and then get_e2e_cross_signing_keys_bulk is now infinite-looping. (e.g. f1ea148 should work fine). From my debugging (i.e. print statements), it seems to be hanging somewhere between the return statement in _get_bare_e2e_cross_signing_keys_bulk_txn and the value arriving into _get_bare_e2e_cross_signing_keys_bulk. i.e., it seems to be hanging somewhere in runInteraction. Has something changed there recently?

Anyways, aside from that, should be ready for review now.

@uhoreg uhoreg requested a review from erikjohnston December 10, 2019 21:39
@erikjohnston
Copy link
Member

Looks like it might be exploding due to a leaked log context, somewhere

@richvdh richvdh removed the z-bug (Deprecated Label) label Dec 11, 2019
@uhoreg
Copy link
Member Author

uhoreg commented Dec 11, 2019

Tests pass now. Should be good now. There is a bit of ugliness, though, in that the @cachedList seems to add None values to the dict, which need to be skipped over by various things. So there's a question of whether we add code to skip over None values, or whether we make another iteration of the dict in get_e2e_cross_signing_keys_bulk to drop the None values. I've implemented the first option for now, but switching should be easy. Aside from that minor thing, I think this should be ready.

@uhoreg uhoreg requested a review from erikjohnston December 11, 2019 19:00
list_name="user_ids",
num_args=1,
)
def _get_bare_e2e_cross_signing_keys_bulk(self, user_ids: list) -> dict:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _get_bare_e2e_cross_signing_keys_bulk(self, user_ids: list) -> dict:
def _get_bare_e2e_cross_signing_keys_bulk(self, user_ids: List[str]) -> Dict[str, Dict[str, dict]]:

(You might need to import from typing)

the signatures for the calling user need to be fetched.

Args:
txn (twisted.enterprise.adbapi.Connection): db connection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
txn (twisted.enterprise.adbapi.Connection): db connection

@@ -271,7 +271,7 @@ def __init__(
else:
self.function_to_call = orig

arg_spec = inspect.getargspec(orig)
arg_spec = inspect.getfullargspec(orig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ooi?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I was going to add a comment explaining it. It's because _get_bare_e2e_cross_signing_keys_bulk has type annotations, so getargspec doesn't work on it, and it suggests to use getfullargspec instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aaaaaah, cool

@uhoreg uhoreg merged commit cb2db17 into develop Dec 12, 2019
babolivier pushed a commit that referenced this pull request Sep 1, 2021
* commit 'cb2db1799':
  look up cross-signing keys from the DB in bulk (#6486)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants