Fix #8518 (sync requests being cached wrongly on timeout) #9358
Conversation
I could make this cleaner by using …

(Note: I am fully aware this is kind of a hack, and I'm prodding with this PR to see if this hack is acceptable for the problem.)
Thanks for digging into the bug. I think your approach is probably fine; it doesn't really fill me with joy, but oh well. Perhaps one refinement/alternative would be to have …

Jonathon et al.: thoughts?

(This will also need a test.)
I don't want to bet on this and make the response depend on falsy/truthy evaluation, which can be wonky, but I'll look at …

I'll also look at making a test somewhere.
I made it so that …
@erikjohnston there seems to be no existing test file for this. Want me to make one, and only add a test for the conditional?
Ugh, I really thought there were tests already. I think adding tests should be easy enough, so if you could do that then that would be great. Some simple tests plus some tests of the new logic would be fab, assuming that by the time you've done the tests for the new logic it'll be trivial to do some simple tests as well.
Some conversation in #synapse-dev uncovered that adding tests in this PR probably isn't the best option, so I'll make another PR that depends on this one and adds tests for ResponseCache.
This looks fine to me; thanks for clearly describing what caused the bug in the description!
Oh, the description is actually outdated right now, let me update that.

Edit: The previous approach used a new type called …
Thanks! Merging as the tests in #9458 are passing.
    sync_config.request_key,
    lambda result: since_token != result.next_batch,
this should have had a comment. Why does doing this give the right behaviour? (I shouldn't have to go and find the PR that added it, and even having done so it's hard to correlate the description to the code.)
It was described in the PR description, but I'll take it in another PR to add a comment here, i.e. "small fixes for ResponseCache".
> It was described in the PR description,
yeah as I said: having to go and find the PR and grok it is a pain.
Alright, I'll add a comment describing what that conditional does 👍
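For reference, a hedged sketch of what such a comment could look like at the call site shown in the hunk above; names other than `wrap_conditional`, `request_key` and `next_batch` are assumptions about the surrounding sync handler code, not a verbatim quote of this PR:

```python
# Sketch only: the surrounding argument names are assumed from context.
return await self.response_cache.wrap_conditional(
    sync_config.request_key,
    # Only cache the result if the sync actually advanced the stream position.
    # A timed-out /sync returns a next_batch equal to the since token, and
    # caching that would keep handing the client the same "empty" response
    # (and the same token) until the cache entry expires.
    lambda result: since_token != result.next_batch,
    self._wait_for_sync_for_user,
    sync_config,
    since_token,
    timeout,
    full_state,
)
```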
def add_conditional(self, key: T, conditional: Callable[[Any], bool]):
    self.pending_conditionals.setdefault(key, set()).add(conditional)
This seems like a dangerous thing to add to the public interface. It seems like it would be very easy to use in a racy way.
Should I signal that `add_conditional` is private by prefixing it with `_` (in a new PR)?
since it's only called in one place, I'd just inline it.
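A minimal sketch of the inlining being suggested, using the names from the snippets in this review (not a verbatim quote of the PR):

```python
# Inside wrap_conditional(), instead of calling a public add_conditional():
result = self.get(key)
if not result or (isinstance(result, defer.Deferred) and not result.called):
    # Record the conditional directly against the key; nothing else needs to
    # reach into pending_conditionals, so no separate helper is exposed.
    self.pending_conditionals.setdefault(key, set()).add(should_cache)
```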
# See if there's already a result on this key that hasn't yet completed. Due to the single-threaded nature of
# python, adding a key immediately in the same execution thread will not cause a race condition.
result = self.get(key)
if not result or isinstance(result, defer.Deferred) and not result.called:
I think this is correct, but I had to go and look up the precedence of `and` vs `or` to check it. Please use parens to disambiguate in future.
sorry to write a bunch of comments after the event; I wanted to review it in the light of #9507 and these were a few things I spotted.
TL;DR is that this could be easier to grok but I can't see any obvious bugs here.
if not result or isinstance(result, defer.Deferred) and not result.called:
    self.add_conditional(key, should_cache)
Is it correct that the conditional is added when `not result.called`? I'd argue that cacheability of the result should be a property of the callback being used (and hence set at the same time as that callback is set), rather than of any subsequent callbacks (which are discarded).
Operator precedence is `or` -> `and` -> `not` (loosest to tightest binding), so this is actually `(not result) or (isinstance(result, defer.Deferred) and (not result.called))`.
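(A quick way to sanity-check that grouping with plain booleans:)

```python
# `not` binds tighter than `and`, which binds tighter than `or`,
# so  a or b and not c  parses as  a or (b and (not c)).
for a in (False, True):
    for b in (False, True):
        for c in (False, True):
            assert (a or b and not c) == (a or (b and (not c)))
```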
I had to check after your previous comment; it could actually be `if not result and not result.called`, but because `result` can be a plain return value, I need to insert the `isinstance(..., defer.Deferred)` check.
Yes, the conditional is added only to be executed after the result is called. I do agree, now that I realise it, that any future call could effectively be saying "hey, this shouldn't be cached anyway, because the returned value doesn't agree with what I already got from somewhere else", in which case I'd need to rewrite this to execute the conditional locally whenever it's called, and evict the cache entry when it's not valid according to the new conditional. (The `callLater` already in flight would simply execute anyway; there's no way to properly garbage-collect it early at that point, to my knowledge. If it were an asyncio `Task`, I could also map it somewhere to then `.cancel()` it; if this is possible with `callLater` handles or something, please tell me.)
(This last approach (letting `callLater` still fire) could potentially evict a cache entry early if a subsequent quick `wrap` call has the same key, though I don't know how much of a problem that'd be, seeing as it'd evict the cache shortly after the call, and only if:

- the cache was deemed to be time-cached (with conditionals or not)
- after the function's return, another function does `wrap_conditional` with a conditional that evaluates to `False`
- shortly after that `wrap_conditional` call and the then-immediate cache eviction, a function (within the remaining timeout) calls `wrap*` with the same key

Wait. Oh, this might actually be a problem: if the function (with the same key) hasn't returned yet on the second `wrap*` call, then the previous `callLater` will evict the deferred, which could lead to more wonky stuff, but the fact that it evicts a call in flight is already bad enough.)
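(On the `callLater` question: Twisted's `reactor.callLater` does return a handle that can be cancelled before it fires; a minimal standalone sketch, independent of the ResponseCache code:)

```python
from twisted.internet import reactor

# reactor.callLater returns an IDelayedCall handle; keeping it around means a
# pending eviction could be cancelled later instead of being left to fire.
handle = reactor.callLater(10.0, print, "evicting cache entry")

if handle.active():
    # Cancelling is only valid while the call hasn't fired (and hasn't already
    # been cancelled); otherwise AlreadyCalled / AlreadyCancelled is raised.
    handle.cancel()
```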
As discussed at considerable length in the room: I don't agree with your logic here. I think that supporting multiple `should_cache` callbacks per key significantly complicates the implementation, and semantically is dubious at best.
Alright, I'll just make it a simple `Dict[T, Callable]` mapping then 👍
Actually, more discussion in #9522 on this.
result = self.get(key)
if not result or isinstance(result, defer.Deferred) and not result.called:
    self.add_conditional(key, should_cache)
I feel like this is duplicating half the logic of `wrap`, which is kinda the worst of all worlds. Could you not make `should_cache` nullable, and have `wrap` call `wrap_conditional` rather than the other way around?
I initially did that, but that became a bit of a mess, and you'd have had to shift around the positional arguments on any `wrap` calls currently in place.

I also didn't want to make it a keyword argument, because that'd shadow any potential keyword arguments to the inner call. While I know it is trivial to "just add it", I didn't want to, because it is non-explicit to anyone not aware of this change to `ResponseCache`, and so it is possible for it to become a bug.
> I initially did that, but that became a bit of a mess, and you'd have had to shift around the positional arguments on any `wrap` calls currently in place.
I'm not suggesting changing the signature of `wrap`.
Wrap would be:
def wrap(
    self, key: T, callback: "Callable[..., Any]", *args: Any, **kwargs: Any
) -> defer.Deferred:
    return self.wrap_conditional(key, None, callback, *args, **kwargs)
why is that a mess?
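(For completeness, a rough sketch of the other half of that suggestion, i.e. `wrap_conditional` tolerating `should_cache=None`; this is hypothetical, not code from this PR:)

```python
def wrap_conditional(
    self,
    key: T,
    should_cache: "Optional[Callable[[Any], bool]]",
    callback: "Callable[..., Any]",
    *args: Any,
    **kwargs: Any,
) -> defer.Deferred:
    # Only record a conditional when one was actually supplied, so that
    # wrap() can delegate here with should_cache=None and behave as before.
    if should_cache is not None:
        result = self.get(key)
        if not result or (isinstance(result, defer.Deferred) and not result.called):
            self.pending_conditionals.setdefault(key, set()).add(should_cache)
    # ... then fall through to the existing lookup/caching logic for `key`.
```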
Hmmm, I'll change it to that in that PR, then 👍
I've backed this out in aee1076. I think we'll want to:
I think the current known issue of #8518 is better than the unknown impact of the regression issue #9507. Unfortunately, with this change we did not have confidence in creating a new release candidate or deploying to matrix.org, which are necessities for the team.
This fixes #8518 by adding a conditional check on the `SyncResult`, via a function which checks whether `prev_stream_token == current_stream_token`, as a sanity check. In `CachedResponse.set.<remove>()`, the result is immediately popped from the cache if the conditional function returns false.

This prevents the caching of a timed-out `SyncResult` (one whose `next_key` is the same stream key that produced that `SyncResult`). The cache is thus prevented from returning a `SyncResult` that makes the client request the same stream key over and over again, effectively leaving it stuck in a loop of requesting and immediately getting a response for as long as the cache keeps those values.
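(Roughly, the eviction decision described above would live in the `remove` callback that the cache attaches when a result completes; a hedged sketch, with attribute names such as `pending_conditionals`, `timeout_sec`, `clock` and `pending_result_cache` assumed from the existing class rather than quoted from the diff:)

```python
def remove(r):
    # Run any conditionals registered for this key; if one of them rejects
    # the result (returns False), pop it from the cache straight away
    # instead of keeping it around for the usual timeout window.
    should_cache = all(
        func(r) for func in self.pending_conditionals.pop(key, [])
    )
    if self.timeout_sec and should_cache:
        self.clock.call_later(
            self.timeout_sec, self.pending_result_cache.pop, key, None
        )
    else:
        self.pending_result_cache.pop(key, None)
    return r
```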
Signed-off-by: Jonathan de Jong <[email protected]>
Fixes #8518