-
Notifications
You must be signed in to change notification settings - Fork 909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timeout to OnlineState tracking. #412
Conversation
@@ -51,19 +51,14 @@ import { | |||
WatchTargetChange, | |||
WatchTargetChangeState | |||
} from './watch_change'; | |||
import { clearTimeout } from 'timers'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer necessary, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
) {} | ||
onlineStateHandler: (onlineState: OnlineState) => void | ||
) { | ||
this.onlineStateTracker = new OnlineStateTracker(onlineStateHandler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well pass the OnlineStateTracker
in so the RemoteStore doesn't even have to know about the onlineStateHandler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm... I think I prefer to keep OnlineStateTracker as an implementation detail of RemoteStore. I.e.: I think it's better for RemoteStore to know about onlineStateHandler than for FirestoreClient to know about OnlineStateTracker. But feel free to make a case otherwise. I do notice that if we passed OnlineStateTracker into RemoteStore, we could keep RemoteStore unaware of the AsyncQueue, but I don't think that's necessary a goal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a strong case either way. The general principle I was following was to invert dependencies where possible, the general rule being to pass your dependencies rather than new them up.
This makes it possible to construct a testing OnlineStateTracker if we ever wanted to do so and pushes all the object construction into one place. While it's true that FirestoreClient doesn't need to know about OnlineStateTracker, FirestoreClient is the assembly point for the rest of our constellation of objects so it doesn't seem like a bad place.
* then we set the OnlineState to Offline, and the client will behave as if | ||
* it is offline (get()s will return cached data, etc.). | ||
*/ | ||
export class OnlineStateTracker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of any way it would happen but just in case: are there any interactions between this class and the idle timeout timers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be since this timer goes away once you're connected. And the idle timers start once you're connected... And with the improved testing, if both timers were active at once, runDelayedOperationsEarly() would run both so we'd be exercising them together, at least in theory...
this.setAndBroadcast(OnlineState.Unknown); | ||
|
||
if (this.watchStreamTimer === null) { | ||
this.watchStreamTimer = setTimeout(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given all the timer gunk we have now I wonder if it would make sense to either add methods to the async queue to just do this for us or add a special clock/delay type that you could pass in to the constructor to allow us to test these kinds of things directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the async queue (using the cancellation support I added).
@@ -236,8 +178,8 @@ export class RemoteStore { | |||
*/ | |||
disableNetwork(): Promise<void> { | |||
this.disableNetworkInternal(); | |||
// Set the OnlineState to failed so get()'s return from cache, etc. | |||
this.updateOnlineState(OnlineState.Failed); | |||
// Set the OnlineState to Offline so get()'s return from cache, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use apostrophes for making plurals. (it should be "get()s" or just "gets").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* Returns whether a watch stream is currently necessary (network is enabled | ||
* and we have active watch targets). | ||
*/ | ||
private isWatchStreamNecessary(): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of this extra concept that's confusingly similar to shouldStartWatchStream, especially since it's only used in the implementation of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, good call. This is an artifact of a previous implementation attempt. Unwound.
Despite the nitpicking in here I really like the direction this is heading. |
* Introduces a DelayedOperation helper class in AsyncQueue to encapsulate delayed op logic. * Adds cancellation support which I want to use in #412 * Remove delayedOperationsCount in favor of keeping delayedOperations populated correctly at all times. * Fixes a preexisting issue in AsyncQueue.schedule() where the returned promise would always be resolved with undefined, instead of the result of your op. Also remove an AnyDuringMigration usage.
…on. (#489) * Introduces a DelayedOperation helper class in AsyncQueue to encapsulate delayed op logic. * Adds cancellation support which I want to use in #412 * Updates the idle timer in persistent_stream.ts to use new cancellation support. * Remove delayedOperationsCount in favor of keeping delayedOperations populated correctly at all times. * Fixes a preexisting issue in AsyncQueue.schedule() where the returned promise would always be resolved with undefined, instead of the result of your op. Also remove an AnyDuringMigration usage.
* Addressed mcg@'s review feedback. * Moved OnlineState timeout to AsyncQueue using new cancelable delayed operation capabilities. * Added a dummy .catch() handler to the CancelablePromises returned by AsyncQueue.enqueueAfterDelay() to avoid UnhandledPromiseRejection log spam due to canceled timers. * Added a SpecTest to verify OnlineState timeout behavior. * Added ability to run timers from spec tests (including assigning string names to TimerId enum values) * Added TimerId.All to match iOS and make it easier to run all timers from spec tests.
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the looong delay, but I think this is finally ready for review again. I addressed the feedback, moved to the new cancelable delayed operation support via AsyncQueue, and added a spec test.
If you want, I could probably split this into at least a couple PRs (e.g. I could separate out the "cleanup" piece of the PR description). Let me know if you'd like me to do that.
@@ -51,19 +51,14 @@ import { | |||
WatchTargetChange, | |||
WatchTargetChangeState | |||
} from './watch_change'; | |||
import { clearTimeout } from 'timers'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
) {} | ||
onlineStateHandler: (onlineState: OnlineState) => void | ||
) { | ||
this.onlineStateTracker = new OnlineStateTracker(onlineStateHandler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm... I think I prefer to keep OnlineStateTracker as an implementation detail of RemoteStore. I.e.: I think it's better for RemoteStore to know about onlineStateHandler than for FirestoreClient to know about OnlineStateTracker. But feel free to make a case otherwise. I do notice that if we passed OnlineStateTracker into RemoteStore, we could keep RemoteStore unaware of the AsyncQueue, but I don't think that's necessary a goal.
@@ -236,8 +178,8 @@ export class RemoteStore { | |||
*/ | |||
disableNetwork(): Promise<void> { | |||
this.disableNetworkInternal(); | |||
// Set the OnlineState to failed so get()'s return from cache, etc. | |||
this.updateOnlineState(OnlineState.Failed); | |||
// Set the OnlineState to Offline so get()'s return from cache, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* Returns whether a watch stream is currently necessary (network is enabled | ||
* and we have active watch targets). | ||
*/ | ||
private isWatchStreamNecessary(): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, good call. This is an artifact of a previous implementation attempt. Unwound.
this.setAndBroadcast(OnlineState.Unknown); | ||
|
||
if (this.watchStreamTimer === null) { | ||
this.watchStreamTimer = setTimeout(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the async queue (using the cancellation support I added).
* then we set the OnlineState to Offline, and the client will behave as if | ||
* it is offline (get()s will return cached data, etc.). | ||
*/ | ||
export class OnlineStateTracker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be since this timer goes away once you're connected. And the idle timers start once you're connected... And with the improved testing, if both timers were active at once, runDelayedOperationsEarly() would run both so we'd be exercising them together, at least in theory...
|
||
if (this.watchStreamTimerPromise === null) { | ||
this.watchStreamTimerPromise = this.asyncQueue.enqueueAfterDelay( | ||
TimerId.OnlineStateTimeout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this cluster of names confusingly dissimlar. Why is this an "OnlineStateTimeout", the promise is a "watchStreamTimerPromise", and the timeout a "MAX_WATCH_STREAM_TIMEOUT_MS"?
These all seem related to the handling the initial connection but:
- OnlineStateTimeout seems like it might be applicable to other transitions
- watchStreamTimer seems wholly unrelated to that purpose
- MAX_WATCH_STREAM_TIMEOUT_MS seems just as likely to apply to an idle timeout. Also, why "MAX"? Isn't this just the timeout duration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points.
I renamed:
MAX_WATCH_STREAM_TIMEOUT_MS
=>ONLINE_STATE_TIMEOUT_MS
watchStreamTimerPromise
=>onlineStateTimer
And I clarified some comments. I agree that OnlineStateTimeout is a somewhat imprecise name (could be applicable to other transitions). We could do "OnlineStateUnknownTimeout", but I don't think that's much clearer (and makes all of these names longer) and so I'm inclined to just lean on comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a marked improvement and LGTM as-is. Thanks. Agree that leaning on comments can be the way to go in cases like these.
I'm still torn about this because the connection timer seems very much related to what we're doing here: namely if we're ever connecting our state starts at unknown. If we logically fail to connect (i.e. fail to transition out of unknown) then this timer fires. It seems like, for all intents and purposes, OnlineStateUnknownTimeout is the real connection timeout. Could that other thing we have be some lower-level concept?
Or maybe the name of OnlineStateUnknownTimeout itself is caught in my craw. Our other timeouts are named in terms of the end state, not the starting state. IOW: how long until some state, not how long in some state. How long until idle, how long until connected, etc.
This comment isn't super actionable but I'm still uneasy, and if history has shown anything: this code is fiddly, keeps changing, and we keep misunderstanding it over time, so if you have the energy I'd like to try to find a way to make it more immediately obvious and less fuzzy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that our timers have been fiddly / confusing and getting this right is worthwhile. :-)
So right now our timers are: ListenStreamIdle / ListenStreamConnection, WriteStreamIdle / WriteStreamConnection, OnlineStateTimeout
The Connection timers are used to schedule delayed stream attempts subject to exponential backoff... Not all stream attempts are delayed in which case the "Connection" timer doesn't come into play (e.g the first attempt once a stream becomes necessary, or the first attempt after an established stream encounters an error, etc.).
But the OnlineStateTimeout is still relevant when the stream attempt isn't delayed (e.g. it'll fire if the first stream attempt doesn't succeed / fail within 10 seconds, triggering us to OnlineState.Offline while we wait).
And the Connection timers will continue to be used even once the OnlineStateTimeout has triggered (because we'll continue scheduling stream attempts with backoff even after we've entered OnlineState.Offline).
So I think they're not actually that related, and we probably have a naming problem that makes them seem related. We could try to be more verbose, e.g. ListenStreamDelayedRestart instead of ListenStreamConnection.
In naming these, it's worth thinking about their usage from tests, e.g.: runDelayedOperationsUntil(TimerId.ListenStreamDelayedRestart) to make a test skip the exponential backoff delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK could the connection timer ids really be backoff timer ids? This is essentially where we use them:
this.backoff = new ExponentialBackoff(
queue,
connectionTimerId,
BACKOFF_INITIAL_DELAY_MS,
BACKOFF_FACTOR,
BACKOFF_MAX_DELAY_MS
);
TimerId.ListenStreamBackoff
=> runDelayedOperationsUntil(TimerId.ListenStreamBackoff)
?
I don't think the fact that the online state timeout is for succeeding or failing much matters. Connection timeouts pretty much always work this way: they introduce a client-generated error if the connection hasn't already succeeded or failed on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's exactly what they are, except "backoff" describes an implementation detail about how the timers are scheduled rather than what they actually are for. You could hypothetically imagine that the stream could have multiple timers, for different purposes, that happen to use exponential backoff scheduling, at which point "ListenStreamBackoff" would be meaningless (as opposed to "ListenStreamDelayedRestart" or whatever).
That said, we probably conflate "backoff" with the act of reconnecting already, and so I'd be okay with ListenStreamBackoff if you think that's clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I still do think there's a meaningful distinction between our "online state timeout" and a "connection timeout" because we explicitly don't cancel the existing stream attempt and start a new one... Instead we're relying on grpc's internal timeout / retry connection logic.
); | ||
this.logClientOfflineWarningIfNecessary(); | ||
this.setAndBroadcast(OnlineState.Offline); | ||
return Promise.resolve(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this failure to connect in time bump the number of watch stream failures? If not could you please add comments why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
) {} | ||
onlineStateHandler: (onlineState: OnlineState) => void | ||
) { | ||
this.onlineStateTracker = new OnlineStateTracker(onlineStateHandler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a strong case either way. The general principle I was following was to invert dependencies where possible, the general rule being to pass your dependencies rather than new them up.
This makes it possible to construct a testing OnlineStateTracker if we ever wanted to do so and pushes all the object construction into one place. While it's true that FirestoreClient doesn't need to know about OnlineStateTracker, FirestoreClient is the assembly point for the rest of our constellation of objects so it doesn't seem like a bad place.
ListenStreamConnection = 'listen_stream_connection', | ||
WriteStreamIdle = 'write_stream_idle', | ||
WriteStreamConnection = 'write_stream_connection', | ||
OnlineStateTimeout = 'online_state_timeout' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how OnlineStateTimeout is different from either the connection or idle timeouts. I'm not sure if I'm asking for it to be deduplicated against those others, comments on the individual timer ids, or some kind of theory of operation primer on how these all fit together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comments that hopefully makes things somewhat clearer. FWIW, coming up with names for these is a bit tricky because they're meant to be high-level concepts (since they'll be triggered from high-level tests, etc.) and so e.g. "Connection" was chosen over "Backoff"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good feedback. PTAL!
|
||
if (this.watchStreamTimerPromise === null) { | ||
this.watchStreamTimerPromise = this.asyncQueue.enqueueAfterDelay( | ||
TimerId.OnlineStateTimeout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points.
I renamed:
MAX_WATCH_STREAM_TIMEOUT_MS
=>ONLINE_STATE_TIMEOUT_MS
watchStreamTimerPromise
=>onlineStateTimer
And I clarified some comments. I agree that OnlineStateTimeout is a somewhat imprecise name (could be applicable to other transitions). We could do "OnlineStateUnknownTimeout", but I don't think that's much clearer (and makes all of these names longer) and so I'm inclined to just lean on comments.
); | ||
this.logClientOfflineWarningIfNecessary(); | ||
this.setAndBroadcast(OnlineState.Offline); | ||
return Promise.resolve(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
ListenStreamConnection = 'listen_stream_connection', | ||
WriteStreamIdle = 'write_stream_idle', | ||
WriteStreamConnection = 'write_stream_connection', | ||
OnlineStateTimeout = 'online_state_timeout' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comments that hopefully makes things somewhat clearer. FWIW, coming up with names for these is a bit tricky because they're meant to be high-level concepts (since they'll be triggered from high-level tests, etc.) and so e.g. "Connection" was chosen over "Backoff"
5050675
to
b5d412f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
if (this.watchStreamTimerPromise === null) { | ||
this.watchStreamTimerPromise = this.asyncQueue.enqueueAfterDelay( | ||
TimerId.OnlineStateTimeout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a marked improvement and LGTM as-is. Thanks. Agree that leaning on comments can be the way to go in cases like these.
I'm still torn about this because the connection timer seems very much related to what we're doing here: namely if we're ever connecting our state starts at unknown. If we logically fail to connect (i.e. fail to transition out of unknown) then this timer fires. It seems like, for all intents and purposes, OnlineStateUnknownTimeout is the real connection timeout. Could that other thing we have be some lower-level concept?
Or maybe the name of OnlineStateUnknownTimeout itself is caught in my craw. Our other timeouts are named in terms of the end state, not the starting state. IOW: how long until some state, not how long in some state. How long until idle, how long until connected, etc.
This comment isn't super actionable but I'm still uneasy, and if history has shown anything: this code is fiddly, keeps changing, and we keep misunderstanding it over time, so if you have the energy I'd like to try to find a way to make it more immediately obvious and less fuzzy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still LGTM. Future me really thanks for taking the time to make this better.
Add 10 second timeout to OnlineState tracking.
AsyncQueue.enqueueAfterDelay() to avoid UnhandledPromiseRejection log spam
due to canceled timers.
names to TimerId enum values)
tests.