Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout to OnlineState tracking. #412

Merged
merged 12 commits into from
Mar 3, 2018
Merged

Conversation

mikelehen
Copy link
Contributor

@mikelehen mikelehen commented Jan 5, 2018

Add 10 second timeout to OnlineState tracking.

  • Added a timer that triggers after 10 seconds without a successful stream attempt, reverting to "offline" behavior, potentially triggering data from cache.
  • Cleanup:
    • Renamed OnlineState enum values to Offline, Online, and Unknown.
    • Refactored "online state tracking" into its own OnlineStateTracker class.
    • Added a dummy .catch() handler to the CancelablePromises returned by
      AsyncQueue.enqueueAfterDelay() to avoid UnhandledPromiseRejection log spam
      due to canceled timers.
  • Added a SpecTest to verify new timeout behavior.
    • Added ability to run timers from spec tests (including assigning string
      names to TimerId enum values)
    • Added TimerId.All to match iOS and make it easier to run all timers from spec
      tests.

@@ -51,19 +51,14 @@ import {
WatchTargetChange,
WatchTargetChangeState
} from './watch_change';
import { clearTimeout } from 'timers';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer necessary, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

) {}
onlineStateHandler: (onlineState: OnlineState) => void
) {
this.onlineStateTracker = new OnlineStateTracker(onlineStateHandler);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well pass the OnlineStateTracker in so the RemoteStore doesn't even have to know about the onlineStateHandler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm... I think I prefer to keep OnlineStateTracker as an implementation detail of RemoteStore. I.e.: I think it's better for RemoteStore to know about onlineStateHandler than for FirestoreClient to know about OnlineStateTracker. But feel free to make a case otherwise. I do notice that if we passed OnlineStateTracker into RemoteStore, we could keep RemoteStore unaware of the AsyncQueue, but I don't think that's necessary a goal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a strong case either way. The general principle I was following was to invert dependencies where possible, the general rule being to pass your dependencies rather than new them up.

This makes it possible to construct a testing OnlineStateTracker if we ever wanted to do so and pushes all the object construction into one place. While it's true that FirestoreClient doesn't need to know about OnlineStateTracker, FirestoreClient is the assembly point for the rest of our constellation of objects so it doesn't seem like a bad place.

* then we set the OnlineState to Offline, and the client will behave as if
* it is offline (get()s will return cached data, etc.).
*/
export class OnlineStateTracker {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of any way it would happen but just in case: are there any interactions between this class and the idle timeout timers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be since this timer goes away once you're connected. And the idle timers start once you're connected... And with the improved testing, if both timers were active at once, runDelayedOperationsEarly() would run both so we'd be exercising them together, at least in theory...

this.setAndBroadcast(OnlineState.Unknown);

if (this.watchStreamTimer === null) {
this.watchStreamTimer = setTimeout(() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given all the timer gunk we have now I wonder if it would make sense to either add methods to the async queue to just do this for us or add a special clock/delay type that you could pass in to the constructor to allow us to test these kinds of things directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to the async queue (using the cancellation support I added).

@@ -236,8 +178,8 @@ export class RemoteStore {
*/
disableNetwork(): Promise<void> {
this.disableNetworkInternal();
// Set the OnlineState to failed so get()'s return from cache, etc.
this.updateOnlineState(OnlineState.Failed);
// Set the OnlineState to Offline so get()'s return from cache, etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use apostrophes for making plurals. (it should be "get()s" or just "gets").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* Returns whether a watch stream is currently necessary (network is enabled
* and we have active watch targets).
*/
private isWatchStreamNecessary(): boolean {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this extra concept that's confusingly similar to shouldStartWatchStream, especially since it's only used in the implementation of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, good call. This is an artifact of a previous implementation attempt. Unwound.

@wilhuff wilhuff assigned mikelehen and unassigned wilhuff Jan 11, 2018
@wilhuff
Copy link
Contributor

wilhuff commented Jan 11, 2018

Despite the nitpicking in here I really like the direction this is heading.

mikelehen pushed a commit that referenced this pull request Feb 5, 2018
* Introduces a DelayedOperation helper class in AsyncQueue to encapsulate
  delayed op logic.
* Adds cancellation support which I want to use in
  #412
* Remove delayedOperationsCount in favor of keeping delayedOperations populated
  correctly at all times.
* Fixes a preexisting issue in AsyncQueue.schedule() where the returned promise
  would always be resolved with undefined, instead of the result of your op.
  Also remove an AnyDuringMigration usage.
mikelehen added a commit that referenced this pull request Feb 6, 2018
…on. (#489)

* Introduces a DelayedOperation helper class in AsyncQueue to encapsulate
  delayed op logic.
* Adds cancellation support which I want to use in
  #412
* Updates the idle timer in persistent_stream.ts to use new cancellation support.
* Remove delayedOperationsCount in favor of keeping delayedOperations populated
  correctly at all times.
* Fixes a preexisting issue in AsyncQueue.schedule() where the returned promise
  would always be resolved with undefined, instead of the result of your op.
  Also remove an AnyDuringMigration usage.
Michael Lehenbauer added 2 commits February 22, 2018 16:26
* Addressed mcg@'s review feedback.
* Moved OnlineState timeout to AsyncQueue using new cancelable delayed
  operation capabilities.
* Added a dummy .catch() handler to the CancelablePromises returned by
  AsyncQueue.enqueueAfterDelay() to avoid UnhandledPromiseRejection log spam
  due to canceled timers.
* Added a SpecTest to verify OnlineState timeout behavior.
  * Added ability to run timers from spec tests (including assigning string
    names to TimerId enum values)
  * Added TimerId.All to match iOS and make it easier to run all timers from spec
    tests.
@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this State. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@mikelehen mikelehen changed the base branch from firestore-api-changes to master February 23, 2018 17:00
Copy link
Contributor Author

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the looong delay, but I think this is finally ready for review again. I addressed the feedback, moved to the new cancelable delayed operation support via AsyncQueue, and added a spec test.

If you want, I could probably split this into at least a couple PRs (e.g. I could separate out the "cleanup" piece of the PR description). Let me know if you'd like me to do that.

@@ -51,19 +51,14 @@ import {
WatchTargetChange,
WatchTargetChangeState
} from './watch_change';
import { clearTimeout } from 'timers';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

) {}
onlineStateHandler: (onlineState: OnlineState) => void
) {
this.onlineStateTracker = new OnlineStateTracker(onlineStateHandler);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm... I think I prefer to keep OnlineStateTracker as an implementation detail of RemoteStore. I.e.: I think it's better for RemoteStore to know about onlineStateHandler than for FirestoreClient to know about OnlineStateTracker. But feel free to make a case otherwise. I do notice that if we passed OnlineStateTracker into RemoteStore, we could keep RemoteStore unaware of the AsyncQueue, but I don't think that's necessary a goal.

@@ -236,8 +178,8 @@ export class RemoteStore {
*/
disableNetwork(): Promise<void> {
this.disableNetworkInternal();
// Set the OnlineState to failed so get()'s return from cache, etc.
this.updateOnlineState(OnlineState.Failed);
// Set the OnlineState to Offline so get()'s return from cache, etc.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* Returns whether a watch stream is currently necessary (network is enabled
* and we have active watch targets).
*/
private isWatchStreamNecessary(): boolean {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, good call. This is an artifact of a previous implementation attempt. Unwound.

this.setAndBroadcast(OnlineState.Unknown);

if (this.watchStreamTimer === null) {
this.watchStreamTimer = setTimeout(() => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to the async queue (using the cancellation support I added).

* then we set the OnlineState to Offline, and the client will behave as if
* it is offline (get()s will return cached data, etc.).
*/
export class OnlineStateTracker {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be since this timer goes away once you're connected. And the idle timers start once you're connected... And with the improved testing, if both timers were active at once, runDelayedOperationsEarly() would run both so we'd be exercising them together, at least in theory...


if (this.watchStreamTimerPromise === null) {
this.watchStreamTimerPromise = this.asyncQueue.enqueueAfterDelay(
TimerId.OnlineStateTimeout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this cluster of names confusingly dissimlar. Why is this an "OnlineStateTimeout", the promise is a "watchStreamTimerPromise", and the timeout a "MAX_WATCH_STREAM_TIMEOUT_MS"?

These all seem related to the handling the initial connection but:

  • OnlineStateTimeout seems like it might be applicable to other transitions
  • watchStreamTimer seems wholly unrelated to that purpose
  • MAX_WATCH_STREAM_TIMEOUT_MS seems just as likely to apply to an idle timeout. Also, why "MAX"? Isn't this just the timeout duration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points.

I renamed:

  • MAX_WATCH_STREAM_TIMEOUT_MS => ONLINE_STATE_TIMEOUT_MS
  • watchStreamTimerPromise => onlineStateTimer

And I clarified some comments. I agree that OnlineStateTimeout is a somewhat imprecise name (could be applicable to other transitions). We could do "OnlineStateUnknownTimeout", but I don't think that's much clearer (and makes all of these names longer) and so I'm inclined to just lean on comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a marked improvement and LGTM as-is. Thanks. Agree that leaning on comments can be the way to go in cases like these.

I'm still torn about this because the connection timer seems very much related to what we're doing here: namely if we're ever connecting our state starts at unknown. If we logically fail to connect (i.e. fail to transition out of unknown) then this timer fires. It seems like, for all intents and purposes, OnlineStateUnknownTimeout is the real connection timeout. Could that other thing we have be some lower-level concept?

Or maybe the name of OnlineStateUnknownTimeout itself is caught in my craw. Our other timeouts are named in terms of the end state, not the starting state. IOW: how long until some state, not how long in some state. How long until idle, how long until connected, etc.

This comment isn't super actionable but I'm still uneasy, and if history has shown anything: this code is fiddly, keeps changing, and we keep misunderstanding it over time, so if you have the energy I'd like to try to find a way to make it more immediately obvious and less fuzzy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that our timers have been fiddly / confusing and getting this right is worthwhile. :-)

So right now our timers are: ListenStreamIdle / ListenStreamConnection, WriteStreamIdle / WriteStreamConnection, OnlineStateTimeout

The Connection timers are used to schedule delayed stream attempts subject to exponential backoff... Not all stream attempts are delayed in which case the "Connection" timer doesn't come into play (e.g the first attempt once a stream becomes necessary, or the first attempt after an established stream encounters an error, etc.).

But the OnlineStateTimeout is still relevant when the stream attempt isn't delayed (e.g. it'll fire if the first stream attempt doesn't succeed / fail within 10 seconds, triggering us to OnlineState.Offline while we wait).

And the Connection timers will continue to be used even once the OnlineStateTimeout has triggered (because we'll continue scheduling stream attempts with backoff even after we've entered OnlineState.Offline).

So I think they're not actually that related, and we probably have a naming problem that makes them seem related. We could try to be more verbose, e.g. ListenStreamDelayedRestart instead of ListenStreamConnection.

In naming these, it's worth thinking about their usage from tests, e.g.: runDelayedOperationsUntil(TimerId.ListenStreamDelayedRestart) to make a test skip the exponential backoff delay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK could the connection timer ids really be backoff timer ids? This is essentially where we use them:

    this.backoff = new ExponentialBackoff(
      queue,
      connectionTimerId,
      BACKOFF_INITIAL_DELAY_MS,
      BACKOFF_FACTOR,
      BACKOFF_MAX_DELAY_MS
    );

TimerId.ListenStreamBackoff => runDelayedOperationsUntil(TimerId.ListenStreamBackoff)?

I don't think the fact that the online state timeout is for succeeding or failing much matters. Connection timeouts pretty much always work this way: they introduce a client-generated error if the connection hasn't already succeeded or failed on its own.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly what they are, except "backoff" describes an implementation detail about how the timers are scheduled rather than what they actually are for. You could hypothetically imagine that the stream could have multiple timers, for different purposes, that happen to use exponential backoff scheduling, at which point "ListenStreamBackoff" would be meaningless (as opposed to "ListenStreamDelayedRestart" or whatever).

That said, we probably conflate "backoff" with the act of reconnecting already, and so I'd be okay with ListenStreamBackoff if you think that's clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I still do think there's a meaningful distinction between our "online state timeout" and a "connection timeout" because we explicitly don't cancel the existing stream attempt and start a new one... Instead we're relying on grpc's internal timeout / retry connection logic.

);
this.logClientOfflineWarningIfNecessary();
this.setAndBroadcast(OnlineState.Offline);
return Promise.resolve();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this failure to connect in time bump the number of watch stream failures? If not could you please add comments why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

) {}
onlineStateHandler: (onlineState: OnlineState) => void
) {
this.onlineStateTracker = new OnlineStateTracker(onlineStateHandler);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a strong case either way. The general principle I was following was to invert dependencies where possible, the general rule being to pass your dependencies rather than new them up.

This makes it possible to construct a testing OnlineStateTracker if we ever wanted to do so and pushes all the object construction into one place. While it's true that FirestoreClient doesn't need to know about OnlineStateTracker, FirestoreClient is the assembly point for the rest of our constellation of objects so it doesn't seem like a bad place.

ListenStreamConnection = 'listen_stream_connection',
WriteStreamIdle = 'write_stream_idle',
WriteStreamConnection = 'write_stream_connection',
OnlineStateTimeout = 'online_state_timeout'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how OnlineStateTimeout is different from either the connection or idle timeouts. I'm not sure if I'm asking for it to be deduplicated against those others, comments on the individual timer ids, or some kind of theory of operation primer on how these all fit together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments that hopefully makes things somewhat clearer. FWIW, coming up with names for these is a bit tricky because they're meant to be high-level concepts (since they'll be triggered from high-level tests, etc.) and so e.g. "Connection" was chosen over "Backoff"

Copy link
Contributor Author

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good feedback. PTAL!


if (this.watchStreamTimerPromise === null) {
this.watchStreamTimerPromise = this.asyncQueue.enqueueAfterDelay(
TimerId.OnlineStateTimeout,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points.

I renamed:

  • MAX_WATCH_STREAM_TIMEOUT_MS => ONLINE_STATE_TIMEOUT_MS
  • watchStreamTimerPromise => onlineStateTimer

And I clarified some comments. I agree that OnlineStateTimeout is a somewhat imprecise name (could be applicable to other transitions). We could do "OnlineStateUnknownTimeout", but I don't think that's much clearer (and makes all of these names longer) and so I'm inclined to just lean on comments.

);
this.logClientOfflineWarningIfNecessary();
this.setAndBroadcast(OnlineState.Offline);
return Promise.resolve();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

ListenStreamConnection = 'listen_stream_connection',
WriteStreamIdle = 'write_stream_idle',
WriteStreamConnection = 'write_stream_connection',
OnlineStateTimeout = 'online_state_timeout'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments that hopefully makes things somewhat clearer. FWIW, coming up with names for these is a bit tricky because they're meant to be high-level concepts (since they'll be triggered from high-level tests, etc.) and so e.g. "Connection" was chosen over "Backoff"

@mikelehen mikelehen force-pushed the mikelehen/onlinestate-timeout branch from 5050675 to b5d412f Compare March 1, 2018 03:05
Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


if (this.watchStreamTimerPromise === null) {
this.watchStreamTimerPromise = this.asyncQueue.enqueueAfterDelay(
TimerId.OnlineStateTimeout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a marked improvement and LGTM as-is. Thanks. Agree that leaning on comments can be the way to go in cases like these.

I'm still torn about this because the connection timer seems very much related to what we're doing here: namely if we're ever connecting our state starts at unknown. If we logically fail to connect (i.e. fail to transition out of unknown) then this timer fires. It seems like, for all intents and purposes, OnlineStateUnknownTimeout is the real connection timeout. Could that other thing we have be some lower-level concept?

Or maybe the name of OnlineStateUnknownTimeout itself is caught in my craw. Our other timeouts are named in terms of the end state, not the starting state. IOW: how long until some state, not how long in some state. How long until idle, how long until connected, etc.

This comment isn't super actionable but I'm still uneasy, and if history has shown anything: this code is fiddly, keeps changing, and we keep misunderstanding it over time, so if you have the energy I'd like to try to find a way to make it more immediately obvious and less fuzzy.

Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM. Future me really thanks for taking the time to make this better.

@mikelehen mikelehen merged commit 0fa319e into master Mar 3, 2018
@mikelehen mikelehen deleted the mikelehen/onlinestate-timeout branch March 3, 2018 01:23
@firebase firebase locked and limited conversation to collaborators Oct 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants