Add timeout to OnlineState tracking. #412

mikelehen · 2018-01-05T16:52:40Z

Add 10 second timeout to OnlineState tracking.

Added a timer that triggers after 10 seconds without a successful stream attempt, reverting to "offline" behavior, potentially triggering data from cache.
Cleanup:
- Renamed OnlineState enum values to Offline, Online, and Unknown.
- Refactored "online state tracking" into its own OnlineStateTracker class.
- Added a dummy .catch() handler to the CancelablePromises returned by
  AsyncQueue.enqueueAfterDelay() to avoid UnhandledPromiseRejection log spam
  due to canceled timers.
Added a SpecTest to verify new timeout behavior.
- Added ability to run timers from spec tests (including assigning string
  names to TimerId enum values)
- Added TimerId.All to match iOS and make it easier to run all timers from spec
  tests.

wilhuff · 2018-01-11T17:12:48Z

packages/firestore/src/remote/remote_store.ts

@@ -51,19 +51,14 @@ import {
  WatchTargetChange,
  WatchTargetChangeState
 } from './watch_change';
+import { clearTimeout } from 'timers';


This is no longer necessary, right?

wilhuff · 2018-01-11T17:14:19Z

packages/firestore/src/remote/remote_store.ts

-  ) {}
+    onlineStateHandler: (onlineState: OnlineState) => void
+  ) {
+    this.onlineStateTracker = new OnlineStateTracker(onlineStateHandler);


Might as well pass the OnlineStateTracker in so the RemoteStore doesn't even have to know about the onlineStateHandler.

Hrm... I think I prefer to keep OnlineStateTracker as an implementation detail of RemoteStore. I.e.: I think it's better for RemoteStore to know about onlineStateHandler than for FirestoreClient to know about OnlineStateTracker. But feel free to make a case otherwise. I do notice that if we passed OnlineStateTracker into RemoteStore, we could keep RemoteStore unaware of the AsyncQueue, but I don't think that's necessary a goal.

I don't think there's a strong case either way. The general principle I was following was to invert dependencies where possible, the general rule being to pass your dependencies rather than new them up.

This makes it possible to construct a testing OnlineStateTracker if we ever wanted to do so and pushes all the object construction into one place. While it's true that FirestoreClient doesn't need to know about OnlineStateTracker, FirestoreClient is the assembly point for the rest of our constellation of objects so it doesn't seem like a bad place.

wilhuff · 2018-01-11T17:16:34Z

packages/firestore/src/remote/online_state_tracker.ts

+ * then we set the OnlineState to Offline, and the client will behave as if
+ * it is offline (get()s will return cached data, etc.).
+ */
+export class OnlineStateTracker {


I can't think of any way it would happen but just in case: are there any interactions between this class and the idle timeout timers?

There shouldn't be since this timer goes away once you're connected. And the idle timers start once you're connected... And with the improved testing, if both timers were active at once, runDelayedOperationsEarly() would run both so we'd be exercising them together, at least in theory...

wilhuff · 2018-01-11T17:20:26Z

packages/firestore/src/remote/online_state_tracker.ts

+    this.setAndBroadcast(OnlineState.Unknown);
+
+    if (this.watchStreamTimer === null) {
+      this.watchStreamTimer = setTimeout(() => {


Given all the timer gunk we have now I wonder if it would make sense to either add methods to the async queue to just do this for us or add a special clock/delay type that you could pass in to the constructor to allow us to test these kinds of things directly.

Added to the async queue (using the cancellation support I added).

wilhuff · 2018-01-11T17:22:38Z

packages/firestore/src/remote/remote_store.ts

@@ -236,8 +178,8 @@ export class RemoteStore {
   */
  disableNetwork(): Promise<void> {
    this.disableNetworkInternal();
-    // Set the OnlineState to failed so get()'s return from cache, etc.
-    this.updateOnlineState(OnlineState.Failed);
+    // Set the OnlineState to Offline so get()'s return from cache, etc.


Don't use apostrophes for making plurals. (it should be "get()s" or just "gets").

wilhuff · 2018-01-11T17:25:38Z

packages/firestore/src/remote/remote_store.ts

+   * Returns whether a watch stream is currently necessary (network is enabled
+   * and we have active watch targets).
+   */
+  private isWatchStreamNecessary(): boolean {


I'm not a fan of this extra concept that's confusingly similar to shouldStartWatchStream, especially since it's only used in the implementation of it.

Oops, good call. This is an artifact of a previous implementation attempt. Unwound.

wilhuff · 2018-01-11T17:26:56Z

Despite the nitpicking in here I really like the direction this is heading.

* Introduces a DelayedOperation helper class in AsyncQueue to encapsulate delayed op logic. * Adds cancellation support which I want to use in #412 * Remove delayedOperationsCount in favor of keeping delayedOperations populated correctly at all times. * Fixes a preexisting issue in AsyncQueue.schedule() where the returned promise would always be resolved with undefined, instead of the result of your op. Also remove an AnyDuringMigration usage.

…on. (#489) * Introduces a DelayedOperation helper class in AsyncQueue to encapsulate delayed op logic. * Adds cancellation support which I want to use in #412 * Updates the idle timer in persistent_stream.ts to use new cancellation support. * Remove delayedOperationsCount in favor of keeping delayedOperations populated correctly at all times. * Fixes a preexisting issue in AsyncQueue.schedule() where the returned promise would always be resolved with undefined, instead of the result of your op. Also remove an AnyDuringMigration usage.

* Addressed mcg@'s review feedback. * Moved OnlineState timeout to AsyncQueue using new cancelable delayed operation capabilities. * Added a dummy .catch() handler to the CancelablePromises returned by AsyncQueue.enqueueAfterDelay() to avoid UnhandledPromiseRejection log spam due to canceled timers. * Added a SpecTest to verify OnlineState timeout behavior. * Added ability to run timers from spec tests (including assigning string names to TimerId enum values) * Added TimerId.All to match iOS and make it easier to run all timers from spec tests.

googlebot · 2018-02-23T16:59:27Z

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this State. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

mikelehen

Sorry for the looong delay, but I think this is finally ready for review again. I addressed the feedback, moved to the new cancelable delayed operation support via AsyncQueue, and added a spec test.

If you want, I could probably split this into at least a couple PRs (e.g. I could separate out the "cleanup" piece of the PR description). Let me know if you'd like me to do that.

mikelehen · 2018-02-05T16:54:29Z

packages/firestore/src/remote/remote_store.ts

@@ -51,19 +51,14 @@ import {
  WatchTargetChange,
  WatchTargetChangeState
 } from './watch_change';
+import { clearTimeout } from 'timers';


mikelehen · 2018-02-05T16:56:50Z

packages/firestore/src/remote/remote_store.ts

-  ) {}
+    onlineStateHandler: (onlineState: OnlineState) => void
+  ) {
+    this.onlineStateTracker = new OnlineStateTracker(onlineStateHandler);


Hrm... I think I prefer to keep OnlineStateTracker as an implementation detail of RemoteStore. I.e.: I think it's better for RemoteStore to know about onlineStateHandler than for FirestoreClient to know about OnlineStateTracker. But feel free to make a case otherwise. I do notice that if we passed OnlineStateTracker into RemoteStore, we could keep RemoteStore unaware of the AsyncQueue, but I don't think that's necessary a goal.

mikelehen · 2018-02-05T17:00:07Z

packages/firestore/src/remote/remote_store.ts

@@ -236,8 +178,8 @@ export class RemoteStore {
   */
  disableNetwork(): Promise<void> {
    this.disableNetworkInternal();
-    // Set the OnlineState to failed so get()'s return from cache, etc.
-    this.updateOnlineState(OnlineState.Failed);
+    // Set the OnlineState to Offline so get()'s return from cache, etc.


mikelehen · 2018-02-05T17:05:18Z

packages/firestore/src/remote/remote_store.ts

+   * Returns whether a watch stream is currently necessary (network is enabled
+   * and we have active watch targets).
+   */
+  private isWatchStreamNecessary(): boolean {


Oops, good call. This is an artifact of a previous implementation attempt. Unwound.

mikelehen · 2018-02-07T15:15:23Z

packages/firestore/src/remote/online_state_tracker.ts

+    this.setAndBroadcast(OnlineState.Unknown);
+
+    if (this.watchStreamTimer === null) {
+      this.watchStreamTimer = setTimeout(() => {


Added to the async queue (using the cancellation support I added).

mikelehen · 2018-02-23T17:02:31Z

packages/firestore/src/remote/online_state_tracker.ts

+ * then we set the OnlineState to Offline, and the client will behave as if
+ * it is offline (get()s will return cached data, etc.).
+ */
+export class OnlineStateTracker {


There shouldn't be since this timer goes away once you're connected. And the idle timers start once you're connected... And with the improved testing, if both timers were active at once, runDelayedOperationsEarly() would run both so we'd be exercising them together, at least in theory...

wilhuff · 2018-02-27T18:33:36Z

packages/firestore/src/remote/online_state_tracker.ts

+
+    if (this.watchStreamTimerPromise === null) {
+      this.watchStreamTimerPromise = this.asyncQueue.enqueueAfterDelay(
+        TimerId.OnlineStateTimeout,


I find this cluster of names confusingly dissimlar. Why is this an "OnlineStateTimeout", the promise is a "watchStreamTimerPromise", and the timeout a "MAX_WATCH_STREAM_TIMEOUT_MS"?

These all seem related to the handling the initial connection but:

OnlineStateTimeout seems like it might be applicable to other transitions

watchStreamTimer seems wholly unrelated to that purpose

MAX_WATCH_STREAM_TIMEOUT_MS seems just as likely to apply to an idle timeout. Also, why "MAX"? Isn't this just the timeout duration?

Good points.

I renamed:

MAX_WATCH_STREAM_TIMEOUT_MS => ONLINE_STATE_TIMEOUT_MS

watchStreamTimerPromise => onlineStateTimer

And I clarified some comments. I agree that OnlineStateTimeout is a somewhat imprecise name (could be applicable to other transitions). We could do "OnlineStateUnknownTimeout", but I don't think that's much clearer (and makes all of these names longer) and so I'm inclined to just lean on comments.

This is a marked improvement and LGTM as-is. Thanks. Agree that leaning on comments can be the way to go in cases like these.

I'm still torn about this because the connection timer seems very much related to what we're doing here: namely if we're ever connecting our state starts at unknown. If we logically fail to connect (i.e. fail to transition out of unknown) then this timer fires. It seems like, for all intents and purposes, OnlineStateUnknownTimeout is the real connection timeout. Could that other thing we have be some lower-level concept?

Or maybe the name of OnlineStateUnknownTimeout itself is caught in my craw. Our other timeouts are named in terms of the end state, not the starting state. IOW: how long until some state, not how long in some state. How long until idle, how long until connected, etc.

This comment isn't super actionable but I'm still uneasy, and if history has shown anything: this code is fiddly, keeps changing, and we keep misunderstanding it over time, so if you have the energy I'd like to try to find a way to make it more immediately obvious and less fuzzy.

I agree that our timers have been fiddly / confusing and getting this right is worthwhile. :-)

So right now our timers are: ListenStreamIdle / ListenStreamConnection, WriteStreamIdle / WriteStreamConnection, OnlineStateTimeout

The Connection timers are used to schedule delayed stream attempts subject to exponential backoff... Not all stream attempts are delayed in which case the "Connection" timer doesn't come into play (e.g the first attempt once a stream becomes necessary, or the first attempt after an established stream encounters an error, etc.).

But the OnlineStateTimeout is still relevant when the stream attempt isn't delayed (e.g. it'll fire if the first stream attempt doesn't succeed / fail within 10 seconds, triggering us to OnlineState.Offline while we wait).

And the Connection timers will continue to be used even once the OnlineStateTimeout has triggered (because we'll continue scheduling stream attempts with backoff even after we've entered OnlineState.Offline).

So I think they're not actually that related, and we probably have a naming problem that makes them seem related. We could try to be more verbose, e.g. ListenStreamDelayedRestart instead of ListenStreamConnection.

In naming these, it's worth thinking about their usage from tests, e.g.: runDelayedOperationsUntil(TimerId.ListenStreamDelayedRestart) to make a test skip the exponential backoff delay.

OK could the connection timer ids really be backoff timer ids? This is essentially where we use them:

this.backoff = new ExponentialBackoff( queue, connectionTimerId, BACKOFF_INITIAL_DELAY_MS, BACKOFF_FACTOR, BACKOFF_MAX_DELAY_MS );

TimerId.ListenStreamBackoff => runDelayedOperationsUntil(TimerId.ListenStreamBackoff)?

I don't think the fact that the online state timeout is for succeeding or failing much matters. Connection timeouts pretty much always work this way: they introduce a client-generated error if the connection hasn't already succeeded or failed on its own.

Yes, that's exactly what they are, except "backoff" describes an implementation detail about how the timers are scheduled rather than what they actually are for. You could hypothetically imagine that the stream could have multiple timers, for different purposes, that happen to use exponential backoff scheduling, at which point "ListenStreamBackoff" would be meaningless (as opposed to "ListenStreamDelayedRestart" or whatever).

That said, we probably conflate "backoff" with the act of reconnecting already, and so I'd be okay with ListenStreamBackoff if you think that's clearer.

Also, I still do think there's a meaningful distinction between our "online state timeout" and a "connection timeout" because we explicitly don't cancel the existing stream attempt and start a new one... Instead we're relying on grpc's internal timeout / retry connection logic.

wilhuff · 2018-02-27T18:35:42Z

packages/firestore/src/remote/online_state_tracker.ts

+          );
+          this.logClientOfflineWarningIfNecessary();
+          this.setAndBroadcast(OnlineState.Offline);
+          return Promise.resolve();


Should this failure to connect in time bump the number of watch stream failures? If not could you please add comments why not?

Added a comment.

wilhuff · 2018-02-27T18:42:44Z

packages/firestore/src/remote/remote_store.ts

-  ) {}
+    onlineStateHandler: (onlineState: OnlineState) => void
+  ) {
+    this.onlineStateTracker = new OnlineStateTracker(onlineStateHandler);


I don't think there's a strong case either way. The general principle I was following was to invert dependencies where possible, the general rule being to pass your dependencies rather than new them up.

This makes it possible to construct a testing OnlineStateTracker if we ever wanted to do so and pushes all the object construction into one place. While it's true that FirestoreClient doesn't need to know about OnlineStateTracker, FirestoreClient is the assembly point for the rest of our constellation of objects so it doesn't seem like a bad place.

wilhuff · 2018-02-27T18:46:27Z

packages/firestore/src/util/async_queue.ts

+  ListenStreamConnection = 'listen_stream_connection',
+  WriteStreamIdle = 'write_stream_idle',
+  WriteStreamConnection = 'write_stream_connection',
+  OnlineStateTimeout = 'online_state_timeout'


I don't understand how OnlineStateTimeout is different from either the connection or idle timeouts. I'm not sure if I'm asking for it to be deduplicated against those others, comments on the individual timer ids, or some kind of theory of operation primer on how these all fit together.

I added comments that hopefully makes things somewhat clearer. FWIW, coming up with names for these is a bit tricky because they're meant to be high-level concepts (since they'll be triggered from high-level tests, etc.) and so e.g. "Connection" was chosen over "Backoff"

mikelehen

Good feedback. PTAL!

mikelehen · 2018-03-01T02:47:01Z

packages/firestore/src/remote/online_state_tracker.ts

+
+    if (this.watchStreamTimerPromise === null) {
+      this.watchStreamTimerPromise = this.asyncQueue.enqueueAfterDelay(
+        TimerId.OnlineStateTimeout,


Good points.

I renamed:

MAX_WATCH_STREAM_TIMEOUT_MS => ONLINE_STATE_TIMEOUT_MS

watchStreamTimerPromise => onlineStateTimer

And I clarified some comments. I agree that OnlineStateTimeout is a somewhat imprecise name (could be applicable to other transitions). We could do "OnlineStateUnknownTimeout", but I don't think that's much clearer (and makes all of these names longer) and so I'm inclined to just lean on comments.

mikelehen · 2018-03-01T02:56:06Z

packages/firestore/src/remote/online_state_tracker.ts

+          );
+          this.logClientOfflineWarningIfNecessary();
+          this.setAndBroadcast(OnlineState.Offline);
+          return Promise.resolve();


Added a comment.

mikelehen · 2018-03-01T03:00:10Z

packages/firestore/src/util/async_queue.ts

+  ListenStreamConnection = 'listen_stream_connection',
+  WriteStreamIdle = 'write_stream_idle',
+  WriteStreamConnection = 'write_stream_connection',
+  OnlineStateTimeout = 'online_state_timeout'


I added comments that hopefully makes things somewhat clearer. FWIW, coming up with names for these is a bit tricky because they're meant to be high-level concepts (since they'll be triggered from high-level tests, etc.) and so e.g. "Connection" was chosen over "Backoff"

wilhuff

LGTM

wilhuff · 2018-03-01T03:44:27Z

packages/firestore/src/remote/online_state_tracker.ts

+
+    if (this.watchStreamTimerPromise === null) {
+      this.watchStreamTimerPromise = this.asyncQueue.enqueueAfterDelay(
+        TimerId.OnlineStateTimeout,


This is a marked improvement and LGTM as-is. Thanks. Agree that leaning on comments can be the way to go in cases like these.

I'm still torn about this because the connection timer seems very much related to what we're doing here: namely if we're ever connecting our state starts at unknown. If we logically fail to connect (i.e. fail to transition out of unknown) then this timer fires. It seems like, for all intents and purposes, OnlineStateUnknownTimeout is the real connection timeout. Could that other thing we have be some lower-level concept?

Or maybe the name of OnlineStateUnknownTimeout itself is caught in my craw. Our other timeouts are named in terms of the end state, not the starting state. IOW: how long until some state, not how long in some state. How long until idle, how long until connected, etc.

This comment isn't super actionable but I'm still uneasy, and if history has shown anything: this code is fiddly, keeps changing, and we keep misunderstanding it over time, so if you have the energy I'd like to try to find a way to make it more immediately obvious and less fuzzy.

wilhuff

Still LGTM. Future me really thanks for taking the time to make this better.

Add timeout to OnlineState tracking.

e837136

mikelehen added api: firestore enhancement labels Jan 5, 2018

mikelehen assigned wilhuff Jan 5, 2018

mikelehen requested review from schmidt-sebastian and wilhuff as code owners January 5, 2018 16:52

google-oss-bot added the needs-triage label Jan 5, 2018

mikelehen removed the needs-triage label Jan 5, 2018

Add reminder to dispatch timeout onto AsyncQueue.

46588e6

wilhuff reviewed Jan 11, 2018

View reviewed changes

wilhuff assigned mikelehen and unassigned wilhuff Jan 11, 2018

wilhuff mentioned this pull request Jan 22, 2018

[Firestore] getDocumentsWithCompletion does not pull from cache when app is in offline mode. firebase/firebase-ios-sdk#485

Closed

mikelehen mentioned this pull request Feb 5, 2018

Refactor AsyncQueue's "delayed scheduling" support and add cancellation. #489

Merged

Michael Lehenbauer added 2 commits February 22, 2018 16:26

Merge branch 'master' into mikelehen/onlinestate-timeout

a85ac2d

mikelehen requested review from bojeil-google, gauntface, jshcrowthe, sphippen, TMSCH and wti806 as code owners February 23, 2018 16:59

mikelehen changed the base branch from firestore-api-changes to master February 23, 2018 17:00

mikelehen commented Feb 23, 2018

View reviewed changes

mikelehen removed request for gauntface and sphippen February 23, 2018 17:10

mikelehen removed request for TMSCH, jshcrowthe, bojeil-google and wti806 February 23, 2018 17:10

mikelehen assigned wilhuff and unassigned mikelehen Feb 23, 2018

mikelehen mentioned this pull request Feb 25, 2018

[Multi-Tab] Master Election #501

Merged

Update changelog.

e4819fe

wilhuff reviewed Feb 27, 2018

View reviewed changes

Tweak naming / comments based on CR feedback.

b5d412f

mikelehen commented Mar 1, 2018

View reviewed changes

mikelehen force-pushed the mikelehen/onlinestate-timeout branch from 5050675 to b5d412f Compare March 1, 2018 03:05

Merge branch 'master' into mikelehen/onlinestate-timeout

6722528

wilhuff approved these changes Mar 1, 2018

View reviewed changes

Michael Lehenbauer added 3 commits March 2, 2018 12:08

Tweak timer names: Connection => ConnectionBackoff.

cd7e71e

Merge branch 'master' into mikelehen/onlinestate-timeout

23cd42a

[AUTOMATED]: Prettier Code Styling

8925ae6

wilhuff approved these changes Mar 2, 2018

View reviewed changes

Michael Lehenbauer added 2 commits March 2, 2018 14:08

Merge branch 'master' into mikelehen/onlinestate-timeout

5c74573

comment tweak.

de41073

mikelehen merged commit 0fa319e into master Mar 3, 2018

mikelehen deleted the mikelehen/onlinestate-timeout branch March 3, 2018 01:23

firebase locked and limited conversation to collaborators Oct 23, 2019

Add timeout to OnlineState tracking. #412

Add timeout to OnlineState tracking. #412

Conversation

mikelehen commented Jan 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wilhuff commented Jan 11, 2018

googlebot commented Feb 23, 2018

mikelehen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikelehen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wilhuff left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wilhuff left a comment

Choose a reason for hiding this comment

mikelehen commented Jan 5, 2018 •

edited

Loading