Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Multi-Tab] Master Election #501

Merged
merged 19 commits into from
Mar 1, 2018

Conversation

schmidt-sebastian
Copy link
Contributor

@schmidt-sebastian schmidt-sebastian commented Feb 8, 2018

This PR adds "master election" to Firestore (which isn't quite as complicated as it sounds thanks to IndexedDb).

The PR lays the groundwork for two different persistence modes: Some operations can require the primary lease, and some operations can run without it. Hence, the runTransaction function now takes in a parameter to indicate whether it needs the primary lease. A lot of the code in this PR is formatting changes due to this additional argument, and if deemed reasonable, I can split this up and move it to a different PR.

I am also adding Spec test for multi-client workflows. I added a new spec builder that deals with multi-client tests. There's a lot of code in there that doesn't do anything but change the return value of the super class, and if we combine the MultiClientSpecBuilder and the SpecBuilder, this would go away. The main benefit of the separation is that we can easily hide this from Android and iOS.

I also added a MockPlatform to override some of the document functionality to use it in testing. This allows me to alter its behavior per client.

@schmidt-sebastian schmidt-sebastian changed the base branch from master to firestore-multi-tab February 9, 2018 21:22
@schmidt-sebastian schmidt-sebastian force-pushed the multitab-masterelection branch 2 times, most recently from b48223b to 89cd193 Compare February 12, 2018 23:51
@schmidt-sebastian schmidt-sebastian changed the title NOT FOR REVIEW [Multi-Tab] Master Election Feb 12, 2018
@schmidt-sebastian schmidt-sebastian force-pushed the multitab-masterelection branch 2 times, most recently from ca5e473 to 9d94ed0 Compare February 13, 2018 06:36
Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it!

Err, lots of naming / nitty feedback and a bit of AsyncQueue questions mixed in. I didn't look very closely at the spec test implementation in this pass, but the tests themselves look sweet (though I wonder if 2 tests is really exhaustive enough :-)). I'll take a closer look in the next iteration.

this.primaryStateListener.applyPrimaryState(this.isPrimary);
}
})
.then(() => this.isPrimary);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can go in the previous then...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer applicable.

@@ -616,4 +622,9 @@ export class SyncEngine implements RemoteSyncer {
return this.remoteStore.handleUserChange(user);
});
}

applyPrimaryState(primaryClient: boolean): void {
// TODO(multitab): Apply the primary state internally on the AsyncQueue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the comment on SyncEngine: The SyncEngine’s methods should only ever be called by methods running in the global async queue.

So this comment should be solved in the callers of this method rather than here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved all callers to the AsyncQueue, which let me do the rest of the cleanup I did in IndexedDbPersistence (much in line with what we discussed on HipChat)

@@ -299,6 +303,7 @@ export class FirestoreClient {

// Setup wiring between sync engine and remote store
this.remoteStore.syncEngine = this.syncEngine;
this.persistence.setPrimaryStateListener(this.syncEngine);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to call SyncEngine.applyPrimanyState() with the current state, but the LocalStore and RemoteStore have not been started yet, so SyncEngine will be in a not-yet-working state. So I suspect we should /not/ call it immediately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer that we do call it immediately. It makes 'Master Election' for Memory Persistence much easier as SyncEngine can always start out in secondary mode. For Memory Persistence, Sync Engine will then always transition to primary during client initialization.

I have moved the initialization of the listener further down to address the dependencies between local store and remote store.

@@ -299,6 +303,7 @@ export class FirestoreClient {

// Setup wiring between sync engine and remote store
this.remoteStore.syncEngine = this.syncEngine;
this.persistence.setPrimaryStateListener(this.syncEngine);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW- Having a PrimaryStateListener interface with a single method feels a bit heavy-handed on web. You could probably just:

this.persistence.primaryStateListener = isPrimary => this.syncEngine.applyPrimaryState(isPrimary);

I don't feel very strongly though, so your call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this a single callback type, but still kept the original setter (partly so I could call it back immediately).


constructor(
private localStore: LocalStore,
private remoteStore: RemoteStore,
private currentUser: User
) {}

get isPrimaryClient() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this is only used for testing, and I suspect this should remain the case. Add a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I suspect you are right. Comment added.

@@ -26,18 +26,27 @@ import { SpecStep } from './spec_test_runner';
const EXCLUSIVE_TAG = 'exclusive';
// Persistence-related tests.
const PERSISTENCE_TAG = 'persistence';
// Multi-client related tests. Requires persistence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Requires persistence." is sort of confusing. Maybe remove it or change to "(which implies persistence needs to be enabled as well)" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

function getTestRunner(tags, persistenceEnabled): Function {
if (
!MULTI_CLIENT_TEST_FILTER(tags, persistenceEnabled) ||
!WEB_SPEC_TEST_FILTER(tags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard to read... I'd just pass persistenceEnabled to WEB_SPEC_TEST_FILTER and combine these checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined


specTest('Single tab acquires primary lease', ['multi-client'], () => {
// This test simulates primary state handoff between two background tabs.
// With all instances are in the background, the first active tab acquires
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove "are" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.shutdown()
.expectPrimaryState(false)
.client(1)
.restart()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be like refreshing the browser, right? That shouldn't be necessary, right? If we don't have a lighter-weight way to kick the client, add a TODO(multitab)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of the refactor to use asyncQueue.enqueueWithDelay I can now simply trigger the required client state refresh. This is now tryAcquirePrimaryLease.

// With all instances are in the background, the first active tab acquires
// ownership.
return client(0)
.becomeHidden()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Or is this the default? I am not sure it makes sense to always call becomeHidden/Visible after calling client for the first time. Seems like there should be a default that you can rely on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This both necessary and the default. We need to schedule a step before we can verify expectations.

Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, woops. In case it wasn't obvious, the previous review was meant to be "Request changes" :-)

@mikelehen mikelehen removed their assignment Feb 14, 2018
@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this State. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ough. So much work. Back to you!

@@ -299,6 +303,7 @@ export class FirestoreClient {

// Setup wiring between sync engine and remote store
this.remoteStore.syncEngine = this.syncEngine;
this.persistence.setPrimaryStateListener(this.syncEngine);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer that we do call it immediately. It makes 'Master Election' for Memory Persistence much easier as SyncEngine can always start out in secondary mode. For Memory Persistence, Sync Engine will then always transition to primary during client initialization.

I have moved the initialization of the listener further down to address the dependencies between local store and remote store.

@@ -299,6 +303,7 @@ export class FirestoreClient {

// Setup wiring between sync engine and remote store
this.remoteStore.syncEngine = this.syncEngine;
this.persistence.setPrimaryStateListener(this.syncEngine);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this a single callback type, but still kept the original setter (partly so I could call it back immediately).


constructor(
private localStore: LocalStore,
private remoteStore: RemoteStore,
private currentUser: User
) {}

get isPrimaryClient() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I suspect you are right. Comment added.

@@ -616,4 +622,9 @@ export class SyncEngine implements RemoteSyncer {
return this.remoteStore.handleUserChange(user);
});
}

applyPrimaryState(primaryClient: boolean): void {
// TODO(multitab): Apply the primary state internally on the AsyncQueue.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved all callers to the AsyncQueue, which let me do the rest of the cleanup I did in IndexedDbPersistence (much in line with what we discussed on HipChat)

*/
const CLIENT_STATE_MAX_AGE_MS = 5000;

/** Refresh interval for the client state. Currently set to four seconds. */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It adds five words to our codebase. I thought we were going for that... anyways, removed.

.shutdown()
.expectPrimaryState(false)
.client(1)
.restart()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of the refactor to use asyncQueue.enqueueWithDelay I can now simply trigger the required client state refresh. This is now tryAcquirePrimaryLease.

@@ -89,17 +99,25 @@ const UNSUPPORTED_PLATFORM_ERROR_MSG =
* owner lease immediately regardless of the current lease timestamp.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

private ensurePrimaryLease(
txn: SimpleDbTransaction
): PersistencePromise<void> {
return this.tryAcquirePrimaryLease(txn).next(owner => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this entire section based on our HipChat discussion. I only acquire the lease at the end, but verify that I can obtain it at the beginning.

@@ -280,63 +408,60 @@ export class IndexedDbPersistence implements Persistence {
});
}

/** Verifies that that `updateTimeMs` is within CLIENT_STATE_MAX_AGE_MS. */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// With all instances are in the background, the first active tab acquires
// ownership.
return client(0)
.becomeHidden()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This both necessary and the default. We need to schedule a step before we can verify expectations.

Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay and amount of feedback... If you make smaller PRs, I'll make fewer comments and be faster. :-)

@@ -309,6 +314,11 @@ export class FirestoreClient {
})
.then(() => {
return this.remoteStore.start();
})
.then(() => {
this.persistence.setPrimaryStateListener(isPrimary =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's a non-obvious ordering dependency here, I think this warrants a comment:

// NOTE: This will immediately call the listener, so we make sure to set it after localStore / remoteStore are started.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doone.

const CLIENT_STATE_MAX_AGE_MS = 5000;
/**
* Maximum refresh interval for the primary lease. Used for extending a
* currently owned lease.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started to suggest this should be "minimum" (since we'll refresh the lease at least once every 4 seconds). But I can see it the other way too (we'll wait at most 4 seconds before refreshing the lease). So maybe change the comment to something like:

The maximum amount of time we'll wait before refreshing the primary lease (but we may opportunistically refresh earlier if we are performing an IndexedDB operation already).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are obviously talking about the minimum maximum time here :)

Let me explain the three different concepts here. I have also updated the documentation in the code to clear up some of the confusion.

  • CLIENT_METADATA_MAX_AGE_MS is the existing concept and used to ignore state that is outdated.
  • CLIENT_METADATA_REFRESH_INTERVAL_MS is less than CLIENT_METADATA_MAX_AGE_MS and used by the primary instance to refresh its lease.

Secondary instances will try to grab the lease from the primary leaseholder once it expires. Ideally, if a lease expires in 10 ms, we want to to acquire it in 10.001 ms. This can end up causing a lot of churn for background tabs that will never acquire the primary lease - and hence, I introduced a minimum refresh interval.

As part of this latest revision, I renamed this minimum interval to MINIMUM_REFRESH_INTERVAL_MS. I don't foresee it being used outside of client metadata refreshes for now, but it is somewhat of an independent concept.

* Minimum interval for attemots to acquire the primary lease. Used when
* synchronizing client lease refreshes across clients.
*/
const PRIMARY_LEASE_MIN_REFRESH_INTERVAL_MS = 4000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er we have a MIN and MAX "Primary Lease Refresh Interval" and they're both the same value? I think we're also using these intervals to just refresh client metadata state, regardless of the primary lease. This is confusing. I wonder if we could simplify to just two constants:

/**
 * Oldest acceptable age in milliseconds for client metadata read from IndexedDB.
 * Client metadata and primary leases that are older than 5 seconds are ignored.
 */
const CLIENT_METADATA_MAX_AGE_MS = 5000;

/**
 * The minimum frequency at which clients will update their metadata, including
 * refreshing the primary lease if held or potentially trying to acquire it if not held.
 *
 * NOTE: Clients may opportunistically refresh their metadata earlier than this interval
 * if they're performing an IndexedDB operation already for another purpose.
 */
const CLIENT_METADATA_REFRESH_INTERVAL_MS = 4000;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment based on your suggestion.

*/
const PRIMARY_LEASE_MIN_REFRESH_INTERVAL_MS = 4000;
/** LocalStorage location to indicate a zombied primary key (see class comment). */
const ZOMBIE_PRIMARY_LOCALSTORAGE_SUFFIX = 'zombiedOwnerId';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZOMBIE => ZOMBIED ?
zombiedOwnerId => zombiedPrimaryId? (I don't think we care about back-compat here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// tslint:disable-next-line:no-any setTimeout() type differs on browser / node
private ownerLeaseRefreshHandle: any;
private primaryLeaseRefreshHandle: any;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this and the stopClientStateRefreshes() method are no longer used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I wonder if we should cancel our callback on the queue instead, or if we should just rely on the queue's shutdown.

* Note: Instances can only toggle between Primary and Secondary state if
* IndexedDB persistence is enabled and multiple clients are active. If this
* listener is registered with MemoryPersistence, the callback will be called
* exactly once marking the current instance as Primary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep the call in shutdown(), then we should update this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer applicable.

const WEB_SPEC_TEST_FILTER = (tags: string[], persistenceEnabled: boolean) => {
return (
tags.indexOf(NO_WEB_TAG) === -1 &&
(tags.indexOf(MULTI_CLIENT_TAG) === -1 || persistenceEnabled)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this code now it looked incorrect that we don't check for PERSISTENCE_TAG too, and in fact we don't check it anywhere, and it doesn't exist on iOS or Android either. I think it was a temporary measure that we used before all the platforms had persistence. Can you remove the tag here (and its usage as 'persistence' in persistence_spec.test.ts)? Else I can do it against master, but it'll probably cause a conflict with this PR.

@@ -197,7 +197,7 @@ describeSpec('Persistence:', ['persistence'], () => {
.shutdown()
.expectPrimaryState(false)
.client(1)
.restart()
.tryAcquirePrimaryLease()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once #412 lands, you can merge it and replace this with

.runTimer(TimerId.ClientStateRefresh)

Can you add a TODO somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TODO to spec_builder.ts's tryAcquirePrimaryLease.

@@ -788,6 +792,12 @@ abstract class TestRunner {
await this.remoteStore.enableNetwork();
}

private async doAcquirePrimaryLease(): Promise<void> {
expect(this.queue.containsDelayedOperation(TimerId.ClientStateRefresh)).to
.be.true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter since this code should go away, but FYI- runDelayedOperationsEarly() will throw if the timer doesn't exist, so this check should be unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


/** Shut down the client and close it network connection. */
shutdown?: boolean;

shutdown?: true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready for review again. I also removed the 'persistence' tag from the Spec tests (in a separate commit).

Further, I added logging to log the last step of a failed spec tests. I find this really useful.

@@ -309,6 +314,11 @@ export class FirestoreClient {
})
.then(() => {
return this.remoteStore.start();
})
.then(() => {
this.persistence.setPrimaryStateListener(isPrimary =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doone.

* Oldest acceptable age in milliseconds for client states read from IndexedDB.
* Client state and primary leases that are older than 5 seconds are ignored.
*/
const CLIENT_STATE_MAX_AGE_MS = 5000;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intended to use the terminology here from SharedClientState, but forgot that these two datasets are completely unrelated :/ I prefer metadata here, which also conveys that this storage model is different from the data stored in WebStorage.

// tslint:disable-next-line:no-any setTimeout() type differs on browser / node
private ownerLeaseRefreshHandle: any;
private primaryLeaseRefreshHandle: any;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I wonder if we should cancel our callback on the queue instead, or if we should just rely on the queue's shutdown.

/** Our 'visibilitychange' listener if registered. */
private documentVisibilityHandler: ((e?: Event) => void) | null;

/** Callback for primary state notifications. */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

constructor(
prefix: string,
platform: Platform,
private readonly queue: AsyncQueue,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefix is stored in the member localStoragePrefix, which to me seems like a better name when used internally in this class.
platform is only used to access .document and .window and doesn't need to be retained as a property.

@@ -373,10 +541,10 @@ export class IndexedDbPersistence implements Persistence {
* zombied due to their tab closing) from LocalStorage, or null if no such
* record exists.
*/
private getZombiedOwnerId(): string | null {
private getZombiedClientId(): string | null {
try {
const zombiedOwnerId = window.localStorage.getItem(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -61,6 +68,16 @@ export class MemoryPersistence implements Persistence {
return Promise.resolve();
}

tryBecomePrimary(): Promise<boolean> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not anymore. no-unused-variables should be catching this (https://palantir.github.io/tslint/rules/no-unused-variable)


setPrimaryStateListener(primaryStateListener: PrimaryStateListener) {
// All clients using memory persistence act as primary.
this.queue.enqueue(() => primaryStateListener(true));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the call during shutdown.

* Note: Instances can only toggle between Primary and Secondary state if
* IndexedDB persistence is enabled and multiple clients are active. If this
* listener is registered with MemoryPersistence, the callback will be called
* exactly once marking the current instance as Primary.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer applicable.

@@ -788,6 +792,12 @@ abstract class TestRunner {
await this.remoteStore.enableNetwork();
}

private async doAcquirePrimaryLease(): Promise<void> {
expect(this.queue.containsDelayedOperation(TimerId.ClientStateRefresh)).to
.be.true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@gauntface gauntface removed their request for review February 27, 2018 23:22
Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the scheduling logic needs some more love. PTAL.

constructor(
prefix: string,
platform: Platform,
private readonly queue: AsyncQueue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm dumb. I was reading these as if they were "public". Ignore me.

);

return this.getCurrentPrimary(txn).next(currentPrimary => {
return this.canActAsPrimary(currentPrimary, txn).next(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm. canActAsPrimary(currentPrimary, txn) looks like it will check if the current primary can (still) act as primary (I misread the code at first). Maybe consider renaming to canLocalClientActAsPrimary ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the name since canActAsPrimary no longer takes an argument (I inlined getCurrentPrimary as suggested).

this.attachWindowUnloadHook();
return this.updateClientMetadataAndTryBecomePrimary().then(leaseTtl =>
this.scheduleClientMetadataAndPrimaryLeaseRefreshes(leaseTtl)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads as if the refreshes will happen every leaseTtl so I'd still prefer rename this to schedule[Next?]ClientMetadataAndPrimaryLeaseRefresh, but I don't feel super strongly at this point if you think it's more important to communicate the auto-rescheduling aspect.

Relatedly (though I don't think it negates my other feedback), leaseTtl is a nice name, but not exactly accurate (it's not actually the lease TTL since it is adjusted based on min / max refresh intervals, etc.). I'd prefer something like refreshDelay or nextRefreshDelay I think. Don't feel strongly if you think that's worse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the name. The comment regarding leaseTtl is no longer applicable.

// the current lease is expected to expire.
const remainingLifetimeMs =
Date.now() -
(currentPrimary.leaseTimestampMs + CLIENT_METADATA_MAX_AGE_MS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been clearer in my prior review but this still seems wrong (it's backwards). Assuming the lease expires in the future, remainingLifetimeMs will end up being negative. If you keep this, I'd introduce a second explanatory variable:

const leaseExpirationTimestampMs = currentPrimary.leaseTimestampMs + CLIENT_METADATA_MAX_AGE_MS;
const leaseRemainingMs = leaseExpirationTimestampMs - Date.now();

But I'd be in favor of dropping this dynamic refresh interval logic entirely (see below).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, that's true. I did go though and remove this logic altogether though (bringing this aspect back the very first iteration of this PR). If we do need faster lease handover, we can add this in the feature and then also figure out how to properly test it.

return Math.min(
MINIMUM_REFRESH_INTERVAL_MS,
remainingLifetimeMs + 1
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusingly, to enforce a minimum, you want to use Math.max(), not Math.min(). As you've written this, clients will refresh at least every 1 second... and since you have a bug that makes remainingLifetimeMs usually be negative, I think they'll just refresh ASAP.

Since it is both difficult to describe (via comments) the behavior you're trying to achieve and to implement it correctly, may I suggest that we simplify and just use a constant refresh interval (4 seconds) rather than trying to base it on the current lease expiration? I think this simplifies our constants, comments, and implementation considerably. The downside is you end up with a random 1-9 second delay when a master gets throttled instead of a random 1-5 second delay, I think? I'm not sure it's worth optimizing at this point.

Also:

  1. Can you add an assert to AsyncQueue.enqueueAfterDelay() that verifies the delay is >= 0?
  2. If you decide to keep the current dynamic refresh logic, can you find a way to write tests for it since we're struggling to get it right?
  3. If you get rid of the dynamic refresh logic, please look for resulting opportunities to simplify. E.g.:
    1. I believe this method won't need to return a "leaseTtl" and scheduleClientMetadataAndPrimaryLeaseRefreshes() won't need to be passed the leaseTtl.
    2. canActAsPrimary() could call getCurrentOwner() directly rather than having currentOwner passed in.
    3. At that point I would get rid of the getCurrentOwner() method entirely (especially since it should really be called getCurrentOwnerIfItsStillValid() or something).
    4. May be more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated above, I went ahead with your suggestion and removed the synchronization aspect of this PR. This simplifies most of the handling that I have not quite been able to get right (with all of it untested). This change removes leaseTtl and getCurrentOwner().

I also added the assert.

// leaseTimestampMs in the extended (or newly acquired) lease.
return this.getCurrentPrimary(txn)
.next(currentPrimary => this.canActAsPrimary(currentPrimary, txn))
.next(isPrimary => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this to canActAsPrimary to make it clear we may not be primary yet and to avoid confusion with this.isPrimary ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

} catch (err) {
console.warn(
`Spec test failed at step ${count}: ${JSON.stringify(lastStep)}`
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are my hero. :-)

@schmidt-sebastian schmidt-sebastian force-pushed the multitab-masterelection branch 2 times, most recently from af3e8a7 to da4a69e Compare March 1, 2018 01:29
@schmidt-sebastian schmidt-sebastian force-pushed the multitab-masterelection branch from da4a69e to 972cb4c Compare March 1, 2018 01:34
@schmidt-sebastian
Copy link
Contributor Author

Sorry, I pushed the code changes before I got the chance to answer inline. All comments addressed.

Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comment suggestions, but in general this LGTM. The one thing I'm not sure about is the SpecBuilder / MultiClientSpecBuilder split. But in the interest of getting this PR out of the way before you go on vacation again, we could add a TODO(multitab) and we can discuss / make changes if necessary in a follow-up PR.

* primary lease.
*
* @return {Promise<number>} A Promise that resolves with the interval in ms
* after which the metadata and primary lease needs to be refreshed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stale return comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

}

/**
* Evaluate the state of all active instances and determine whether the local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instances => clients

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*
* Use `client(clientIndex)` to switch between clients.
*/
export class MultiClientSpecBuilder extends SpecBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I never looked at this until now, but why do we need an inherited class for this? I am not wild about needing to override every method... that kinda' defeats the purpose of inheritance.

In an effort to avoid spinning on this PR forever, I'd be okay leaving this as-is with a TODO(multitab) to revisit (ideally in the near-term).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was torn on this as well (and still am). I did this because

  • It makes it obvious what functionality does not need to be ported.
  • The whole builder aspect makes much more sense for a single client. clientA().build().build() and clientB().build().build() should be independent. We could do this with an improved version MultiSpecBuilder and swap out the underlying instance of SpecBuilder every time client(...) gets called.

That being said, I am open to remove it for now (in a follow up).

if (usePersistence) {
await IndexedDbTestRunner.destroyPersistence();
} else {
await MemoryTestRunner.destroyPersistence();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we're no longer relying on TestRunner.destroyPersistence() as a virtual method implemented by both runners, you can probably just delete MemoryTestRunner.destroyPersistence().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

}

/**
* Union type for each step. The step consists of (mostly) exactly one `field`
* set and optionally expected events in the `expect` field.
*/
export interface SpecStep {
/** The index of the current client for multi-client spec tests. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add "PORTING NOTE: Only used by web multi-tab tests." or similar to the new steps you added (except maybe shutdown).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added here and in a bunch of other places.

@@ -1190,6 +1346,11 @@ export interface SpecWatchEntity {
removedTargets?: TargetId[];
}

export type SpecClientState = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a "PORTING NOTE: Only used by web multi-tab tests." ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added here and in a bunch of other places.

@@ -1251,6 +1412,10 @@ export interface StateExpectation {
watchStreamRequestCount?: number;
/** Current documents in limbo. Verified in each step until overwritten. */
limboDocs?: string[];
/**
* Whether the instance holds the primary lease. Used in multi-client tests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this a PORTING NOTE ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added here and in a bunch of other places.

@mikelehen
Copy link
Contributor

🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢

@schmidt-sebastian schmidt-sebastian merged commit e0486ff into firestore-multi-tab Mar 1, 2018
@mikelehen mikelehen removed their assignment Mar 1, 2018
@schmidt-sebastian schmidt-sebastian deleted the multitab-masterelection branch March 26, 2018 21:33
@firebase firebase locked and limited conversation to collaborators Oct 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants