-
Notifications
You must be signed in to change notification settings - Fork 909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Multi-Tab] Master Election #501
[Multi-Tab] Master Election #501
Conversation
b48223b
to
89cd193
Compare
89cd193
to
a492edb
Compare
ca5e473
to
9d94ed0
Compare
9d94ed0
to
64b0eb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ship it!
Err, lots of naming / nitty feedback and a bit of AsyncQueue questions mixed in. I didn't look very closely at the spec test implementation in this pass, but the tests themselves look sweet (though I wonder if 2 tests is really exhaustive enough :-)). I'll take a closer look in the next iteration.
this.primaryStateListener.applyPrimaryState(this.isPrimary); | ||
} | ||
}) | ||
.then(() => this.isPrimary); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can go in the previous then
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer applicable.
@@ -616,4 +622,9 @@ export class SyncEngine implements RemoteSyncer { | |||
return this.remoteStore.handleUserChange(user); | |||
}); | |||
} | |||
|
|||
applyPrimaryState(primaryClient: boolean): void { | |||
// TODO(multitab): Apply the primary state internally on the AsyncQueue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the comment on SyncEngine: The SyncEngine’s methods should only ever be called by methods running in the global async queue.
So this comment should be solved in the callers of this method rather than here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved all callers to the AsyncQueue, which let me do the rest of the cleanup I did in IndexedDbPersistence (much in line with what we discussed on HipChat)
@@ -299,6 +303,7 @@ export class FirestoreClient { | |||
|
|||
// Setup wiring between sync engine and remote store | |||
this.remoteStore.syncEngine = this.syncEngine; | |||
this.persistence.setPrimaryStateListener(this.syncEngine); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to call SyncEngine.applyPrimanyState() with the current state, but the LocalStore and RemoteStore have not been started yet, so SyncEngine will be in a not-yet-working state. So I suspect we should /not/ call it immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that we do call it immediately. It makes 'Master Election' for Memory Persistence much easier as SyncEngine can always start out in secondary mode. For Memory Persistence, Sync Engine will then always transition to primary during client initialization.
I have moved the initialization of the listener further down to address the dependencies between local store and remote store.
@@ -299,6 +303,7 @@ export class FirestoreClient { | |||
|
|||
// Setup wiring between sync engine and remote store | |||
this.remoteStore.syncEngine = this.syncEngine; | |||
this.persistence.setPrimaryStateListener(this.syncEngine); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW- Having a PrimaryStateListener interface with a single method feels a bit heavy-handed on web. You could probably just:
this.persistence.primaryStateListener = isPrimary => this.syncEngine.applyPrimaryState(isPrimary);
I don't feel very strongly though, so your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this a single callback type, but still kept the original setter (partly so I could call it back immediately).
|
||
constructor( | ||
private localStore: LocalStore, | ||
private remoteStore: RemoteStore, | ||
private currentUser: User | ||
) {} | ||
|
||
get isPrimaryClient() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this is only used for testing, and I suspect this should remain the case. Add a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I suspect you are right. Comment added.
@@ -26,18 +26,27 @@ import { SpecStep } from './spec_test_runner'; | |||
const EXCLUSIVE_TAG = 'exclusive'; | |||
// Persistence-related tests. | |||
const PERSISTENCE_TAG = 'persistence'; | |||
// Multi-client related tests. Requires persistence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Requires persistence." is sort of confusing. Maybe remove it or change to "(which implies persistence needs to be enabled as well)" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
function getTestRunner(tags, persistenceEnabled): Function { | ||
if ( | ||
!MULTI_CLIENT_TEST_FILTER(tags, persistenceEnabled) || | ||
!WEB_SPEC_TEST_FILTER(tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hard to read... I'd just pass persistenceEnabled to WEB_SPEC_TEST_FILTER and combine these checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combined
|
||
specTest('Single tab acquires primary lease', ['multi-client'], () => { | ||
// This test simulates primary state handoff between two background tabs. | ||
// With all instances are in the background, the first active tab acquires |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove "are" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
.shutdown() | ||
.expectPrimaryState(false) | ||
.client(1) | ||
.restart() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be like refreshing the browser, right? That shouldn't be necessary, right? If we don't have a lighter-weight way to kick the client, add a TODO(multitab)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As part of the refactor to use asyncQueue.enqueueWithDelay I can now simply trigger the required client state refresh. This is now tryAcquirePrimaryLease
.
// With all instances are in the background, the first active tab acquires | ||
// ownership. | ||
return client(0) | ||
.becomeHidden() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? Or is this the default? I am not sure it makes sense to always call becomeHidden/Visible after calling client for the first time. Seems like there should be a default that you can rely on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This both necessary and the default. We need to schedule a step before we can verify expectations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er, woops. In case it wasn't obvious, the previous review was meant to be "Request changes" :-)
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ough. So much work. Back to you!
@@ -299,6 +303,7 @@ export class FirestoreClient { | |||
|
|||
// Setup wiring between sync engine and remote store | |||
this.remoteStore.syncEngine = this.syncEngine; | |||
this.persistence.setPrimaryStateListener(this.syncEngine); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that we do call it immediately. It makes 'Master Election' for Memory Persistence much easier as SyncEngine can always start out in secondary mode. For Memory Persistence, Sync Engine will then always transition to primary during client initialization.
I have moved the initialization of the listener further down to address the dependencies between local store and remote store.
@@ -299,6 +303,7 @@ export class FirestoreClient { | |||
|
|||
// Setup wiring between sync engine and remote store | |||
this.remoteStore.syncEngine = this.syncEngine; | |||
this.persistence.setPrimaryStateListener(this.syncEngine); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this a single callback type, but still kept the original setter (partly so I could call it back immediately).
|
||
constructor( | ||
private localStore: LocalStore, | ||
private remoteStore: RemoteStore, | ||
private currentUser: User | ||
) {} | ||
|
||
get isPrimaryClient() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I suspect you are right. Comment added.
@@ -616,4 +622,9 @@ export class SyncEngine implements RemoteSyncer { | |||
return this.remoteStore.handleUserChange(user); | |||
}); | |||
} | |||
|
|||
applyPrimaryState(primaryClient: boolean): void { | |||
// TODO(multitab): Apply the primary state internally on the AsyncQueue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved all callers to the AsyncQueue, which let me do the rest of the cleanup I did in IndexedDbPersistence (much in line with what we discussed on HipChat)
*/ | ||
const CLIENT_STATE_MAX_AGE_MS = 5000; | ||
|
||
/** Refresh interval for the client state. Currently set to four seconds. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It adds five words to our codebase. I thought we were going for that... anyways, removed.
.shutdown() | ||
.expectPrimaryState(false) | ||
.client(1) | ||
.restart() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As part of the refactor to use asyncQueue.enqueueWithDelay I can now simply trigger the required client state refresh. This is now tryAcquirePrimaryLease
.
@@ -89,17 +99,25 @@ const UNSUPPORTED_PLATFORM_ERROR_MSG = | |||
* owner lease immediately regardless of the current lease timestamp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
private ensurePrimaryLease( | ||
txn: SimpleDbTransaction | ||
): PersistencePromise<void> { | ||
return this.tryAcquirePrimaryLease(txn).next(owner => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote this entire section based on our HipChat discussion. I only acquire the lease at the end, but verify that I can obtain it at the beginning.
@@ -280,63 +408,60 @@ export class IndexedDbPersistence implements Persistence { | |||
}); | |||
} | |||
|
|||
/** Verifies that that `updateTimeMs` is within CLIENT_STATE_MAX_AGE_MS. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// With all instances are in the background, the first active tab acquires | ||
// ownership. | ||
return client(0) | ||
.becomeHidden() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This both necessary and the default. We need to schedule a step before we can verify expectations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay and amount of feedback... If you make smaller PRs, I'll make fewer comments and be faster. :-)
@@ -309,6 +314,11 @@ export class FirestoreClient { | |||
}) | |||
.then(() => { | |||
return this.remoteStore.start(); | |||
}) | |||
.then(() => { | |||
this.persistence.setPrimaryStateListener(isPrimary => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there's a non-obvious ordering dependency here, I think this warrants a comment:
// NOTE: This will immediately call the listener, so we make sure to set it after localStore / remoteStore are started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doone.
const CLIENT_STATE_MAX_AGE_MS = 5000; | ||
/** | ||
* Maximum refresh interval for the primary lease. Used for extending a | ||
* currently owned lease. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started to suggest this should be "minimum" (since we'll refresh the lease at least once every 4 seconds). But I can see it the other way too (we'll wait at most 4 seconds before refreshing the lease). So maybe change the comment to something like:
The maximum amount of time we'll wait before refreshing the primary lease (but we may opportunistically refresh earlier if we are performing an IndexedDB operation already).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are obviously talking about the minimum maximum time here :)
Let me explain the three different concepts here. I have also updated the documentation in the code to clear up some of the confusion.
- CLIENT_METADATA_MAX_AGE_MS is the existing concept and used to ignore state that is outdated.
- CLIENT_METADATA_REFRESH_INTERVAL_MS is less than CLIENT_METADATA_MAX_AGE_MS and used by the primary instance to refresh its lease.
Secondary instances will try to grab the lease from the primary leaseholder once it expires. Ideally, if a lease expires in 10 ms, we want to to acquire it in 10.001 ms. This can end up causing a lot of churn for background tabs that will never acquire the primary lease - and hence, I introduced a minimum refresh interval.
As part of this latest revision, I renamed this minimum interval to MINIMUM_REFRESH_INTERVAL_MS. I don't foresee it being used outside of client metadata refreshes for now, but it is somewhat of an independent concept.
* Minimum interval for attemots to acquire the primary lease. Used when | ||
* synchronizing client lease refreshes across clients. | ||
*/ | ||
const PRIMARY_LEASE_MIN_REFRESH_INTERVAL_MS = 4000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er we have a MIN and MAX "Primary Lease Refresh Interval" and they're both the same value? I think we're also using these intervals to just refresh client metadata state, regardless of the primary lease. This is confusing. I wonder if we could simplify to just two constants:
/**
* Oldest acceptable age in milliseconds for client metadata read from IndexedDB.
* Client metadata and primary leases that are older than 5 seconds are ignored.
*/
const CLIENT_METADATA_MAX_AGE_MS = 5000;
/**
* The minimum frequency at which clients will update their metadata, including
* refreshing the primary lease if held or potentially trying to acquire it if not held.
*
* NOTE: Clients may opportunistically refresh their metadata earlier than this interval
* if they're performing an IndexedDB operation already for another purpose.
*/
const CLIENT_METADATA_REFRESH_INTERVAL_MS = 4000;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comment based on your suggestion.
*/ | ||
const PRIMARY_LEASE_MIN_REFRESH_INTERVAL_MS = 4000; | ||
/** LocalStorage location to indicate a zombied primary key (see class comment). */ | ||
const ZOMBIE_PRIMARY_LOCALSTORAGE_SUFFIX = 'zombiedOwnerId'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZOMBIE => ZOMBIED ?
zombiedOwnerId => zombiedPrimaryId? (I don't think we care about back-compat here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// tslint:disable-next-line:no-any setTimeout() type differs on browser / node | ||
private ownerLeaseRefreshHandle: any; | ||
private primaryLeaseRefreshHandle: any; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this and the stopClientStateRefreshes() method are no longer used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I wonder if we should cancel our callback on the queue instead, or if we should just rely on the queue's shutdown.
* Note: Instances can only toggle between Primary and Secondary state if | ||
* IndexedDB persistence is enabled and multiple clients are active. If this | ||
* listener is registered with MemoryPersistence, the callback will be called | ||
* exactly once marking the current instance as Primary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep the call in shutdown(), then we should update this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer applicable.
const WEB_SPEC_TEST_FILTER = (tags: string[], persistenceEnabled: boolean) => { | ||
return ( | ||
tags.indexOf(NO_WEB_TAG) === -1 && | ||
(tags.indexOf(MULTI_CLIENT_TAG) === -1 || persistenceEnabled) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing this code now it looked incorrect that we don't check for PERSISTENCE_TAG too, and in fact we don't check it anywhere, and it doesn't exist on iOS or Android either. I think it was a temporary measure that we used before all the platforms had persistence. Can you remove the tag here (and its usage as 'persistence' in persistence_spec.test.ts)? Else I can do it against master, but it'll probably cause a conflict with this PR.
@@ -197,7 +197,7 @@ describeSpec('Persistence:', ['persistence'], () => { | |||
.shutdown() | |||
.expectPrimaryState(false) | |||
.client(1) | |||
.restart() | |||
.tryAcquirePrimaryLease() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once #412 lands, you can merge it and replace this with
.runTimer(TimerId.ClientStateRefresh)
Can you add a TODO somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added TODO to spec_builder.ts's tryAcquirePrimaryLease.
@@ -788,6 +792,12 @@ abstract class TestRunner { | |||
await this.remoteStore.enableNetwork(); | |||
} | |||
|
|||
private async doAcquirePrimaryLease(): Promise<void> { | |||
expect(this.queue.containsDelayedOperation(TimerId.ClientStateRefresh)).to | |||
.be.true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't matter since this code should go away, but FYI- runDelayedOperationsEarly() will throw if the timer doesn't exist, so this check should be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
|
||
/** Shut down the client and close it network connection. */ | ||
shutdown?: boolean; | ||
|
||
shutdown?: true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
c35a659
to
4951fa7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ready for review again. I also removed the 'persistence' tag from the Spec tests (in a separate commit).
Further, I added logging to log the last step of a failed spec tests. I find this really useful.
@@ -309,6 +314,11 @@ export class FirestoreClient { | |||
}) | |||
.then(() => { | |||
return this.remoteStore.start(); | |||
}) | |||
.then(() => { | |||
this.persistence.setPrimaryStateListener(isPrimary => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doone.
* Oldest acceptable age in milliseconds for client states read from IndexedDB. | ||
* Client state and primary leases that are older than 5 seconds are ignored. | ||
*/ | ||
const CLIENT_STATE_MAX_AGE_MS = 5000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intended to use the terminology here from SharedClientState, but forgot that these two datasets are completely unrelated :/ I prefer metadata here, which also conveys that this storage model is different from the data stored in WebStorage.
// tslint:disable-next-line:no-any setTimeout() type differs on browser / node | ||
private ownerLeaseRefreshHandle: any; | ||
private primaryLeaseRefreshHandle: any; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I wonder if we should cancel our callback on the queue instead, or if we should just rely on the queue's shutdown.
/** Our 'visibilitychange' listener if registered. */ | ||
private documentVisibilityHandler: ((e?: Event) => void) | null; | ||
|
||
/** Callback for primary state notifications. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
constructor( | ||
prefix: string, | ||
platform: Platform, | ||
private readonly queue: AsyncQueue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefix
is stored in the member localStoragePrefix
, which to me seems like a better name when used internally in this class.
platform
is only used to access .document
and .window
and doesn't need to be retained as a property.
@@ -373,10 +541,10 @@ export class IndexedDbPersistence implements Persistence { | |||
* zombied due to their tab closing) from LocalStorage, or null if no such | |||
* record exists. | |||
*/ | |||
private getZombiedOwnerId(): string | null { | |||
private getZombiedClientId(): string | null { | |||
try { | |||
const zombiedOwnerId = window.localStorage.getItem( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -61,6 +68,16 @@ export class MemoryPersistence implements Persistence { | |||
return Promise.resolve(); | |||
} | |||
|
|||
tryBecomePrimary(): Promise<boolean> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not anymore. no-unused-variables
should be catching this (https://palantir.github.io/tslint/rules/no-unused-variable)
|
||
setPrimaryStateListener(primaryStateListener: PrimaryStateListener) { | ||
// All clients using memory persistence act as primary. | ||
this.queue.enqueue(() => primaryStateListener(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the call during shutdown.
* Note: Instances can only toggle between Primary and Secondary state if | ||
* IndexedDB persistence is enabled and multiple clients are active. If this | ||
* listener is registered with MemoryPersistence, the callback will be called | ||
* exactly once marking the current instance as Primary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer applicable.
@@ -788,6 +792,12 @@ abstract class TestRunner { | |||
await this.remoteStore.enableNetwork(); | |||
} | |||
|
|||
private async doAcquirePrimaryLease(): Promise<void> { | |||
expect(this.queue.containsDelayedOperation(TimerId.ClientStateRefresh)).to | |||
.be.true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
4951fa7
to
2e10f8a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the scheduling logic needs some more love. PTAL.
constructor( | ||
prefix: string, | ||
platform: Platform, | ||
private readonly queue: AsyncQueue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm dumb. I was reading these as if they were "public". Ignore me.
); | ||
|
||
return this.getCurrentPrimary(txn).next(currentPrimary => { | ||
return this.canActAsPrimary(currentPrimary, txn).next( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. canActAsPrimary(currentPrimary, txn)
looks like it will check if the current primary can (still) act as primary (I misread the code at first). Maybe consider renaming to canLocalClientActAsPrimary
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the name since canActAsPrimary
no longer takes an argument (I inlined getCurrentPrimary as suggested).
this.attachWindowUnloadHook(); | ||
return this.updateClientMetadataAndTryBecomePrimary().then(leaseTtl => | ||
this.scheduleClientMetadataAndPrimaryLeaseRefreshes(leaseTtl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads as if the refreshes will happen every leaseTtl so I'd still prefer rename this to schedule[Next?]ClientMetadataAndPrimaryLeaseRefresh
, but I don't feel super strongly at this point if you think it's more important to communicate the auto-rescheduling aspect.
Relatedly (though I don't think it negates my other feedback), leaseTtl
is a nice name, but not exactly accurate (it's not actually the lease TTL since it is adjusted based on min / max refresh intervals, etc.). I'd prefer something like refreshDelay or nextRefreshDelay I think. Don't feel strongly if you think that's worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the name. The comment regarding leaseTtl
is no longer applicable.
// the current lease is expected to expire. | ||
const remainingLifetimeMs = | ||
Date.now() - | ||
(currentPrimary.leaseTimestampMs + CLIENT_METADATA_MAX_AGE_MS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have been clearer in my prior review but this still seems wrong (it's backwards). Assuming the lease expires in the future, remainingLifetimeMs will end up being negative. If you keep this, I'd introduce a second explanatory variable:
const leaseExpirationTimestampMs = currentPrimary.leaseTimestampMs + CLIENT_METADATA_MAX_AGE_MS;
const leaseRemainingMs = leaseExpirationTimestampMs - Date.now();
But I'd be in favor of dropping this dynamic refresh interval logic entirely (see below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, that's true. I did go though and remove this logic altogether though (bringing this aspect back the very first iteration of this PR). If we do need faster lease handover, we can add this in the feature and then also figure out how to properly test it.
return Math.min( | ||
MINIMUM_REFRESH_INTERVAL_MS, | ||
remainingLifetimeMs + 1 | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusingly, to enforce a minimum, you want to use Math.max(), not Math.min(). As you've written this, clients will refresh at least every 1 second... and since you have a bug that makes remainingLifetimeMs usually be negative, I think they'll just refresh ASAP.
Since it is both difficult to describe (via comments) the behavior you're trying to achieve and to implement it correctly, may I suggest that we simplify and just use a constant refresh interval (4 seconds) rather than trying to base it on the current lease expiration? I think this simplifies our constants, comments, and implementation considerably. The downside is you end up with a random 1-9 second delay when a master gets throttled instead of a random 1-5 second delay, I think? I'm not sure it's worth optimizing at this point.
Also:
- Can you add an assert to AsyncQueue.enqueueAfterDelay() that verifies the delay is >= 0?
- If you decide to keep the current dynamic refresh logic, can you find a way to write tests for it since we're struggling to get it right?
- If you get rid of the dynamic refresh logic, please look for resulting opportunities to simplify. E.g.:
- I believe this method won't need to return a "leaseTtl" and scheduleClientMetadataAndPrimaryLeaseRefreshes() won't need to be passed the leaseTtl.
- canActAsPrimary() could call getCurrentOwner() directly rather than having currentOwner passed in.
- At that point I would get rid of the getCurrentOwner() method entirely (especially since it should really be called getCurrentOwnerIfItsStillValid() or something).
- May be more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated above, I went ahead with your suggestion and removed the synchronization aspect of this PR. This simplifies most of the handling that I have not quite been able to get right (with all of it untested). This change removes leaseTtl
and getCurrentOwner()
.
I also added the assert.
// leaseTimestampMs in the extended (or newly acquired) lease. | ||
return this.getCurrentPrimary(txn) | ||
.next(currentPrimary => this.canActAsPrimary(currentPrimary, txn)) | ||
.next(isPrimary => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename this to canActAsPrimary
to make it clear we may not be primary yet and to avoid confusion with this.isPrimary
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} catch (err) { | ||
console.warn( | ||
`Spec test failed at step ${count}: ${JSON.stringify(lastStep)}` | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are my hero. :-)
af3e8a7
to
da4a69e
Compare
da4a69e
to
972cb4c
Compare
Sorry, I pushed the code changes before I got the chance to answer inline. All comments addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comment suggestions, but in general this LGTM. The one thing I'm not sure about is the SpecBuilder / MultiClientSpecBuilder split. But in the interest of getting this PR out of the way before you go on vacation again, we could add a TODO(multitab) and we can discuss / make changes if necessary in a follow-up PR.
* primary lease. | ||
* | ||
* @return {Promise<number>} A Promise that resolves with the interval in ms | ||
* after which the metadata and primary lease needs to be refreshed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stale return comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
} | ||
|
||
/** | ||
* Evaluate the state of all active instances and determine whether the local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instances => clients
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* | ||
* Use `client(clientIndex)` to switch between clients. | ||
*/ | ||
export class MultiClientSpecBuilder extends SpecBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I never looked at this until now, but why do we need an inherited class for this? I am not wild about needing to override every method... that kinda' defeats the purpose of inheritance.
In an effort to avoid spinning on this PR forever, I'd be okay leaving this as-is with a TODO(multitab) to revisit (ideally in the near-term).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was torn on this as well (and still am). I did this because
- It makes it obvious what functionality does not need to be ported.
- The whole builder aspect makes much more sense for a single client.
clientA().build().build(
) andclientB().build().build()
should be independent. We could do this with an improved version MultiSpecBuilder and swap out the underlying instance of SpecBuilder every timeclient(...)
gets called.
That being said, I am open to remove it for now (in a follow up).
if (usePersistence) { | ||
await IndexedDbTestRunner.destroyPersistence(); | ||
} else { | ||
await MemoryTestRunner.destroyPersistence(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we're no longer relying on TestRunner.destroyPersistence() as a virtual method implemented by both runners, you can probably just delete MemoryTestRunner.destroyPersistence().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
} | ||
|
||
/** | ||
* Union type for each step. The step consists of (mostly) exactly one `field` | ||
* set and optionally expected events in the `expect` field. | ||
*/ | ||
export interface SpecStep { | ||
/** The index of the current client for multi-client spec tests. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add "PORTING NOTE: Only used by web multi-tab tests." or similar to the new steps you added (except maybe shutdown).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added here and in a bunch of other places.
@@ -1190,6 +1346,11 @@ export interface SpecWatchEntity { | |||
removedTargets?: TargetId[]; | |||
} | |||
|
|||
export type SpecClientState = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a "PORTING NOTE: Only used by web multi-tab tests." ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added here and in a bunch of other places.
@@ -1251,6 +1412,10 @@ export interface StateExpectation { | |||
watchStreamRequestCount?: number; | |||
/** Current documents in limbo. Verified in each step until overwritten. */ | |||
limboDocs?: string[]; | |||
/** | |||
* Whether the instance holds the primary lease. Used in multi-client tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this a PORTING NOTE ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added here and in a bunch of other places.
🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢🚢 |
This PR adds "master election" to Firestore (which isn't quite as complicated as it sounds thanks to IndexedDb).
The PR lays the groundwork for two different persistence modes: Some operations can require the primary lease, and some operations can run without it. Hence, the
runTransaction
function now takes in a parameter to indicate whether it needs the primary lease. A lot of the code in this PR is formatting changes due to this additional argument, and if deemed reasonable, I can split this up and move it to a different PR.I am also adding Spec test for multi-client workflows. I added a new spec builder that deals with multi-client tests. There's a lot of code in there that doesn't do anything but change the return value of the super class, and if we combine the MultiClientSpecBuilder and the SpecBuilder, this would go away. The main benefit of the separation is that we can easily hide this from Android and iOS.
I also added a
MockPlatform
to override some of thedocument
functionality to use it in testing. This allows me to alter its behavior per client.