Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait to raise Connected event for Read connection until container is caught up #9377

Merged
merged 56 commits into from
Jul 22, 2022

Conversation

markfields
Copy link
Member

@markfields markfields commented Mar 5, 2022

Part of #7884. See comments in connectionStateHandler.ts for description of the desired behavior here.

Note that the new behavior is behind a feature gate "Fluid.Container.CatchUpBeforeDeclaringConnected". I didn't make any changes to existing function waitContainerToCatchUp, even though some functionality is duplicated. Once we finish rolling this out, we can remove the duplicate logic from that function since it will simply wait for the "connected" event (we may even deprecate it at that point)

To-Do:

  • Need to update the README and specifically info about CatchingUp state

Some follow-up work items to open afterwards:

  • Roll out new feature gate to enable new behavior for Loop partners
  • Remove feature gate and deprecate waitContainerToCatchUp

@markfields markfields requested review from vladsud and jatgarg March 5, 2022 01:43
@github-actions github-actions bot added the area: loader Loader related issues label Mar 5, 2022
@vladsud
Copy link
Contributor

vladsud commented Mar 6, 2022

I think it looks good. Some things missing:

  1. Tests
  2. breaking.md note
  3. Possibly: some feature gate to disable that change, in case we learn it screws up something we did not consider.

@markfields
Copy link
Member Author

Revisiting this finally. Writing down some questions I'm trying to clarify in my head:

  • How does startJoinOpTimer work? When it fires, it doesn't do anything. I guess receivedAddMemberEvent keys off whether it's set or not, and if so may call applyForConnectedState which in turn may call this.setConnectionState(ConnectionState.Connected);.
  • For write connections, does waitContainerToCatchUp actually cover the scenario of leveraging join op to indicate "caught up"? The comment in there (the one I said I don't understand) seems to indicate so, but I don't see it.

@markfields
Copy link
Member Author

@jatgarg can you take a look too please? Thanks!

@markfields
Copy link
Member Author

Answering my own question...

For write connections, does waitContainerToCatchUp actually cover the scenario of leveraging join op to indicate "caught up"? The comment in there (the one I said I don't understand) seems to indicate so, but I don't see it.

waitContainerToCatchUp gates on connected event which for write connections means we've processed own join op.

This PR kind of inverts this - now always "connected" event means we're caught up. So waitContainerToCatchUp is actually "wait for the container to be connected".

It can just gate on "connected" state,
since "connected" state now gates on caught up.
@ghost
Copy link

ghost commented Apr 29, 2022

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

❌ markfields sign now
You have signed the CLA already but the status is still pending? Let us recheck it.

@markfields
Copy link
Member Author

@vladsud, @agarwal-navin, @jatgarg -- Here's my update on this PR, and some input I need from you:

  • Do we all agree with the specification that "connected" means "caught up to known ops" even for write clients? i.e. that's an additional delay beyond the "join" op now. Also, the way I've implemented so far the definition of "known ops" varies based on when a particular callback is invoked, so it's a little wishy-washy. Thoughts?
  • I started looking into a feature gate for this. Ideally the gate would be used in two places: (1) ConnectionStateHandler to determine when to transition to "connected" state and (2) waitContainerToCatchUp can just wait for "connected" event when the gate is enabled (don't need to wait for additional ops). I don't see a way to cover (2) because it's a function taking in an IContainer, rather than being a private function on Container class. I think this will be fine for waitContainerToCatchUp to essentially catch up twice - if hasCheckpointSequenceNumber it'll be a no-op the second time anyway.
    • If we agree then I'll revert behavior of waitContainerToCatchUp back to how it was.
  • I need to write tests
  • I need to move this to next and include BREAKING.md note.

Anything else I should consider here? I anticipate moving this from Draft on Monday or Tuesday. Thanks guys!

@markfields
Copy link
Member Author

markfields commented Jun 17, 2022

Remaining tasks:

  • Write UTs for CatchUpMonitor and ImmediateCatchUpMonitor
  • Fix the skipped test in Container.spec.ts
  • Address //* comments

@msfluid-bot
Copy link
Collaborator

msfluid-bot commented Jun 17, 2022

@fluid-example/bundle-size-tests: +1.61 KB
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 387.39 KB 387.39 KB No change
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 191.5 KB 191.5 KB No change
loader.js 149.91 KB 151.52 KB +1.61 KB
map.js 42.67 KB 42.67 KB No change
matrix.js 127.15 KB 127.15 KB No change
odspDriver.js 149.26 KB 149.26 KB No change
odspPrefetchSnapshot.js 38.31 KB 38.31 KB No change
sharedString.js 147.79 KB 147.79 KB No change
Total Size 1.24 MB 1.24 MB +1.61 KB

Baseline commit: 4beded1

Generated by 🚫 dangerJS against 4618761

@@ -178,6 +178,10 @@ export async function waitContainerToCatchUp(container: IContainer) {
};
container.on("closed", closedCallback);

// Depending on config, transition to "connected" state may include the guarantee
// that all known ops have been processed. If so, we may introduce additional wait here.
Copy link
Member Author

@markfields markfields Jun 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, we may introduce additional wait here

What do you all think - is this correct or not? It may not add additional delay since events fire synchronously.

@markfields markfields marked this pull request as ready for review June 17, 2022 18:21
@markfields
Copy link
Member Author

@agarwal-navin @vladsud @tyler-cai-microsoft @jatgarg -- Thanks for the early review! Ready for review now. Just need to add UTs for the new "CatchUpMonitor" classes but otherwise ready to go.

@github-actions github-actions bot removed the dependencies Pull requests that update a dependency file label Jun 17, 2022
@markfields markfields requested a review from WayneFerrao June 17, 2022 21:42
packages/loader/container-loader/src/catchUpMonitor.ts Outdated Show resolved Hide resolved
this.opHandler({ sequenceNumber: this.deltaManager.lastSequenceNumber });

// If a listener is added after we are already caught up, notify that new listener immediately
this.on("newListener", (event: string, listener) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like instead of using events, it's better to use a property that exposes a promise. I.e. something like

public caughtUp: Promise<number>;

Promise seems to be better because this event files once, and past "firing" reflected on new listeners - that's exactly what promise does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great point. I don't remember why I switched to event-based to begin with, but I bet it would work better with Promise. That said, I don't have time to try it out now, and also leaving it will reduce merge conflicts with your follow-on refactor.

Copy link
Contributor

@vladsud vladsud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Singing off to unblock you - I did not look through it now, but was poking in my prototyping on top of it and overall, it looks fine.

@markfields markfields merged commit 0b5950d into microsoft:main Jul 22, 2022
@github-actions
Copy link
Contributor

This commit is queued for merging with the next branch! Please ignore this PR for now. Contact @microsoft/fluid-cr-infra for help.

@markfields
Copy link
Member Author

Opened https://dev.azure.com/fluidframework/internal/_workitems/edit/1384 to track rolling this out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: loader Loader related issues base: main PRs targeted against main branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Connectivity eventing improvements
6 participants