Remove refresh from server path and refresh from network call #16657
⯆ @fluid-example/bundle-size-tests: -6.58 KB
Baseline commit: 6076896
Can we discuss the strategy of merging into Alternatives - Just wait until we're confident, or if we're eager to start coding now, put it in a side branch until we're ready - that way the burden of merging with |
Changed to `main`. Let's wait until 6.0 is cut and merge the changes into 6.1.
packages/runtime/container-runtime/src/summary/summarizerNode/summarizerNode.ts
Co-authored-by: Mark Fields <[email protected]>
 * Updates GC state from the given snapshot if GC is enabled and the snapshot is newer than the one this node
 * is tracking.
 */
private async refreshGCStateFromSnapshot(
It's so good to see this gone :-)
settings["Fluid.ContainerRuntime.Test.CloseSummarizerDelayOverrideMs"] = 100;
});

itExpects(
"Closes the summarizing client instead of refreshing",
[
{
Why did the order of events (and event name) change?
In the retry scenario, we call `fetchSnapshotFromStorageAndClose`, and we don't need to call `refreshLatestSummary`, because we close the container as we fetch the snapshot, before it would be passed into `refreshLatestSummary`.
In the latest ack scenario, we "refresh" the snapshot, and I moved `fetchSnapshotFromStorageAndClose` to the containerRuntime layer so that it closes after the "refresh".
);

if (result.latestSummaryUpdated && !result.wasSummaryTracked) {
Can you please add a comment here explaining why fetch snapshot is called in this case?
Also nit: the `fetchLatestSnapshot()` const is no longer required. You can directly call `fetchSnapshotFromStorageAndClose` here.
Yeah, I had left a comment about this too but VS Code ate it. There's lots of logic in there that doesn't necessarily make sense anymore, would be good to clean up here or in a follow-up.
Most troubling is `waitForDeltaManagerToCatchup`, which I'm pretty sure will hang if we're not already caught up (it should throw IMO, since the Container is disposed). I wonder how often this is happening already.
That wait function is also called in the `refreshLatestAck` codepath in `submitSummary`, and this concern should be addressed there too.
Good point. It's a little weird that there is code running after the container runtime is explicitly disposed. For example, when `refreshLatestSummaryAckFromServer` is called from `submitSummary`, it would dispose the container runtime but would continue summarization and eventually fail when `checkContinue` is called later (and the failure reason would be different, I believe). We should short-circuit this whole thing and return a failure.
Maybe that can be cleaned up as part of AB#5417.
Cleaning up / short-circuiting later sounds fine, as long as we're sure we don't have a potential deadlock - I think we do. That needs to be sorted out immediately.
Added to AB#5417
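The hang concern raised in this thread can be illustrated with a minimal sketch. It assumes a hypothetical `FakeDeltaSource` that raises `caughtUp` and `disposed` events; that class and `waitForCatchUpOrDispose` are illustrative names only, not the actual `waitForDeltaManagerToCatchup` implementation or any Fluid Framework API. The point is simply that the wait rejects on dispose rather than staying pending forever.

```typescript
// Hypothetical stand-in for the object we wait on; the real delta manager is richer.
type CatchUpEvent = "caughtUp" | "disposed";

class FakeDeltaSource {
	public disposed = false;
	public caughtUp = false;
	private readonly listeners: Record<CatchUpEvent, Array<() => void>> = {
		caughtUp: [],
		disposed: [],
	};

	public on(event: CatchUpEvent, listener: () => void): void {
		this.listeners[event].push(listener);
	}

	public off(event: CatchUpEvent, listener: () => void): void {
		this.listeners[event] = this.listeners[event].filter((l) => l !== listener);
	}

	public emit(event: CatchUpEvent): void {
		if (event === "disposed") {
			this.disposed = true;
		} else {
			this.caughtUp = true;
		}
		// Iterate over a copy, since listeners unregister themselves while running.
		for (const listener of this.listeners[event].slice()) {
			listener();
		}
	}
}

// Resolves once caught up; rejects (instead of hanging) if the source is disposed.
async function waitForCatchUpOrDispose(source: FakeDeltaSource): Promise<void> {
	if (source.disposed) {
		throw new Error("Disposed before catching up");
	}
	if (source.caughtUp) {
		return;
	}
	return new Promise<void>((resolve, reject) => {
		const cleanup = (): void => {
			source.off("caughtUp", onCaughtUp);
			source.off("disposed", onDisposed);
		};
		const onCaughtUp = (): void => {
			cleanup();
			resolve();
		};
		const onDisposed = (): void => {
			cleanup();
			reject(new Error("Disposed while waiting to catch up"));
		};
		source.on("caughtUp", onCaughtUp);
		source.on("disposed", onDisposed);
	});
}
```

With this shape, a caller such as a summarize attempt would see a rejected promise when the container goes away mid-wait, and could return a failure immediately.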
packages/runtime/container-runtime/src/summary/summarizerNode/summarizerNode.ts
packages/runtime/container-runtime/src/summary/summarizerNode/summarizerNodeUtils.ts
import { ITelemetryBaseLogger } from "@fluidframework/core-interfaces";
import {
cloneGCData,
getGCDataFromSnapshot,
I think you can remove this from `gc/index.ts`, maybe others here too, take a look.
Removed
packages/test/test-end-to-end-tests/src/test/gc/gcStateResetInSummaries.spec.ts
Looks Good! Fine to do more cleanup in later PRs. My one concern I want to see addressed (or we can discuss why not now) is the remaining call to waitForDeltaManagerToCatchup, in submitSummary. I'm concerned it could deadlock when the container is disposed.
{
eventName: "ClosingSummarizerOnSummaryStale",
codePath,
message: "Stopping fetch from storage",
What is this field for? It's redundant with `eventName`, right?
And it will prevent the error's message from being copied over to the event, not that we typically look for that, but JFYI.
(I realized this is just a code move, not actually new here, but something minor to consider in your next PRs)
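To make the point about the `message` field concrete, here is a small sketch of the behavior described in this thread. `buildErrorEvent` is a hypothetical helper written for illustration; it is not the Fluid telemetry logger and only assumes that properties set explicitly on the event take precedence over properties copied from the error.

```typescript
// Hypothetical event shape for illustration only.
interface TelemetryEvent {
	eventName: string;
	[key: string]: unknown;
}

// Assumption: explicit event properties win over values copied from the error.
function buildErrorEvent(event: TelemetryEvent, error: Error): TelemetryEvent {
	return { message: error.message, ...event };
}

const failure = new Error("Snapshot fetch failed");

// With an explicit `message`, the error's own message is not copied over.
const shadowed = buildErrorEvent(
	{ eventName: "ClosingSummarizerOnSummaryStale", message: "Stopping fetch from storage" },
	failure,
);

// Without it, the error's message ends up on the event.
const preserved = buildErrorEvent({ eventName: "ClosingSummarizerOnSummaryStale" }, failure);

console.log(shadowed.message, "|", preserved.message);
```

Under that assumption, dropping the redundant field loses nothing, since `eventName` already identifies the event, and it keeps the error text available.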
readAndParseBlob,
);

// Note that we did not track this summary, but that the latest summary was updated.
I don't see where we are updating latest summary. If it's true that we're not, then let's update this and the doc comment above accordingly -- not to mention the naming in the return type! Makes sense not to since we're about to close.
And maybe even rename this function to something like findPendingSummaryMatchingSummaryAck
## Description

#16657 removed the logic to refresh a summary's state by downloading a snapshot. With it gone, the summarizer node (and the rest of the system) only updates state from a summary that it was tracking. This PR simplifies the logic in `SummarizerNode::refreshLatestSummary` to account for this. It also renames the properties in the returned object to give the caller more context on what happened and what action it can take.

Additionally, it removes logging of the `PendingSummaryNotFound` telemetry event. That event is currently logged every time the summarizer loads, for the summary it loaded from: the summarizer processes the ack for that summary, and since the summary was not generated by this summarizer instance, the event is logged. Instead, a `pendingSummaryTracked` property is added to the `refreshLatestSummary_end` event.

[AB#4435](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/4435)
[AB#5146](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/5146)

Originally this event was added to differentiate between the "refresh from server" code and the close code. The refresh code has already been removed (#16657), so we always close. The information in this event is already contained in both `RefreshLatestSummaryFromServerFetch_end` and `Summarize_cancel`.

Remove the `ClosingSummarizerOnSummaryStale` telemetry, as it duplicates both the `RefreshLatestSummaryFromServerFetch_end` and `RefreshLatestSummaryAckFetch_end` telemetry.
AB#5118
Remove "refresh from server" logic. We are making this PR into `next` instead of `main` as we want to wait a bit before we commit to removing "refresh from server" logic. Instead of refreshing, we are closing the container and restarting. More details here: #15140

This requires updating the `summarizerNode` logic and the `containerRuntime` logic. Work to move when we decide to close the container sits in AB#5152.
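
The "close instead of refresh" decision described above can be sketched roughly as follows. The property names `latestSummaryUpdated` and `wasSummaryTracked` come from the diff context in this conversation; the `RefreshResult` type, the `handleRefreshResult` function, and the shape of the close callback are assumptions made for illustration and do not reflect the actual `containerRuntime` code.

```typescript
// Assumed result shape, based on the condition shown in the review diff.
interface RefreshResult {
	latestSummaryUpdated: boolean;
	wasSummaryTracked: boolean;
}

// Hypothetical decision helper: if a newer summary exists that this client did not
// track, fetch the snapshot (e.g. for telemetry) and close rather than refresh state.
async function handleRefreshResult(
	result: RefreshResult,
	fetchSnapshotAndClose: () => Promise<void>,
): Promise<"closed" | "continue"> {
	if (result.latestSummaryUpdated && !result.wasSummaryTracked) {
		await fetchSnapshotAndClose();
		return "closed";
	}
	return "continue";
}
```

A caller would pass something like `fetchSnapshotFromStorageAndClose` as the callback and stop summarizing as soon as `"closed"` comes back, so that a fresh summarizer can start from the latest summary.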