Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplfied SummarizerNode:refreshLatestSummary logic #17106

Merged
merged 5 commits into from
Sep 1, 2023

Conversation

agarwal-navin
Copy link
Contributor

@agarwal-navin agarwal-navin commented Aug 30, 2023

Description

#16657 removed the logic to refresh a summary's state by downloading a snapshot. With it gone, summarizer node (and rest of the system) only updates state from a summary that it was tracking. This PR simplifies the logic in SummarizerNode::refreshLatestSummary to account for this. It also renames the properties in the returned object to give the caller more context on what happened and what action it can take.

Additionally, it removes logging the PendingSummaryNotFound telemetry. It is currently logged every time the summarizer loaded for the summary that is loaded from. The reason is that it processes the ack for the summary is loaded from and since the summary was not generated by this instance of the summarizer, this event is logged. Instead, added a property pendingSummaryTracked to the refreshLatestSummary_end event.

AB#4435

@agarwal-navin agarwal-navin requested a review from a team as a code owner August 30, 2023 20:43
@github-actions github-actions bot added area: runtime Runtime related issues base: main PRs targeted against main branch labels Aug 30, 2023
@markfields
Copy link
Member

Plus, we will log the refreshLatestSummary logs which will have information about this anyway

Are you saying that in today's logging, it's already redundant? Would be interesting to see an example query from Msft-internal telemetry (but up to you whether it's worth taking the time)

@msfluid-bot
Copy link
Collaborator

msfluid-bot commented Aug 30, 2023

@fluid-example/bundle-size-tests: -938 Bytes
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 438.41 KB 437.95 KB -469 Bytes
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 237.16 KB 236.7 KB -469 Bytes
loader.js 147.91 KB 147.91 KB No change
map.js 46.36 KB 46.36 KB No change
matrix.js 139.17 KB 139.17 KB No change
odspDriver.js 89.26 KB 89.26 KB No change
odspPrefetchSnapshot.js 41.92 KB 41.92 KB No change
sharedString.js 155.79 KB 155.79 KB No change
sharedTree2.js 240.27 KB 240.27 KB No change
Total Size 1.64 MB 1.64 MB -938 Bytes

Baseline commit: 74f413a

Generated by 🚫 dangerJS against fde2bbd

@agarwal-navin
Copy link
Contributor Author

Plus, we will log the refreshLatestSummary logs which will have information about this anyway

Are you saying that in today's logging, it's already redundant? Would be interesting to see an example query from Msft-internal telemetry (but up to you whether it's worth taking the time)

Here's an example:

Data_eventName Data_eventName1 Data_proposalHandle Data_proposalHandle1 summaryRefSeq summaryRefSeq1 Data_referenceSequenceNumber Data_referenceSequenceNumber1
fluid:telemetry:SummarizerNode:PendingSummaryNotFound fluid:telemetry:SummarizerNode:refreshLatestSummary_end w+AI w+AI 376 376 376 376
fluid:telemetry:SummarizerNode:PendingSummaryNotFound fluid:telemetry:SummarizerNode:refreshLatestSummary_end wgq4B wgq4B 22,274 22,274 22,274 22,274
fluid:telemetry:SummarizerNode:PendingSummaryNotFound fluid:telemetry:SummarizerNode:refreshLatestSummary_end wsK4B wsK4B 22,320 22,320 22,320 22,320
fluid:telemetry:SummarizerNode:PendingSummaryNotFound fluid:telemetry:SummarizerNode:refreshLatestSummary_end wnxk wnxk 3,231 3,231 3,231 3,231
fluid:telemetry:SummarizerNode:PendingSummaryNotFound fluid:telemetry:SummarizerNode:refreshLatestSummary_end woEQ woEQ 8,736 8,736 8,736 8,736

@NicholasCouri
Copy link
Contributor

NicholasCouri commented Aug 30, 2023

PendingSummaryNotFound

There are scenarios where we log PendingSummaryNotFound but do not log the refreshLatestSummary.

Ex. if (this.referenceSequenceNumber >= summaryRefSeq) {
if (this.referenceSequenceNumber >= fetchedSnapshotRefSeq) {

I sent you a query in which we log the pendingSummaryNotFound much more frequently than the refreshLatestSummary - I would keep it for now, TBH

@agarwal-navin
Copy link
Contributor Author

There are scenarios where we log PendingSummaryNotFound but do not log the refreshLatestSummary.

Ex. if (this.referenceSequenceNumber >= summaryRefSeq) { if (this.referenceSequenceNumber >= fetchedSnapshotRefSeq) {

In both these cases, the refresh event is logged (via event.end). The refresh event is a superset of this. Maybe we can add a property to the refresh event that says pending summary not found (wasSummaryTracked is essentially thie same but its not always logged and we could change that).

image

@agarwal-navin agarwal-navin changed the title Removed PendingSummaryNotFound telemetry log Update SummarizerNode:refreshLatestSummary to only return whether summary was tracked. Aug 31, 2023
Copy link
Contributor

@tyler-cai-microsoft tyler-cai-microsoft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

* 2. The latest summary was updated and the summary corresponding to the params was being tracked.
*
* 3. The latest summary was updated but the summary corresponding to the params was not tracked.
* @returns true if the summary is tracked by this node, false otherwise.
*/
public async refreshLatestSummary(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of renaming this here and elsewhere to handleSummaryAck or onSummaryAck or something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason why - Besides the obvious (we don't do any "refreshing", just tracking), the return value is typically called isSummaryTracked, but without all the context we have this reads kind of strangely:

const isSummaryTracked = refreshLatestSummary(...);

How about this:

const pendingSummaryFound = matchAckWithPendingSummary(...);

Or something similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not do any such renames in this PR for 3 reasons:

  1. The PR is already big and it will become even bigger.
  2. It's will be a breaking change because the call from RunningSummarizer to ContainerRuntime is on a public interface.
  3. We do "refresh" the latest summary state. It's more of an update but refresh seems fine to me. I would be fine with calling it updateLatestSummaryState or something like that.

@@ -846,8 +845,8 @@ export class GarbageCollector implements IGarbageCollector {
* Called to refresh the latest summary state. This happens when either a pending summary is acked or a snapshot
* is downloaded and should be used to update the state.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment

@@ -267,28 +266,23 @@ export class GCSummaryStateTracker {
* Called to refresh the latest summary state. This happens when either a pending summary is acked or a snapshot
* is downloaded and should be used to update the state.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update comment

this.wasGCRunInLatestSummary = this.configs.shouldRunGC;
}

if (!result.latestSummaryUpdated || !this.configs.shouldRunGC) {
if (!isSummaryTracked || !this.configs.shouldRunGC) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Instead of this early return, I think you can just nest an if (this.configs.shouldRunGC) inside the above if statement and put the code in there. Might be clearer. Or put this early return up top to reduce nesting.

Like this:

if (!isSummaryTracked) {
	return;
}

this.wasGCRunInLatestSummary = this.configs.shouldRunGC;

if (this.configs.shouldRunGC) {
	// The rest of the stuff
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it. It looks pretty much the same TBH.

}

// If the summary for which refresh is called is older than the latest tracked summary, ignore it.
if (this.referenceSequenceNumber >= summaryRefSeq) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation: This case is no longer addressed by this function. It's up to the caller. This is probably fine, but I'd want us to think really carefully about the global state here, since this.referenceSequenceNumber can change throughout this flow.

I think it's fine because it only changes if we are tracking the incoming summary, in which case before his code would be unreachable anyway. And if we're not tracking, then it doesn't change, so it's fine for the caller to check.

Sound correct? Hopefully test coverage helps here too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird that this was >=. Not that they should be able to be equal but not the same (in which case it would be tracked). But you never know what could happen with Summaries!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is still there:

if (summaryRefSeq > this.referenceSequenceNumber) {
  isSummaryNewer = true;
}

isSummaryNewer is not returned because the caller has the information to gather this information. I could change it to return this value but I don't think its needed (infact I had it this way in my local repo and changed it).
IMO, the caller should decide what to do based on the return value isSummaryTracked. If the summary is tracked, the latest summary information (referenceSequenceNumber) will be updated and if not, it won't be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added isSummaryNewer to the return of this function and check that at the call site.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious Why you brought it back? I thought it was fine after I thought out loud about it. It was just a little hard to trace through the diff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was on the fence about keeping it vs removing it from the point of view of not making significant changes in one go. With you comment, it seemed like it added some confusion so I added it back. We can remove it later if needed.

* current summary that is tracked, but this summarizer runtime did not produce/track that summary. Thus
* it needs to refresh its state. Today refresh is done by fetching the latest snapshot to update the cache
* and then close as the current main client is likely to be re-elected as the parent summarizer again.
*/
if (result.latestSummaryUpdated && !result.wasSummaryTracked) {
if (!isSummaryTracked && summaryRefSeq > this.summarizerNode.referenceSequenceNumber) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consider inverting this to put the GC call (+early return) under the `if`` and reduce nesting of all this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it makes me curious if we even need to notify GC in the case where the summary wasn't tracked but was old. Going to read that code again now.

Copy link
Member

@markfields markfields Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass false the GC code is a no-op. So we can remove the param from the GC code and this can be

if (isSummaryTracked) {
	await this.garbageCollector.refreshLatestSummary();
}
else if (summaryRefSeq > this.summarizerNode.referenceSequenceNumber) {
	// the existing fetch-then-close block
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note - no GC test passed false for that param :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird that the closeStaleSummarizer function doesn't thrown an error but disposes the container and just continues. That's why it has to still call into GC in that scenario as well. You are right that the logic in GC is no-op but I don't want container runtime to assume that. it does the regular flow and GC decides what it wants to do.

Copy link
Member

@markfields markfields left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I left some suggestions for renaming/refactoring. Feel free to punt for this and one of us can do another PR if we like the changes.

@agarwal-navin agarwal-navin changed the title Update SummarizerNode:refreshLatestSummary to only return whether summary was tracked. Simplfied SummarizerNode:refreshLatestSummary logic Sep 1, 2023
@agarwal-navin agarwal-navin merged commit 48539d9 into microsoft:main Sep 1, 2023
@agarwal-navin agarwal-navin deleted the removePendingLog branch September 5, 2023 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: runtime Runtime related issues base: main PRs targeted against main branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants