-
Notifications
You must be signed in to change notification settings - Fork 536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More reliable last Summary flow #7247
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
github-actions
bot
added
area: runtime
Runtime related issues
public api change
Changes to a public API
labels
Aug 24, 2021
■ @fluidframework/base-host: No change
⯆ @fluid-example/bundle-size-tests: -254 Bytes
Baseline commit: 6e1ae05 |
packages/runtime/container-runtime/src/test/summarizerHeuristics.spec.ts
Show resolved
Hide resolved
agarwal-navin
approved these changes
Aug 27, 2021
vladsud
commented
Aug 27, 2021
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Rework logic around managing
The primary motivation for this change is to ensure that last summary always starts and has a chance to run in cases where document has too many ops (i.e. 10K limit in ODSP). Together with PR #7230 they sort of try to pull in different directions - PR #7230 ensures that summarization process cancels as quickly as possible when summarizer client loses connection, while this PR ensures that when main client decides it needs to move to exit, it gives a chance to summarizer client (if it is still connected) to run through full summary. It also ensures that new summarizer (interactive) client has a delay before it starts summarizer instance, thus reducing chances of collisions with another (running last summary) summarizer. Together, they substantially reduce number of cases where we see summary nacks due to overlapping summarizers. It does not eliminate it completely, because RunningSummarizer.trySummarize does 3 attempts to summarize, and it needs to try only once for lastSummary (this is subject for another change).
Most of the changes are reworking logic in SummaryManager as I found it hard to follow when logic is split between multiple functions when they call each other in various ways. Now main flow is one function - SummaryManager.startSummarization(), which I find much easier to follow. Plus I've added a bunch of asserts to ensure that state transition are easier to follow (and we catch any issues).
One core change here is that there was a delay before staring a summarizer client, but it only applied to when container was booting (i.e. if summarizer client was created at that moment). There was no delay when given client (that was already running for a while) was assigned summarization role - it would start summarizer right away, increasing substantially odds of stepping on previous summarizer client that is doing last summary.
I believe there were some other smaller issues, like not re-checking if given client is still elected summarizer after various async processes, leading to more chances that we have two summarizers running in parallel.
Long term, we should move away from delays as a way to solve these problems and look into ability for current summarizing (non-interactive) client to be elected as summarizer, such that last summary that keeps running for 10 mins never has to compete with other clients (but that's subject for another design / PR).
There are certain small bits & pieces that are spill over from #7230 (i.e. they resolve to nothing after that PR), like stop() reason in couple places. These PRs can go in any order, but it's better for this go last to create cleaner diff.
Next in series: