Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More reliable last Summary flow #7247

Merged
merged 5 commits into from
Aug 27, 2021
Merged

Conversation

vladsud
Copy link
Contributor

@vladsud vladsud commented Aug 24, 2021

Rework logic around managing

The primary motivation for this change is to ensure that last summary always starts and has a chance to run in cases where document has too many ops (i.e. 10K limit in ODSP). Together with PR #7230 they sort of try to pull in different directions - PR #7230 ensures that summarization process cancels as quickly as possible when summarizer client loses connection, while this PR ensures that when main client decides it needs to move to exit, it gives a chance to summarizer client (if it is still connected) to run through full summary. It also ensures that new summarizer (interactive) client has a delay before it starts summarizer instance, thus reducing chances of collisions with another (running last summary) summarizer. Together, they substantially reduce number of cases where we see summary nacks due to overlapping summarizers. It does not eliminate it completely, because RunningSummarizer.trySummarize does 3 attempts to summarize, and it needs to try only once for lastSummary (this is subject for another change).

Most of the changes are reworking logic in SummaryManager as I found it hard to follow when logic is split between multiple functions when they call each other in various ways. Now main flow is one function - SummaryManager.startSummarization(), which I find much easier to follow. Plus I've added a bunch of asserts to ensure that state transition are easier to follow (and we catch any issues).

One core change here is that there was a delay before staring a summarizer client, but it only applied to when container was booting (i.e. if summarizer client was created at that moment). There was no delay when given client (that was already running for a while) was assigned summarization role - it would start summarizer right away, increasing substantially odds of stepping on previous summarizer client that is doing last summary.

I believe there were some other smaller issues, like not re-checking if given client is still elected summarizer after various async processes, leading to more chances that we have two summarizers running in parallel.

Long term, we should move away from delays as a way to solve these problems and look into ability for current summarizing (non-interactive) client to be elected as summarizer, such that last summary that keeps running for 10 mins never has to compete with other clients (but that's subject for another design / PR).

There are certain small bits & pieces that are spill over from #7230 (i.e. they resolve to nothing after that PR), like stop() reason in couple places. These PRs can go in any order, but it's better for this go last to create cleaner diff.

Next in series:

  1. Implement only one attempt for last Summary instead of current 3
  2. Add summary attempt field to all summarizer events (currently it's missing pretty much everywhere).

@vladsud vladsud requested a review from a team as a code owner August 24, 2021 23:19
@github-actions github-actions bot added area: runtime Runtime related issues public api change Changes to a public API labels Aug 24, 2021
@msfluid-bot
Copy link
Collaborator

msfluid-bot commented Aug 24, 2021

@fluidframework/base-host: No change
Metric NameBaseline SizeCompare SizeSize Diff
main.js 185.72 KB 185.72 KB No change
Total Size 213.31 KB 213.31 KB No change
@fluid-example/bundle-size-tests: -254 Bytes
Metric NameBaseline SizeCompare SizeSize Diff
container.js 162.69 KB 162.44 KB -254 Bytes
map.js 44.71 KB 44.71 KB No change
matrix.js 143.51 KB 143.51 KB No change
odspDriver.js 173.42 KB 173.42 KB No change
odspPrefetchSnapshot.js 38.7 KB 38.7 KB No change
sharedString.js 165.09 KB 165.09 KB No change
Total Size 760.8 KB 760.56 KB -254 Bytes

Baseline commit: 6e1ae05

Generated by 🚫 dangerJS against 0d8b8cb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: runtime Runtime related issues public api change Changes to a public API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants