
Op bunching 2: Bunch contiguous ops for data store in a batch - data store part #22840

Merged
merged 13 commits into microsoft:main from opBunchingPart2 on Oct 28, 2024

Conversation

agarwal-navin
Contributor

@agarwal-navin agarwal-navin commented Oct 17, 2024

Reviewer guidance

This is part 2 of 3 of the op bunching feature. This part focuses on the changes in the data store. Part 1 - #22839

Problem

During op processing, the container runtime sends ops one at a time to data stores and DDSes. If a DDS has received M contiguous ops as part of a batch, the DDS is called M times to process them individually. This has performance implications for some DDSes, which would benefit from receiving and processing these M ops together.

Take shared tree as an example:
For each op received that contains a sequenced commit, all the pending commits are processed by the rebaser. So, as the number of ops received grows, so does the work spent on pending commits. The following example describes this concretely:
Currently, if a shared tree client has N pending commits which have yet to be sequenced, then each time that client receives a sequenced commit authored by another client (an op), it updates each of its pending commits, which takes at least O(N) work.
Instead, if it receives M commits at once, it can do a single update pass over each pending commit instead of M passes per pending commit.
It can compose the M commits together into a single change to update over, so it can potentially go from something like O(N * M) work to O(N + M) work with batching.
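To make the complexity argument concrete, here is a minimal sketch of the two strategies. The `Commit`, `rebase`, and `compose` helpers below are invented for illustration only and are not the shared tree API:

```typescript
// Toy model of the complexity argument above; not the shared tree implementation.
interface Commit {
	changes: string[];
}

// Stand-in rebase/compose operations for illustration.
const rebase = (pending: Commit, over: Commit): Commit => ({
	changes: [...over.changes, ...pending.changes],
});
const compose = (commits: Commit[]): Commit => ({
	changes: commits.flatMap((c) => c.changes),
});

// Today: each of the M incoming ops triggers a pass over all N pending
// commits, i.e. roughly O(N * M) rebase calls.
function processOneAtATime(pending: Commit[], incoming: Commit[]): Commit[] {
	let result = pending;
	for (const sequenced of incoming) {
		result = result.map((p) => rebase(p, sequenced));
	}
	return result;
}

// With op bunching: compose the M incoming ops once, then do a single pass
// over the N pending commits, i.e. roughly O(N + M).
function processAsBunch(pending: Commit[], incoming: Commit[]): Commit[] {
	const composed = compose(incoming);
	return pending.map((p) => rebase(p, composed));
}
```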

Solution - op bunching

The solution implemented here is a feature called "op bunching".
With this feature, contiguous ops in a grouped op batch that belong to a data store / DDS are bunched and sent to it in an array: the grouped op is sent as an `ISequencedMessageEnvelope`, and the individual message `contents` in it are sent as an array along with the `clientSequenceNumber` and `localOpMetadata`.
The container runtime sends each data store its bunch of contiguous ops. The data store, in turn, sends each DDS its bunch of contiguous ops. The DDS can choose how to process these ops; shared tree, for instance, would compose the commits in all these ops and update its pending commits with the result.
Bunching only contiguous ops for a data store / DDS within a batch preserves the behavior of processing ops in the sequence they were received.
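For reference, a bunch delivered to a data store / DDS conceptually looks like the sketch below. Only the names quoted above (`ISequencedMessageEnvelope`, `contents`, `clientSequenceNumber`, `localOpMetadata`) come from this PR; the surrounding interface and field names are invented for illustration and may differ from the actual runtime types:

```typescript
// Illustrative shapes only; see the runtime definitions added in this PR
// for the real interfaces.
interface SequencedMessageEnvelopeSketch {
	// Envelope fields shared by every message in the bunch, e.g.:
	sequenceNumber: number;
	minimumSequenceNumber: number;
	clientId: string | null;
}

interface MessageContentSketch {
	contents: unknown; // the individual op's contents
	clientSequenceNumber: number; // per-message client sequence number
	localOpMetadata: unknown; // per-message local metadata (local ops only)
}

// A "bunch": one shared envelope plus the per-message pieces, in order.
interface MessageBunchSketch {
	envelope: SequencedMessageEnvelopeSketch;
	messages: readonly MessageContentSketch[];
}
```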

A couple of behavior changes to note:

  1. Op events - An implication of this change is that the timing of "op" events emitted by the container runtime and data store runtime will change. Currently, these layers emit an "op" event immediately after an op is processed. With this change, an upper layer will only know when a bunch has been processed by a lower layer, so it will emit "op" events for the individual ops in the bunch after the entire bunch is processed (see the sketch after this list).
    From my understanding, this should be fine because we do not provide any guarantee that the "op" event will be emitted immediately after an op is processed. These events will still be emitted in order of op processing and (some time) after the op is processed.
    Take delta manager / container runtime as an example. Delta manager sends an op for processing to the container runtime and emits the "op" event. However, the container runtime may choose not to process these ops immediately but save them until an entire batch is received. This change was made but was reverted due to some concerns not related to the topic discussed here - Process ops by batches in remoteMessageProcessor and pendingStateManager #21785.
    The change here is similar to the above behavior, where an upper layer doesn't know and shouldn't care what lower layers do with ops.
  2. `metadata` property on message - With this PR, the `metadata` property is removed from a message before it's sent to data stores and DDSes. This is because we now send one common message (the grouped op) and an array of contents. Individual messages within a grouped op have batch begin and end metadata, but those are only added by the runtime to keep the format consistent with old batch messages. The data store and DDS don't care about them, so removing them should be fine.
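A minimal sketch of the event-timing change described in point 1, using a generic Node emitter rather than the runtime's actual event plumbing:

```typescript
import { EventEmitter } from "node:events";

type ProcessOne = (message: unknown) => void;

// Before: each op is processed and its "op" event is emitted immediately.
function processIndividually(
	emitter: EventEmitter,
	messages: unknown[],
	processOne: ProcessOne,
): void {
	for (const message of messages) {
		processOne(message);
		emitter.emit("op", message);
	}
}

// After: the lower layer processes the whole bunch first; the upper layer
// then emits "op" for each message, still in processing order, just later.
function processBunchThenEmit(
	emitter: EventEmitter,
	messages: unknown[],
	processBunch: (bunch: unknown[]) => void,
): void {
	processBunch(messages);
	for (const message of messages) {
		emitter.emit("op", message);
	}
}
```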

AB#20123

@agarwal-navin agarwal-navin requested a review from a team as a code owner October 17, 2024 22:06
@github-actions github-actions bot added area: runtime Runtime related issues public api change Changes to a public API base: main PRs targeted against main branch labels Oct 17, 2024
Collaborator

@msfluid-bot msfluid-bot left a comment


Code Coverage Summary

↓ packages.runtime.container-runtime.src:
Line Coverage Change: -0.11%    Branch Coverage Change: 0.04%
| Metric Name | Baseline coverage | PR coverage | Coverage Diff |
| --- | --- | --- | --- |
| Branch Coverage | 92.98% | 93.02% | ↑ 0.04% |
| Line Coverage | 94.70% | 94.59% | ↓ -0.11% |
↑ packages.runtime.datastore.src:
Line Coverage Change: 0.17%    Branch Coverage Change: 0.34%
| Metric Name | Baseline coverage | PR coverage | Coverage Diff |
| --- | --- | --- | --- |
| Branch Coverage | 89.81% | 90.15% | ↑ 0.34% |
| Line Coverage | 93.89% | 94.06% | ↑ 0.17% |

Baseline commit: 362f1ad
Baseline build: 302845
Happy Coding!!

Code coverage comparison check passed!!

@agarwal-navin agarwal-navin requested a review from a team as a code owner October 18, 2024 17:48
@msfluid-bot
Collaborator

msfluid-bot commented Oct 18, 2024

@fluid-example/bundle-size-tests: +6.08 KB
| Metric Name | Baseline Size | Compare Size | Size Diff |
| --- | --- | --- | --- |
| aqueduct.js | 461.09 KB | 462.76 KB | +1.66 KB |
| azureClient.js | 558.24 KB | 559.91 KB | +1.68 KB |
| connectionState.js | 724 Bytes | 724 Bytes | No change |
| containerRuntime.js | 260.71 KB | 261.68 KB | +991 Bytes |
| fluidFramework.js | 405.97 KB | 405.98 KB | +14 Bytes |
| loader.js | 134.16 KB | 134.18 KB | +14 Bytes |
| map.js | 42.46 KB | 42.46 KB | +7 Bytes |
| matrix.js | 148.29 KB | 148.29 KB | +7 Bytes |
| odspClient.js | 525.2 KB | 526.88 KB | +1.68 KB |
| odspDriver.js | 97.84 KB | 97.86 KB | +21 Bytes |
| odspPrefetchSnapshot.js | 42.81 KB | 42.82 KB | +14 Bytes |
| sharedString.js | 164.48 KB | 164.49 KB | +7 Bytes |
| sharedTree.js | 396.43 KB | 396.43 KB | +7 Bytes |
| Total Size | 3.31 MB | 3.32 MB | +6.08 KB |

Baseline commit: 55bed2d

Generated by 🚫 dangerJS against 575dce1

Contributor

@alexvy86 alexvy86 left a comment


Approving for docs, with a few more suggestions below.

Contributor

🔗 No broken links found! ✅

Your attention to detail is admirable.

linkcheck output


> [email protected] ci:linkcheck /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test ci:start 1313 linkcheck:full

1: starting server using command "npm run ci:start"
and when url "[ 'http://127.0.0.1:1313' ]" is responding with HTTP status code 200
running tests using command "npm run linkcheck:full"


> [email protected] ci:start
> http-server ./public --port 1313 --silent


> [email protected] linkcheck:full
> npm run linkcheck:fast -- --external


> [email protected] linkcheck:fast
> linkcheck http://localhost:1313 --skip-file skipped-urls.txt --external

Crawling...

Stats:
  435188 links
    3358 destination URLs
       2 URLs ignored
       0 warnings
       0 errors


@agarwal-navin agarwal-navin merged commit 2e5b969 into microsoft:main Oct 28, 2024
32 checks passed
@agarwal-navin agarwal-navin deleted the opBunchingPart2 branch October 28, 2024 22:43
agarwal-navin added a commit that referenced this pull request Oct 29, 2024
…2841)

## Reviewer guidance
This is part 3 of 3 of the op bunching feature. This part focuses on the
changes in the DDS. [Part 1](#22839) and [part 2](#22840).

## Problem
During op processing, the container runtime sends ops one at a time to data
stores and DDSes. If a DDS has received M contiguous ops as part of a
batch, the DDS is called M times to process them individually. This has
performance implications for some DDSes, which would benefit from
receiving and processing these M ops together.

Take shared tree as an example:
For each op received that contains a sequenced commit, all the pending
commits are processed by the rebaser. So, as the number of ops received
grows, so does the work spent on pending commits. The following example
describes this concretely:
Currently, if a shared tree client has N pending commits which have yet
to be sequenced, then each time that client receives a sequenced commit
authored by another client (an op), it updates each of its pending
commits, which takes at least O(N) work.
Instead, if it receives M commits at once, it can do a single update
pass over each pending commit instead of M passes per pending commit.
It can compose the M commits together into a single change to update
over, so it can potentially go from something like O(N * M) work to
O(N + M) work with batching.

## Solution - op bunching
The solution implemented here is a feature called "op bunching".
With this feature, contiguous ops in a grouped op batch that belong to a
data store / DDS are bunched and sent to it in an array: the grouped op
is sent as an `ISequencedMessageEnvelope`, and the individual message
`contents` in it are sent as an array along with the
`clientSequenceNumber` and `localOpMetadata`.
The container runtime sends each data store its bunch of contiguous ops.
The data store, in turn, sends each DDS its bunch of contiguous ops. The
DDS can choose how to process these ops; shared tree, for instance,
would compose the commits in all these ops and update its pending
commits with the result.
Bunching only contiguous ops for a data store / DDS within a batch
preserves the behavior of processing ops in the sequence they were
received.

A couple of behavior changes to note:
1. Op events - An implication of this change is that the timing of "op"
events emitted by the container runtime and data store runtime will
change. Currently, these layers emit an "op" event immediately after an
op is processed. With this change, an upper layer will only know when a
bunch has been processed by a lower layer, so it will emit "op" events
for the individual ops in the bunch after the entire bunch is processed.
From my understanding, this should be fine because we do not provide any
guarantee that the "op" event will be emitted immediately after an op is
processed. These events will still be emitted in order of op processing
and (some time) after the op is processed.
Take delta manager / container runtime as an example. Delta manager
sends an op for processing to the container runtime and emits the "op"
event. However, the container runtime may choose not to process these
ops immediately but save them until an entire batch is received. This
change was made but was reverted due to some concerns not related to the
topic discussed here - #21785.
The change here is similar to the above behavior, where an upper layer
doesn't know and shouldn't care what lower layers do with ops.
2. `metadata` property on message - With this PR, the metadata property
is removed from a message before it's sent to data stores and DDSes.
This is because we now send one common message (the grouped op) and an
array of contents. Individual messages within a grouped op have batch
begin and end metadata, but those are only added by the runtime to keep
the format consistent with old batch messages. The data store and DDS
don't care about them, so removing them should be fine.


[AB#20123](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/20123)
agarwal-navin added a commit that referenced this pull request Feb 19, 2025
…ect (#23836)

This PR adds support for DDSes to process a 'bunch' of messages. Op
bunching was added via the following PRs:
#22839,
#22840 and
#22841.

Added a `processMessagesCore` function to `SharedObjectCore` that DDSes
can override to process a bunch of messages. It is marked optional today
because making it required would be a breaking change. It is similar to
the `processCore` function and will eventually replace it. For now,
`processCore` has been marked as deprecated.

This change updates the relative ordering of the `pre-op` and `op`
events emitted by the shared object. Previously, the `pre-op` event
would be emitted before a message was processed and the `op` event was
emitted immediately afterwards. With this change, the ordering of these
events w.r.t. the message being processed is still the same. However,
there may be other ops processed and other events fired in between.
Note that the only guarantee these events provide is that the `pre-op`
event is emitted before a message is processed and the `op` event is
emitted after, so this change doesn't break that guarantee.
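To illustrate the shape of the new hook described above, here is a sketch of a DDS opting into bunch processing. The message shapes and base-class details are assumptions for this sketch, not the exact `SharedObjectCore` signature:

```typescript
// Hypothetical message shapes for this sketch; the real types live in the
// Fluid runtime packages.
interface MessageContentSketch {
	contents: unknown;
	clientSequenceNumber: number;
	localOpMetadata: unknown;
}

interface MessageBunchSketch {
	messagesContent: readonly MessageContentSketch[];
	local: boolean;
}

class MyDds /* extends SharedObjectCore */ {
	// Old path (deprecated per the commit above): one callback per message.
	protected processCore(content: unknown, local: boolean): void {
		this.applyOne(content, local);
	}

	// New path: receive the whole bunch and decide how to process it. A DDS
	// like shared tree could compose all contents into one change instead of
	// applying them one by one.
	protected processMessagesCore(bunch: MessageBunchSketch): void {
		for (const { contents } of bunch.messagesContent) {
			this.applyOne(contents, bunch.local);
		}
	}

	private applyOne(contents: unknown, local: boolean): void {
		// DDS-specific application of a single op's contents goes here.
	}
}
```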
agarwal-navin added a commit that referenced this pull request Feb 21, 2025
Deprecating `process` on `FluidDataStoreRuntime`. It was deprecated on
`IFluidDataStoreChannel` (and other places) as part of
#22840, but we missed
deprecating it on `FluidDataStoreRuntime`.