Block large payloads inside `DocumentDeltaConnection`, size due to the 1MB Kafka limit #7987

andre4i · 2021-10-25T22:36:22Z

This change blocks message transmission and closes the container, but only when an explicit feature gate is enabled.

packages/drivers/driver-base/package.json

msfluid-bot · 2021-10-25T23:56:33Z

⯅ @fluid-example/bundle-size-tests: +3.03 KB

| Metric Name | Baseline Size | Compare Size | Size Diff |
|---|---|---|---|
| container.js | 169.6 KB | 169.82 KB | ⯅ +219 Bytes |
| map.js | 47.21 KB | 47.21 KB | ■ No change |
| matrix.js | 143.43 KB | 143.43 KB | ⯅ +1 Bytes |
| odspDriver.js | 186.55 KB | 189.29 KB | ⯅ +2.74 KB |
| odspPrefetchSnapshot.js | 41.28 KB | 41.36 KB | ⯅ +76 Bytes |
| sharedString.js | 164.24 KB | 164.24 KB | ⯅ +1 Bytes |
| Total Size | 784.99 KB | 788.02 KB | ⯅ +3.03 KB |

Baseline commit: 646a987

Generated by 🚫 dangerJS against b66f6e0

packages/drivers/driver-base/src/documentDeltaConnection.ts

packages/drivers/driver-base/src/messageSizeValidator.ts

andre4i · 2021-12-09T16:10:44Z

@markfields @vladsud please take a look at this. There is a bit of a behavior change with regard to retrying on error in the `DeltaManager`.

andre4i · 2021-12-09T16:11:44Z

packages/loader/container-loader/src/connectionManager.ts

+        (errorMessage: string) => new GenericNetworkError(
+            fluidErrorCode,
+            errorMessage,
+            err?.canRetry === true || err?.canRetry === undefined, // unless explicitly specified, this will retry


^^ We would always retry, regardless of the type of error. Not sure whether this has always been intentional.

vladsud · 2021-12-09

Here is the story :)
In ODSP, we definitely want to always reconnect, because we may get a disconnect with a 403 due to token expiration. A 403 is a critical error (i.e., in general it's a game-over event), but across layers we always do one retry with a refreshed token to ensure the host has a chance to provide a new token.

With that said, I think that's the wrong layer to participate in this game. I.e., ODSP has to project such errors as recoverable. I'd need to look at the code to say whether that's the case. And I'm not sure about FRS.

Also, it's worth noting that for a long period of time we considered errors without `canRetry` to be recoverable. We changed that (maybe 6 months back): any exception, whether it's a bug in our code or in host code (like a token callback), is catastrophic.

So, to summarize, this code is likely wrong. But I do not think you are making it any more right :)

andre4i · 2021-12-10

I don't believe there is a viable way right now for propagating non-retriable errors from the driver to the container. Maybe a TTL on the error itself? What are your thoughts?
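For readers following this thread, the retry expression in the diff can be isolated as a small predicate. This is only a sketch: `DriverErrorLike`, `shouldRetry`, and `shouldRetryStrict` are hypothetical names for illustration, and the strict variant is an assumption about what "non-retriable by default" might look like, not something proposed in this PR.

```typescript
// Hypothetical error shape; only the `canRetry` field comes from the diff.
interface DriverErrorLike {
    canRetry?: boolean;
}

// Mirrors the expression in the diff: retry unless `canRetry` is explicitly false.
function shouldRetry(err: DriverErrorLike | undefined): boolean {
    return err?.canRetry === true || err?.canRetry === undefined;
}

// Stricter variant (an assumption, not part of the PR): retry only on explicit opt-in.
function shouldRetryStrict(err: DriverErrorLike | undefined): boolean {
    return err?.canRetry === true;
}

const cases: Array<DriverErrorLike | undefined> = [
    undefined,
    {},
    { canRetry: true },
    { canRetry: false },
];
for (const c of cases) {
    // Print each case next to the decision made by both predicates.
    console.log(JSON.stringify(c), shouldRetry(c), shouldRetryStrict(c));
}
```

Under the first predicate, an error that carries no `canRetry` field at all is treated as retriable, which is the behavior being questioned here.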

vladsud · 2021-12-09T17:43:29Z

I know you looked into it before, but I want to poke more.
Is there really no way to determine that the socket.io disconnect happened due to a size violation?
Is there anything we can do to observe lower-level traffic to figure it out? That would obviously be a much better direction for converting such errors to catastrophic errors (not sure about the 0.9 MB limit).

andre4i · 2021-12-09T18:01:33Z

> Is there really no way to determine that socket.io disconnect happened due to size violation?

None that I can find :( #8179 (comment) was the farthest I've gotten investigating/debugging this particular issue. I think the best way is to actually add the limitation to the server (our server), based on my notes here: #7599 (comment), and push that value down using the client config. `DocumentDeltaConnection` should then read that config and either block or, if we fix the 1MB Kafka limit, we do the fix in the runtime layer to support larger batches.

The current PR is to be explicit about a limit that is 'de facto' breaking things silently.
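A minimal sketch of the "read the limit from config and block" idea described above. All names here (`ClientSizeConfig`, `maxMessageSizeInBytes`, `MessageTooLargeError`, `submitIfWithinLimit`) are hypothetical and are not taken from the PR or from the Fluid Framework API; the `blockingEnabled` flag merely stands in for the feature gate.

```typescript
// Hypothetical config pushed down by the server; the field name is an assumption.
interface ClientSizeConfig {
    maxMessageSizeInBytes: number;
}

// Hypothetical error type, marked non-retriable because resending the same
// payload would exceed the limit again.
class MessageTooLargeError extends Error {
    public readonly canRetry = false;
    constructor(public readonly size: number, public readonly limit: number) {
        super(`Message of ${size} bytes exceeds the ${limit} byte limit`);
    }
}

// UTF-8 byte length of an already serialized payload.
function byteLength(serialized: string): number {
    return new TextEncoder().encode(serialized).length;
}

// Emits the payload unless blocking is enabled and the payload is over the limit.
function submitIfWithinLimit(
    serialized: string,
    config: ClientSizeConfig,
    blockingEnabled: boolean, // stand-in for the feature gate
    emit: (payload: string) => void,
): void {
    const size = byteLength(serialized);
    if (blockingEnabled && size > config.maxMessageSizeInBytes) {
        throw new MessageTooLargeError(size, config.maxMessageSizeInBytes);
    }
    emit(serialized);
}
```

With the gate off, oversized payloads still go out unchanged, which matches the intent of making the blocking behavior opt-in.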

vladsud · 2021-12-14T17:54:17Z

packages/drivers/driver-base/src/messageSizeValidator.ts

+    public static sizeInBytes(message: IDocumentMessage): number {
+        const { contents, ...restOfObject } = message;
+        // `contents` is already stringified. Re-stringifying the whole message will
+        // lead to additional escape characters which will increase the size artificially.


I think this comment suggests that what we measure is not what actually gets counted by socket.io. I.e., if socket.io stringifies the payload, then it will add all these escape characters and they will count against the limit, right?
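The gap between the two measurements can be illustrated with a short sketch. `MessageLike` is a hypothetical stand-in for `IDocumentMessage`, and whether socket.io actually stringifies the whole message this way is exactly the open question in this thread, so the second function is an assumption about the transport rather than a description of it.

```typescript
// Hypothetical message shape; `contents` is already a JSON string.
interface MessageLike {
    type: string;
    contents: string;
}

// UTF-8 byte length of a string.
function utf8Length(s: string): number {
    return new TextEncoder().encode(s).length;
}

// Approach from the diff: count `contents` as-is and stringify only the rest.
function sizeExcludingEscapes(message: MessageLike): number {
    const { contents, ...rest } = message;
    return utf8Length(contents) + utf8Length(JSON.stringify(rest));
}

// Size if the whole message is stringified again, which escapes every quote
// and backslash inside `contents`.
function sizeOfFullStringify(message: MessageLike): number {
    return utf8Length(JSON.stringify(message));
}

const message: MessageLike = {
    type: "op",
    contents: JSON.stringify({ key: "value", text: 'say "hi"' }),
};
console.log(sizeExcludingEscapes(message), sizeOfFullStringify(message));
```

The second number is always the larger one, so a validator based on the first measurement could accept a message that a limit applied to the re-stringified form would reject.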

andre4i · 2022-01-12T17:18:25Z

This approach is currently not preferred, as the overhead for each op risks making this a performance bottleneck. We'll be exploring alternative solutions, such as a socket.io limitation along with an improvement to the error retry mechanism. The latter would also fix the current reconnect loop on socket.io errors.

andre4i added 5 commits October 22, 2021 17:20

Track message size - first draft

391382b

Fix test infra, add more tests

0b43fc1

Simplify threshold counter tests, add metadata to performance event from threshold counter, also send the max messagesize

a18376f

Small renames and refactorings

1cec55b

Merge branch 'main' into track-message-size

75f6505

andre4i requested a review from a team as a code owner October 25, 2021 22:36

github-actions bot requested review from vladsud, jatgarg, tanviraumi, znewton, anthony-murphy, markfields and wes-carlson and removed request for a team October 25, 2021 22:36

github-actions bot added the `area: driver`, `area: loader`, and `public api change` labels Oct 25, 2021

update package version

6873e82

vladsud reviewed Oct 25, 2021

packages/drivers/driver-base/package.json (outdated)

vladsud reviewed Oct 25, 2021

packages/drivers/driver-base/src/documentDeltaConnection.ts

vladsud reviewed Oct 25, 2021

packages/drivers/driver-base/src/documentDeltaConnection.ts (outdated)

vladsud reviewed Oct 25, 2021

packages/drivers/driver-base/src/messageSizeValidator.ts (outdated)

vladsud reviewed Oct 26, 2021

packages/drivers/driver-base/src/messageSizeValidator.ts

andre4i marked this pull request as draft October 26, 2021 17:04

Fix package.json

100890d

github-actions bot requested a review from vladsud October 27, 2021 20:08

andre4i added 2 commits November 1, 2021 16:55

Change message size validation to be relative to both the message and the payload

77887cf

Merge branch 'main' into track-message-size

90dc125

github-actions bot added the `area: tests` label Nov 1, 2021

github-actions bot requested a review from agarwal-navin November 1, 2021 23:59

Make feature gate private

6acfd6c

andre4i marked this pull request as ready for review November 2, 2021 00:48

andre4i added 4 commits November 1, 2021 17:49

Small comment

9a135a9

Formatting

964587e

Update comment

0dca9ae

update comment

38409c6

This was referenced Nov 4, 2021

1M Kafka message limit #7599

Closed

A lot of disconnects / summary behind: need to put message size limit not to hit socket.io limits? #8179

Closed

andre4i added 8 commits November 9, 2021 10:12

Merge branch 'main' into track-message-size

1870140

update package version

e1b3fd8

Merge branch 'track-message-size' of github.com:andre4i/FluidFramework into track-message-size

4be520c

Merge branch 'main' into track-message-size

030a302

mock localstorage

ab79725

Close container on errors from localdocumentdeltaconn

9039f1d

Merge branch 'main' into track-message-size

b96af42

Add some comments

8ce1e41

andre4i changed the title ~~Track payload inside DocumentDeltaConnection, size due to the 1MB Kafka limit~~ Block large payloads inside DocumentDeltaConnection, size due to the 1MB Kafka limit Dec 8, 2021

andre4i added 3 commits December 8, 2021 16:53

Fix some tests

a601e54

Merge branch 'main' into track-message-size

d12ec3b

Fix package version

b66f6e0

andre4i commented Dec 9, 2021

vladsud reviewed Dec 14, 2021

andre4i mentioned this pull request Dec 15, 2021

A container will always reconnect and retry, even for non-retriable errors #8570

Closed

andre4i closed this Jan 12, 2022

vladsud mentioned this pull request Feb 4, 2022

1M limit epic #9023

Closed
