Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R11s-driver: Add driver policy for long-polling downgrade #16051

Merged
merged 2 commits into from
Jun 19, 2023

Conversation

znewton
Copy link
Contributor

@znewton znewton commented Jun 17, 2023

Description

Allow R11s-driver consumers to disable long-polling downgrade functionality.

Breaking Changes

The type for R11sDriverPolicies is adding a new config, but the param in DocumentServiceFactory uses Partial<>, so it is not actually breaking the usability.

@znewton znewton requested a review from a team as a code owner June 17, 2023 18:09
@znewton znewton changed the title Add driver policy for long-polling downgrade R11s-driver: Add driver policy for long-polling downgrade Jun 17, 2023
@github-actions github-actions bot added area: driver Driver related issues base: main PRs targeted against main branch labels Jun 17, 2023
@vladsud
Copy link
Contributor

vladsud commented Jun 19, 2023

It would be great to add testing of both modes (long polling and websoclet) to a matrix of states we test in service tests.
I can't find where we do it (the best reference is for other types of options / policies, like runtimeOptionsMatrix / loaderOptionsMatrix), but I think @jatgarg should know where we have driver policies.

@vladsud
Copy link
Contributor

vladsud commented Jun 19, 2023

Terminology: Maybe it's fine as is, but technically it's not a downgrade, it's not allowing / not using websocket upgrade :)

@znewton
Copy link
Contributor Author

znewton commented Jun 19, 2023

Terminology: Maybe it's fine as is, but technically it's not a downgrade, it's not allowing / not using websocket upgrade :)

Actually it is a downgrade, right? Our default and first attempted connection is websocket, because we do not specify ["polling","websocket"] as the transports. Instead, we specify ["websocket"] as the only transport, which completely removes Socket.io's transport-upgrade behavior. Then, if that websocket connection fails, we are explicitly reconnecting with a "downgraded" transport (polling).

So, if "websocket" is considered an "upgrade" from "polling", then the reverse would be called a "downgrade", so this terminology would be accurate.

@znewton znewton requested review from msfluid-bot and a team as code owners June 19, 2023 16:22
@github-actions github-actions bot added the public api change Changes to a public API label Jun 19, 2023
@znewton
Copy link
Contributor Author

znewton commented Jun 19, 2023

It would be great to add testing of both modes (long polling and websoclet) to a matrix of states we test in service tests. I can't find where we do it (the best reference is for other types of options / policies, like runtimeOptionsMatrix / loaderOptionsMatrix), but I think @jatgarg should know where we have driver policies.

I think that's a good idea. This PR still does not allow us to configure the driver to only perform polling, so reproducing a polling connection in a test might still be a challenge. Currently, to validate polling locally, I use yarn patch to change the initial transport from ["websocket"] to ["polling"].

From an actual user standpoint, I don't think we want to make that flow easily configurable, so adding a config just for testing feels a bit odd. Would be better to find a way to make the websocket transport fail consistently in a test environment.

@msfluid-bot
Copy link
Collaborator

@fluid-example/bundle-size-tests: +30 Bytes
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 448.33 KB 448.34 KB +6 Bytes
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 239.04 KB 239.04 KB +2 Bytes
loader.js 155.61 KB 155.62 KB +4 Bytes
map.js 46.75 KB 46.75 KB +2 Bytes
matrix.js 146.68 KB 146.68 KB +2 Bytes
odspDriver.js 92.44 KB 92.44 KB +6 Bytes
odspPrefetchSnapshot.js 43.69 KB 43.69 KB +4 Bytes
sharedString.js 163.28 KB 163.29 KB +2 Bytes
sharedTree2.js 248.74 KB 248.74 KB No change
Total Size 1.7 MB 1.7 MB +30 Bytes

Baseline commit: 54cc669

Generated by 🚫 dangerJS against 3e833a1

@vladsud
Copy link
Contributor

vladsud commented Jun 19, 2023

Terminology: Maybe it's fine as is, but technically it's not a downgrade, it's not allowing / not using websocket upgrade :)

Actually it is a downgrade, right? Our default and first attempted connection is websocket, because we do not specify ["polling","websocket"] as the transports. Instead, we specify ["websocket"] as the only transport, which completely removes Socket.io's transport-upgrade behavior. Then, if that websocket connection fails, we are explicitly reconnecting with a "downgraded" transport (polling).

So, if "websocket" is considered an "upgrade" from "polling", then the reverse would be called a "downgrade", so this terminology would be accurate.

If we look at HTTP protocol, then we failed to do websocket upgrade of HTTP connection, and now trying to leverage HTTP connection as is. You are probably right RE how that is exposed in socket.io - the ideal flow would be for socket.io not to close original http socket that failed websocket upgrade and reuse it. But I think it does not provide ability to express priorities around protocols, so we only list "websocket" protocol and if http upgrade to websocket failed, then whole http socket is closed and we start over with establishing new http socket. It's a bit wasteful and slow.

You can see it even on example in https://socket.io/docs/v4/socket-io-protocol/#introduction - there is a request to upgrade to websocket:

GET /socket.io/?EIO=4&transport=polling&t=N8hyd6w
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=UTF-8
0{"sid":"lv_VI97HAXpY6yYWAAAC","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000,"maxPayload":1000000}

Any chance you can look at it a bit more RE efficient way of using socket.io?

Per https://github.com/socketio/engine.io-protocol#transports, it suggests that there might be more efficient way to deal with connections when websocket upgrade does not go through.

By default, the client SHOULD create an HTTP long-polling connection, and then upgrade to better transports if available.

Maybe there is a way to establish first long polling connection, and then attempt to upgrade? And that sequence is as efficient to using "websocket" protocol from the starrt?

@znewton
Copy link
Contributor Author

znewton commented Jun 19, 2023

Maybe there is a way to establish first long polling connection, and then attempt to upgrade? And that sequence is as efficient to using "websocket" protocol from the starrt?

This is the default socket.io behavior, but it was explicitly overwritten to do websocket first before I had even joined the project a few years ago.

According to you, @vladsud, in #8820 discussion,

based on what I read about socket.io protocols, adding long polling will result in it being used first, for some indeterminate period of time. I.e., its less of a fallback (the way it's implemented), latency of communication might be affected negatively by this change for all connections that do support websocket upgrade.
So it would be great to actually do some measurements and see if this approach is good, or it's better to first try websocket protocol, and if it fails, retry with long-polling.

I'm inclined to keep the current functionality for now, since that is how it has been for years, and it doesn't seem like external sources that you've found are consistent on which is more performant.

@znewton znewton merged commit afdc7f5 into microsoft:main Jun 19, 2023
@znewton znewton deleted the r11s-driver/long-polling-config branch June 19, 2023 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: driver Driver related issues base: main PRs targeted against main branch public api change Changes to a public API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants