We have very small-scale tests that run as part of our post-CI flow.
They run against FRS canary and ODSP prod with exactly the same code and payload, at about 10 sequenced ops/s, i.e. tiny.
Here are the metrics from these runs:
```kusto
union office_fluid_ffautomation_*
| where Event_Time > ago(7d)
| where Data_eventName contains "OpRoundtripTime"
| where isnotnull(Data_durationOutboundQueue)
// FRS (routerlicious); the ODSP column below comes from the same query filtered to the ODSP driver
| where Data_driverType == "routerlicious"
| summarize count(), toint(avg(Data_duration)), toint(percentile(Data_duration, 90)), toint(stdev(Data_duration))
```
|           | FRS  | ODSP |
| --------- | ---- | ---- |
| Average   | 400  | 105  |
| P90       | 180  | 125  |
| Std. dev. | 2095 | 51   |
Numbers are in milliseconds and measure the end-to-end latency of an op: from the client sending it to the client receiving it back after the ordering service has acked it. The numbers include some client-code overhead; we have added measurements so it can be subtracted, but I'll use the same metric across the board, since the data below comes from older client builds that do not have that change yet.
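For reference, a single query split by driver type would produce both columns of the table above in one pass, and could also approximate the latency with the client-side outbound queue time removed. This is only a sketch: it assumes the ODSP runs log the same event under a different `Data_driverType` value, and that `Data_durationOutboundQueue` is the client overhead referred to above.

```kusto
union office_fluid_ffautomation_*
| where Event_Time > ago(7d)
| where Data_eventName contains "OpRoundtripTime"
| where isnotnull(Data_durationOutboundQueue)
// Assumption: subtracting the outbound queue time approximates the latency without client-side overhead.
| extend adjustedDuration = Data_duration - Data_durationOutboundQueue
| summarize count(),
            avgMs = toint(avg(Data_duration)),
            p90Ms = toint(percentile(Data_duration, 90)),
            stdevMs = toint(stdev(Data_duration)),
            avgAdjustedMs = toint(avg(adjustedDuration))
  by Data_driverType
```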
Krushboo shared with me the results of the stress tests your team runs, which peak at 1000 sequenced ops/s. I do not know what they run against (i.e. what tenant, whether it is isolated, etc.).
The numbers are much worse and beyond reasonable.
Here are results from the ODSP scalability run, which the IDC team runs every week against a production tenant. It sustains 3000 sequenced ops/s and a much higher number of broadcast ops (i.e. the number of clients per doc is much higher).
More statistics about these runs are at the bottom of this file.
Note that it runs against a cluster of 5 front-end boxes, so the per-box workload is very likely similar to the previous run (assuming the FRS setup above uses a single front-end box).
|           | ODSP |
| --------- | ---- |
| Average   | 107  |
| P90       | 126  |
| Std. dev. | 35   |
The numbers are very consistent for ODSP.
They are not consistent for FRS: the standard deviation (and, as a result, the average) is through the roof even for trivial runs.
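To see how unstable the FRS latencies are over time, a per-day breakdown of the same event can be pulled with a query like the following (a sketch, reusing the table and columns from the query above):

```kusto
union office_fluid_ffautomation_*
| where Event_Time > ago(7d)
| where Data_eventName contains "OpRoundtripTime"
| where isnotnull(Data_durationOutboundQueue)
| where Data_driverType == "routerlicious"
// Bucket by day to show how much the average and tail move between runs.
| summarize count(),
            avgMs = toint(avg(Data_duration)),
            p90Ms = toint(percentile(Data_duration, 90)),
            p99Ms = toint(percentile(Data_duration, 99)),
            stdevMs = toint(stdev(Data_duration))
  by bin(Event_Time, 1d)
| order by Event_Time asc
```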
Please note that these high latencies and deviations cause a lot of trouble across other areas, including:
Also related: