Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP/2 output processing make over #40925

Merged
merged 77 commits into from
Apr 16, 2022
Merged

Conversation

davidfowl
Copy link
Member

@davidfowl davidfowl commented Mar 28, 2022

This changes the HTTP/2's output processing to be use queues instead of locks (a channel to be precise). We're manually scheduling the Http2OutputProducer's intent to write instead of a write operation. This removes locks and allows the thread to do other useful work while processing in line for processing in a low allocation way.

Fixes #30235

~~Putting this up as a draft for now while I run performance tests and to get early feedback. ~~

Scenaro: Unary invocation with 70 concurrent requests with one connection

Current

application
CPU Usage (%) 31 ▃█▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▂
Cores usage (%) 867 ▃█▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▂
Working Set (MB) 152 ███████████████████████
Private Memory (MB) 1,122 ▅▇█████████████████████
Build Time (ms) 14,507
Start Time (ms) 255
Published Size (KB) 91,501
.NET Core SDK Version 7.0.100-preview.3.22128.6
ASP.NET Core Version 7.0.0-preview.4.22181.8+7399df2
.NET Runtime Version 7.0.0-preview.4.22181.3+fe0f600
Max CPU Usage (%) 31 ▄█▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▂
Max Working Set (MB) 158 ████████████████████████
Max GC Heap Size (MB) 3 ▃▃▅█████████████████████
Size of committed memory by the GC (MB) 0
Max Number of Gen 0 GCs / sec 0.00
Max Number of Gen 1 GCs / sec 0.00
Max Number of Gen 2 GCs / sec 0.00
Max Time in GC (%) 0.00
Max Gen 0 Size (B) 0
Max Gen 1 Size (B) 0
Max Gen 2 Size (B) 0
Max LOH Size (B) 0
Max Allocation Rate (B/sec) 910,024 █▃
Max GC Heap Fragmentation 0
# of Assemblies Loaded 110 ████████████████████████
Max Exceptions (#/s) 1
Max Lock Contention (#/s) 16,862 ▄█▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▁
Max ThreadPool Threads Count 48 ▅█████████████████████
Max ThreadPool Queue Length 24 █▄ ▂ ▁ ▃▁▂ ▇ ▁▁ ▃
Max ThreadPool Items (#/s) 261,988 ▂█▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▁
Max Active Timers 1 ████████████████████████
IL Jitted (B) 159,282 ▂▂▄▇▇▇▇▇▇▇██████████████
Methods Jitted 1,765 ▂▂▅▇▇▇▇▇▇▇▇▇▇▇██████████
load
CPU Usage (%) 2 ███████████████████
Cores usage (%) 58 █▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
Working Set (MB) 46 ▁▁▁▁▂▃▃▄▅▅▆▆▇██████
Private Memory (MB) 129 ███████████████████
Start Time (ms) 86
Requests 1,480,274
Bad responses 0
Socket errors 0
Mean latency (ms) 1
Max latency (ms) 18
Max RPS 98,685

PR

application
CPU Usage (%) 33 ▆███████████████████▁
Cores usage (%) 936 ▅███████████████████▁
Working Set (MB) 153 ▇▇██████████████████████
Private Memory (MB) 959 ▆▆██████████████████████
Build Time (ms) 8,232
Start Time (ms) 269
Published Size (KB) 91,501
.NET Core SDK Version 7.0.100-preview.3.22128.6
ASP.NET Core Version 7.0.0-preview.4.22181.8+7399df2
.NET Runtime Version 7.0.0-preview.4.22181.3+fe0f600
Max CPU Usage (%) 33 ▅███████████████████▂
Max Working Set (MB) 159 ▇▇███████████████████████
Max GC Heap Size (MB) 3 ▃▃▅██████████████████████
Size of committed memory by the GC (MB) 0
Max Number of Gen 0 GCs / sec 0.00
Max Number of Gen 1 GCs / sec 0.00
Max Number of Gen 2 GCs / sec 0.00
Max Time in GC (%) 0.00
Max Gen 0 Size (B) 0
Max Gen 1 Size (B) 0
Max Gen 2 Size (B) 0
Max LOH Size (B) 0
Max Allocation Rate (B/sec) 1,206,304 █▁
Max GC Heap Fragmentation 0
# of Assemblies Loaded 111 █████████████████████████
Max Exceptions (#/s) 1
Max Lock Contention (#/s) 369 ▄▆▇▆█▆▃▅▄▃▂▃▃▄▃▃▃▃▃▄▁
Max ThreadPool Threads Count 28 ▁▁████████████████████▅▅▅
Max ThreadPool Queue Length 2 █ ▄ ▄ ▄▄
Max ThreadPool Items (#/s) 417,645 ▃▇██████████████████▂
Max Active Timers 1 █████████████████████████
IL Jitted (B) 234,850 ▂▂▆▇▇████████████████████
Methods Jitted 2,878 ▂▂▆▇▇████████████████████
load
CPU Usage (%) 3 ███████████████████
Cores usage (%) 77 ██▇████████████████
Working Set (MB) 46 ▁▁▁▁▃▅▇████████████
Private Memory (MB) 129 ███████████████████
Start Time (ms) 84
Requests 4,300,984
Bad responses 0
Socket errors 0
Mean latency (ms) 0
Max latency (ms) 1
Max RPS 286,732

Scenario: Unary invocation 1 stream and 1 connection

Current

load
CPU Usage (%) 1 ███████████████████
Cores usage (%) 34 ███████████████████
Working Set (MB) 13 ▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇███
Private Memory (MB) 129 ███████████████████
Start Time (ms) 89
Requests 145,094
Bad responses 0
Socket errors 0
Mean latency (ms) 0
Max latency (ms) 0
Max RPS 9,673

PR

load
CPU Usage (%) 1 ███████████████████
Cores usage (%) 35 ▇███▇████▇████▇████
Working Set (MB) 13 ▄▄▄▄▅▅▆▆▆▆▆▇▇▇▇▇▇██
Private Memory (MB) 129 ███████████████████
Start Time (ms) 88
Requests 147,366
Bad responses 0
Socket errors 0
Mean latency (ms) 0
Max latency (ms) 0
Max RPS 9,824

@ghost ghost added the area-runtime label Mar 28, 2022
@JamesNK JamesNK force-pushed the davidfowl/h2-output-makeover branch from 7864a25 to 8b42920 Compare April 4, 2022 00:31
@JamesNK
Copy link
Member

JamesNK commented Apr 4, 2022

Fixed merge conflicts.

Copy link
Member

@halter73 halter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be clearer about whether we're referring to the connection or stream window.


if (producer.IsTimingWrite)
{
_timeoutControl.StopTimingWrite();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We should avoid stopping the timing if we're just going to restart it.

@davidfowl
Copy link
Member Author

davidfowl commented Apr 11, 2022

@halter73 your last commit broke quite a few new tests, I’ll revert and apply some of the suggestions from the review.

@davidfowl davidfowl marked this pull request as ready for review April 12, 2022 15:26
@davidfowl
Copy link
Member Author

davidfowl commented Apr 12, 2022

PS: The numbers were obtained by using a 4K pause threshold for the output pipe, that isn't currently in this PR.

@davidfowl
Copy link
Member Author

I'm attempting to make the pipe buffer change and fix one more round of tests... Wish me luck...

@halter73 halter73 mentioned this pull request Apr 13, 2022
@davidfowl
Copy link
Member Author

I removed the QueueWithMovableHead and implemented the suggestion.

@davidfowl davidfowl force-pushed the davidfowl/h2-output-makeover branch from c7cf0b0 to 6557e99 Compare April 16, 2022 01:51
- Today its possible for a single stream to keep getting rescheduled and draining the connection window. This change makes it so that streams that have data to be written and haven't written anything are preferred over ones that have written something. It does this by changing the head of the queue before dequeuing everything to start at the first index of the stream that hasn't written anything.
- Reworked logic in frameworker to use the state enum as well as the read result.
- Fixed races around how work can be scheduled now that we're not running inline with a small pause threshold.
- Store the last window consumer in as a field and make sure it gets queued last on the next window update.
- Over scheduling races (cancel during a dequeue that will complete the pipe)
- Handle cancellation when there are trailers
- Added current state to make sure state is terminal. We base decisions on the current state and use the unobserved state to determine when to reschedule.
- Remove CancelPendingRead since there's never a pending read and we're scheduling the pipe reads manually.
- Rename the Canceled state to Aborted
@davidfowl davidfowl force-pushed the davidfowl/h2-output-makeover branch from 6557e99 to 02319f0 Compare April 16, 2022 02:05
@davidfowl
Copy link
Member Author

We're good @halter73 !

@JamesNK
Copy link
Member

JamesNK commented Apr 16, 2022

Nice!

One request: run the gRPC benchmark with 70 streams and 28 connections.

I’d like to see how the change performs when the server is at capacity

@davidfowl
Copy link
Member Author

@JamesNK do you have a command for me to run? 😄

@JamesNK
Copy link
Member

JamesNK commented Apr 16, 2022

It’s the same one but add “—variable connections=28”

@davidfowl
Copy link
Member Author

28 connections, 70 streams

Before

application
CPU Usage (%) 83 █▆▆▆▆▅▅▅▅▆▅▅▅▆▆▆▅▅▅▄
Cores usage (%) 2,313 █▆▆▆▆▅▅▆▅▆▅▆▅▆▆▆▅▅▅▄
Working Set (MB) 73 ▅▆██████████████████████
Private Memory (MB) 104 ▆▇██████████████████████
Build Time (ms) 4,115
Start Time (ms) 402
Published Size (KB) 91,316
.NET Core SDK Version 7.0.100-preview.3.22128.6
ASP.NET Core Version 7.0.0-preview.4.22215.9+3e8db65
.NET Runtime Version 7.0.0-preview.4.22215.4+17c2cb2
Max CPU Usage (%) 84 █▆▆▆▆▅▅▆▅▆▅▅▅▆▆▆▅▅▆▄
Max Working Set (MB) 76 ▅▅▇██████████████████████
Max GC Heap Size (MB) 15 ▁▁███████████████████████
Size of committed memory by the GC (MB) 0
Max Number of Gen 0 GCs / sec 0.00
Max Number of Gen 1 GCs / sec 0.00
Max Number of Gen 2 GCs / sec 0.00
Max Time in GC (%) 0.00
Max Gen 0 Size (B) 0
Max Gen 1 Size (B) 0
Max Gen 2 Size (B) 0
Max LOH Size (B) 0
Max Allocation Rate (B/sec) 12,821,176
Max GC Heap Fragmentation 0
# of Assemblies Loaded 112 █████████████████████████
Max Exceptions (#/s) 6
Max Lock Contention (#/s) 123,493 ▃▇▇▇█▇▇█▇▇▇█▇▇▇▇▇▆▆▆
Max ThreadPool Threads Count 48 ▅██████████████████████
Max ThreadPool Queue Length 1,078 █ ▃ ▁ ▁ ▁▁ ▂▁ ▁
Max ThreadPool Items (#/s) 912,377 ▆████▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆
Max Active Timers 0
IL Jitted (B) 152,660 ▂▂▃▅▇████████████████████
Methods Jitted 1,777 ▂▂▃▅▇████████████████████
load
CPU Usage (%) 4 ▆████▆████████▆█▆██
Cores usage (%) 100 ███████████████████
Working Set (MB) 48 ▂▂▂▂▆██████████████
Private Memory (MB) 131 ███████████████████
Start Time (ms) 82
Requests 10,365,490
Bad responses 0
Socket errors 0
Mean latency (ms) 2
Max latency (ms) 12
Max RPS 691,033

After

application
CPU Usage (%) 96 ▃█▅▅▅▅▄▄▄▅▅▄▅▄▅▅▅▅▅▄▁
Cores usage (%) 2,686 ▃█▅▅▅▅▄▄▄▅▅▄▅▄▅▅▅▅▅▄▁
Working Set (MB) 79 ▅███████████████████████
Private Memory (MB) 110 ▆███████████████████████
Build Time (ms) 4,143
Start Time (ms) 420
Published Size (KB) 91,316
.NET Core SDK Version 7.0.100-preview.3.22128.6
ASP.NET Core Version 7.0.0-preview.4.22215.9+3e8db65
.NET Runtime Version 7.0.0-preview.4.22215.4+17c2cb2
Max CPU Usage (%) 98 ▄█▅▅▅▅▄▄▄▄▄▄▅▄▄▅▅▄▅▄▂
Max Working Set (MB) 82 ▅▅███████████████████████
Max GC Heap Size (MB) 22 ▁▁███████████████████████
Size of committed memory by the GC (MB) 0
Max Number of Gen 0 GCs / sec 0.00
Max Number of Gen 1 GCs / sec 0.00
Max Number of Gen 2 GCs / sec 0.00
Max Time in GC (%) 0.00
Max Gen 0 Size (B) 0
Max Gen 1 Size (B) 0
Max Gen 2 Size (B) 0
Max LOH Size (B) 0
Max Allocation Rate (B/sec) 20,203,608
Max GC Heap Fragmentation 0
# of Assemblies Loaded 113 █████████████████████████
Max Exceptions (#/s) 0
Max Lock Contention (#/s) 968 ▂█▇██▇▇▇▇▇▇▆▇▆▇▆▆▇▇▆▃
Max ThreadPool Threads Count 48 ▅██████████████████████
Max ThreadPool Queue Length 399
Max ThreadPool Items (#/s) 950,535 ▂▆████▇▇▇▇▇▇▇▇▇▇▇▇▇▇▃
Max Active Timers 0
IL Jitted (B) 232,817 ▂▂▄▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇███
Methods Jitted 2,909 ▂▂▄▇▇████████████████████
load
CPU Usage (%) 4 ███████████▆███████
Cores usage (%) 101 ███████████████████
Working Set (MB) 48 ▂▂▂▂▆██████████████
Private Memory (MB) 131 ███████████████████
Start Time (ms) 86
Requests 11,517,940
Bad responses 0
Socket errors 0
Mean latency (ms) 1
Max latency (ms) 12
Max RPS 767,863

@@ -74,6 +71,7 @@ internal partial class Http2Connection : IHttp2StreamLifetimeHandler, IHttpStrea
internal readonly Http2KeepAlive? _keepAlive;
internal readonly Dictionary<int, Http2Stream> _streams = new Dictionary<int, Http2Stream>();
internal PooledStreamStack<Http2Stream> StreamPool;
internal Action<Http2Stream>? _onStreamCompleted;
Copy link
Member

@JamesNK JamesNK Apr 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just for testing?

I don't think it needs to be fixed now, but I'd like to see this reworked to utilize IHttp2StreamLifetimeHandler. Tests should support customizing the handler so it can be used to raise this event. I did this in the HTTP/3 tests and it works well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe create a follow-up issue.

Copy link
Member

@JamesNK JamesNK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 🇮🇹

@davidfowl davidfowl merged commit 1b02b85 into main Apr 16, 2022
@davidfowl davidfowl deleted the davidfowl/h2-output-makeover branch April 16, 2022 16:17
@ghost ghost added this to the 7.0-preview4 milestone Apr 16, 2022
@davidfowl davidfowl mentioned this pull request Apr 16, 2022
4 tasks
@davidfowl
Copy link
Member Author

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/grpc.benchmarks.yml --scenario grpcaspnetcoreserver-h2loadclient --profile aspnet-citrine-win --variable streams=70 --variable connections=1 --application.framework net7.0 --application.sdkversion 7.0.100-preview.3.22128.6 --application.options.collectCounters true --chart

@davidfowl davidfowl added the Perf label Aug 26, 2022
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 6, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Dec 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions Perf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve HTTP/2 performance by using Channels
6 participants