Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSLStream should not use the ArrayPool for it's internal buffers #49001

Closed
davidfowl opened this issue Mar 2, 2021 · 15 comments
Closed

SSLStream should not use the ArrayPool for it's internal buffers #49001

davidfowl opened this issue Mar 2, 2021 · 15 comments
Labels
Milestone

Comments

@davidfowl
Copy link
Member

davidfowl commented Mar 2, 2021

Description

Today SSLStream uses a 4K buffer for the TLS handshake and a 16K buffer for the reading TLS frames. These buffers come from the array pool today which makes it really easy to exhaust the buckets in the shared array pool for those commonly used sizes. This affects websocket and any other scenario that results in a large number of long lived concurrent connections.

Regression?

No.

Data

A demo of 5000 websocket connections connected to an ASP.NET Core server using Kestrel:

Here's a heap dump sorted by inclusive size. Most of what is being held onto is array pool buffers.
image

On top of that, I took a trace of the same set of connections and you can see the ArrayPool event source says the number of buffers allocated is 5000. The buffers are all 32K in size (because of how the SslSteam adds the FrameOverhead).

image

Allocation profile of 5000 websocket connections. You can see the array pool has the most byte[] allocations.

image

They're mostly in the handshake and reading (I'm not writing in this app), just sitting mostly idle with ping pongs:

image

Analysis

We shouldn't use the array pool buffers for reads that may take a long time. That means both for the handshake and the encryption/decryption:

_internalBuffer = ArrayPool<byte>.Shared.Rent(ReadBufferSize);

The handshake will end up doing the same for 8K buffers in the pool but potentially for a much shorter duration as clients negotiate quickly and there's a timeout to complete the handshake (in kestrel).

@davidfowl davidfowl added the tenet-performance Performance related issue label Mar 2, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Buffers untriaged New issue has not been triaged by the area owner labels Mar 2, 2021
@ghost
Copy link

ghost commented Mar 2, 2021

Tagging subscribers to this area: @tannergooding, @pgovind, @GrabYourPitchforks
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Today SSLStream uses a 4K buffer for the TLS handshake and a 16K buffer for the reading TLS frames. These buffers come from the array pool today which makes it really easy to exhaust the buckets in the shared array pool for those commonly used sizes. This affects websocket and any other scenario that results in a large number of long lived concurrent connections.

Regression?

No.

Data

A demo of 5000 websocket connections connected to an ASP.NET Core server using Kestrel:

Here's a heap dump sorted by inclusive size. Most of what is being held onto is array pool buffers.
image

On top of that, I took a trace of the same set of connections and you can see the ArrayPool event source says the number of buffers allocated is 5000 while the number of buffers rented was 16. The buffers are all 32K in size (because of how the SslSteam adds the FrameOverhead).

image

Analysis

We shouldn't use the array pool buffers for reads that may take a long time. That means both for the handshake and the encryption/decryption:

_internalBuffer = ArrayPool<byte>.Shared.Rent(ReadBufferSize);

The handshake will end up doing the same for 8K buffers in the pool but potentially for a much shorter duration as clients negotiate quickly and there's a timeout to complete the handshake (in kestrel).

Author: davidfowl
Assignees: -
Labels:

area-System.Buffers, tenet-performance, untriaged

Milestone: -

@ghost
Copy link

ghost commented Mar 2, 2021

Tagging subscribers to this area: @dotnet/ncl, @vcsjones
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Today SSLStream uses a 4K buffer for the TLS handshake and a 16K buffer for the reading TLS frames. These buffers come from the array pool today which makes it really easy to exhaust the buckets in the shared array pool for those commonly used sizes. This affects websocket and any other scenario that results in a large number of long lived concurrent connections.

Regression?

No.

Data

A demo of 5000 websocket connections connected to an ASP.NET Core server using Kestrel:

Here's a heap dump sorted by inclusive size. Most of what is being held onto is array pool buffers.
image

On top of that, I took a trace of the same set of connections and you can see the ArrayPool event source says the number of buffers allocated is 5000 while the number of buffers rented was 16. The buffers are all 32K in size (because of how the SslSteam adds the FrameOverhead).

image

Analysis

We shouldn't use the array pool buffers for reads that may take a long time. That means both for the handshake and the encryption/decryption:

_internalBuffer = ArrayPool<byte>.Shared.Rent(ReadBufferSize);

The handshake will end up doing the same for 8K buffers in the pool but potentially for a much shorter duration as clients negotiate quickly and there's a timeout to complete the handshake (in kestrel).

Author: davidfowl
Assignees: -
Labels:

area-System.Buffers, area-System.Net.Security, tenet-performance, untriaged

Milestone: -

@stephentoub
Copy link
Member

We shouldn't use the array pool buffers for reads that may take a long time.

That may be true, but i don't know that's supported by the data here. You can have 5000 buffers from the same bucket in the pool all out at the same time; what matters is how many end up being returned at the same time. If the arrays only spend a small amount of time in the pool, you can have a much larger number than the bucket size with few being discarded. (I've called out before we don't have the right events in the pool right now to successfully answer these questions and I'd like us to fix that. )

So, let's say SslStream stopped using the pool. What gets better here?

@davidfowl
Copy link
Member Author

So, let's say SslStream stopped using the pool. What gets better here?

Anyone renting 32K buffers wouldn't end up allocating them from the heap since the pool is drained. In websocket scenarios that end up with lots of pending reads, this could be a win. Though ultimately I think the better answer in this case is to read directly into the user's buffer, I figured I'd file this issue anyways.

The memory usage with TLS compared to non TLS websockets is stark and I'm wishing that I had spent more time looking at the implementation sooner....

@stephentoub
Copy link
Member

Anyone renting 32K buffers wouldn't end up allocating them from the heap since the pool is drained.

But instead SslStream would be allocating those same buffers, no? That's what I'm trying to understand. At the end of the day, is it not just changing who's allocating the buffer?

@davidfowl
Copy link
Member Author

But instead SslStream would be allocating those same buffers, no? That's what I'm trying to understand. At the end of the day, is it not just changing who's allocating the buffer?

I don't want to conflate the issues but yes, this particular isn't about reducing overall memory usage (maybe I should file that one too). It's other code in the app trying to use the pool and ending up allocating because the bucket(s) are drained.

@stephentoub
Copy link
Member

stephentoub commented Mar 2, 2021

It's other code in the app trying to use the pool and ending up allocating because the bucket(s) are drained.

I'm still trying to understand. Let's say the buckets are drained because SslStream consumed them. How is that different from the buckets being empty because no one has allocated / returned any arrays yet? And if these components allocate arrays and then return them, in general won't the arrays be there for them (or others other components including SslStream returned) when they then go back to get another? Sorry if I'm not understanding the concern.

Things go poorly with the array pool when arrays are returned and dropped because the pool is full. That's the main thing I was alluding to about wanting an event for, as it's actually the sign of a problem (there's also a concern if a buffer is rented and never returned, but that's much less common in well-behaved code, e.g. we sometimes won't return a buffer if an exception is thrown and have generally considered that to be ok). Is the SslStream use somehow causing such dropping to happen more often?

@davidfowl
Copy link
Member Author

davidfowl commented Mar 2, 2021

How is that different from the buckets being empty because no one has allocated / returned any arrays yet?

Maybe I'm misunderstanding something but ArrayPool.Shared is optimized for synchronous operations. When the array pool is used for arrays are being rented and returned quickly then there's very little allocation and lots of reuse (once things are "warm"). When used for operations that may be pending for a long time (like the websocket scenario) the array pool it is more likely that arrays are out of the pool than in. TL;DR Long running operations make it more likely that buffers are out of the pool for longer than need be, resulting in more overall allocations rending the pool less useful.

@jkotas
Copy link
Member

jkotas commented Mar 2, 2021

less useful != useless

We know about many situations where the current ArrayPool behavior is sub-optimal (#853, #12800, #13289, ...). The GC team has allocated time in .NET 6 to make this better.

@stephentoub
Copy link
Member

stephentoub commented Mar 2, 2021

To make sure we're on the same page, ArrayPool doesn't preallocate the max number of arrays for a given bucket size. It's just a temporary drop-off / pick-up location used to hand off arrays from one consumer to another with a size that will only allow some max number of a given size to be stored at once. If there are no buffers currently in the pool, for whatever reason, a new one will be allocated and that one returned. That array can then be returned to the pool, and will be stored and available for subsequent rental, assuming the bucket wasn't full. In this manner, you can have tons of arrays for the same bucket size all pending and all being reused with none being thrown away; what matters for the reuse is that not all of those consumers try to return their arrays at the same time in a manner that would then exceed the bucket size. It's optimized for fast synchronous operations mainly in that the same thread / core reusing the same array over and over will have that memory warm in its cache, and it's able to more efficiently rent/return an array with less overhead when it's able to do so from the thread's cache or from the queue that's more associated with the current core.

@geoffkizer
Copy link
Contributor

I think we want to pool the buffers used by SslStream (and similar); this helps reduce GC cost.

It's possible that ArrayPool.Shared is not the right way to do this; if so, we should implement a different pool with different policy. But I'm not clear how the policy would be different.

@davidfowl
Copy link
Member Author

I think we want to pool the buffers used by SslStream (and similar); this helps reduce GC cost.

I think if the underlying reads are fast and the array pool is being used then it'll reduce GC cost. Otherwise it's the same as allocating the array (like in the websockets case), since the bucket may be empty if buffers aren't being returned quickly enough.

Spoke to @stephentoub offline about this and I'm not convinced there's an issue her as yet since we don't have counters or events to understand if these pooled arrays are being dropped when these buffers are being returned.

@geoffkizer
Copy link
Contributor

Otherwise it's the same as allocating the array (like in the websockets case), since the bucket may be empty if buffers aren't being returned quickly enough.

How is it the same? Over time, as SslStreams get created/dropped, new SslStreams should reuse buffers from old ones.

@davidfowl
Copy link
Member Author

How is it the same? Over time, as SslStreams get created/dropped, new SslStreams should reuse buffers from old ones.

The scenario above is a long running websocket scenarios where SSL stream instances can last for minutes/hours/days.

@wfurt
Copy link
Member

wfurt commented Mar 2, 2021

The buffer will be returned every time SslStream consumed all data via ReturnReadBufferIfEmpty(). The overall duration does not really matter. (and "allocated" again via ResetReadBuffer if needed when more data arrives)

@ghost ghost locked as resolved and limited conversation to collaborators Apr 1, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Oct 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

6 participants