High native memory usage during CRL check for the server not featuring OCSP stapling #108557
Comments
I'm pretty sure that the memory is allocated at this line: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/X509Certificates/OpenSslCrlCache.cs#L114 and is never released after that. What I see is that when PEM_read_bio_X509_CRL is called, the memory used by the container increases a lot. I could not find any other places with spikes in memory utilization. It could be an issue with how the memory is disposed.
Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones
Possibly related: #7144. Another possibly related thread is https://openssl-users.openssl.narkive.com/K08kHmQg/memory-leaks-in-d2i-x509-crl-and-x509-crl-free, which discusses a memory leak for large CRLs after calling X509_CRL_free (my understanding is that this is what C# uses to free the memory in this case).
If you're measuring the … That's not really something we can control; it's just how OpenSSL and CRLs behave.
The result from PEM_read_bio_X509_CRL gets put in a SafeHandle, which is used in a using statement; when the using block ends we call X509_CRL_free, which in turn makes a bunch of calls to glibc free... which has... behaviors.
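The ownership pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not the runtime's interop code: `SafeCrlHandle` and the injected `freeCrl` delegate are stand-ins, and in the real runtime the release step would be the native `X509_CRL_free` call. It shows that disposal hands the memory back to the native allocator; whether glibc then returns it to the OS is a separate question.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the runtime's CRL handle type.
// ReleaseHandle runs at most once, when the handle is disposed
// (or finalized), which is where X509_CRL_free would be called.
public sealed class SafeCrlHandle : SafeHandle
{
    private readonly Action<IntPtr> _freeCrl;

    public SafeCrlHandle(IntPtr crl, Action<IntPtr> freeCrl)
        : base(IntPtr.Zero, ownsHandle: true)
    {
        _freeCrl = freeCrl ?? throw new ArgumentNullException(nameof(freeCrl));
        SetHandle(crl);
    }

    // A zero pointer means "no CRL"; ReleaseHandle is skipped for it.
    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        // In the real code this would be the native X509_CRL_free(handle).
        _freeCrl(handle);
        return true;
    }
}
```

Used inside `using (var crl = new SafeCrlHandle(ptr, free)) { ... }`, the free callback runs when the block exits; disposing again is harmless because `SafeHandle` releases only once.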
@bartonjs I found openssl/openssl#5931. This ticket suggested that calling …
Output: …
As you can see, memory is much smaller after calling … It would be great if we could consider adding … The CRL that I used for testing: http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crl
Somebody will need to double-check the following, but I was trying to understand what … There are multiple ways to act on this information, but I will leave it to the .NET team to decide on the next steps.
This is a very expensive call because it forces the allocator to walk every heap block in the entire process while holding a lock. In real workload scenarios it can take seconds or even minutes to complete, and it would cause very erratic performance for everyone. You may consider using glibc tunables to ask the libc allocator to return memory more aggressively: https://www.gnu.org/software/libc/manual/html_node/Tunables.html Alternatively, you can also call …
That makes sense, and to be honest, I anticipated this concern. I'm not necessarily advocating for this specific fix. In my view, what truly matters is that memory usage remains stable across consecutive TLS handshakes. This should happen automatically, as it does on Windows. Even if the first CRL check increases memory usage, I believe that would be an acceptable outcome as long as subsequent checks reuse the already allocated memory.
That's certainly what we've seen in the past. The first one makes the RSS spike, but after that it reuses memory from the same pool. Parallelization can, of course, make the peak bigger: if thread A is building a chain using CRL B and thread C is building one against CRL D, then the B and D allocations happen in parallel, and then both are freed in parallel, making for a very large malloc pool. But just building the same chain in a loop should show stable memory.
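One way to check the "same chain in a loop should show stable memory" expectation is to sample the process's resident memory after each iteration. A minimal sketch, assuming `Environment.WorkingSet` is an acceptable proxy for RSS on Linux; `RssProbe` and `SampleRss` are hypothetical names, and the workload delegate stands in for your own chain build or TLS handshake.

```csharp
using System;
using System.Collections.Generic;

public static class RssProbe
{
    // Runs `workload` `iterations` times and records the process working set
    // (in bytes) after each run. A series that levels off after the first few
    // iterations matches "first call spikes, later calls reuse the pool";
    // steady growth across iterations would point at something else.
    public static IReadOnlyList<long> SampleRss(Action workload, int iterations)
    {
        if (workload is null) throw new ArgumentNullException(nameof(workload));
        if (iterations < 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        var samples = new List<long>(iterations);
        for (int i = 0; i < iterations; i++)
        {
            workload();
            samples.Add(Environment.WorkingSet);
        }
        return samples;
    }
}
```

For example, the workload could be a lambda that creates an `X509Chain`, sets `ChainPolicy.RevocationMode = X509RevocationMode.Online`, and calls `Build` on the server certificate. Running several probes on different threads at once is the scenario where, per the comment above, the peak is expected to be larger.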
My understanding (I might be mistaken) is that some kind of on-disk cache for CRLs is used underneath, but each concurrent connection re-parses the CRL into memory. I wonder whether LRU-caching and sharing the in-memory representation as well (assuming it's safe to share across threads/contexts), and even keeping it alive for a short time after it is no longer referenced so that a single thread opening and closing connections doesn't have to reparse it, would be a possible optimization/mitigation here.
It's probably worth experimenting with putting CRLs into a static cache that uses MRU and WeakReference semantics. It would definitely help reduce peak memory when building two chains against the same CRL in parallel. It will cost complexity, and the memory impact of each CRL will last longer. Something like a ConcurrentDictionary mapping a URL to a mutable union of Task (download in progress) and WeakReference (when the task finishes)... which would also help remove our parallel downloads "problem".
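The ConcurrentDictionary idea above could be sketched as follows. This is a hypothetical experiment, not runtime code: `SharedCrlCache<T>`, `Entry`, and the `load` delegate (standing in for "download and parse the CRL at this URL") are illustrative names. It covers only the Task/WeakReference union; the MRU part, i.e. briefly keeping recent entries strongly alive, is omitted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

#nullable enable

// Hypothetical sketch: one shared in-memory object per CRL URL.
// While a download/parse is in flight, concurrent callers get the same Task.
// Once it finishes, only a WeakReference is kept, so the parsed CRL can be
// collected as soon as no caller is still using it.
public sealed class SharedCrlCache<T> where T : class
{
    private sealed class Entry
    {
        public Task<T>? Pending;         // non-null while loading
        public WeakReference<T>? Loaded; // non-null after a successful load
    }

    private readonly ConcurrentDictionary<string, Entry> _entries = new();
    private readonly Func<string, Task<T>> _load;

    public SharedCrlCache(Func<string, Task<T>> load) =>
        _load = load ?? throw new ArgumentNullException(nameof(load));

    public Task<T> GetAsync(string url)
    {
        Entry entry = _entries.GetOrAdd(url, _ => new Entry());
        lock (entry)
        {
            // Loaded earlier and still referenced somewhere: reuse, no reparse.
            if (entry.Loaded is not null && entry.Loaded.TryGetTarget(out T? live))
                return Task.FromResult(live);

            // A load is already in flight: share it instead of downloading again.
            if (entry.Pending is not null)
                return entry.Pending;

            // Nothing cached or in flight: start exactly one load for this URL.
            Task<T> task = _load(url);
            entry.Pending = task;
            task.ContinueWith(t =>
            {
                lock (entry)
                {
                    // Swap the strong Task for a weak reference; on failure or
                    // cancellation, cache nothing so the next caller retries.
                    entry.Pending = null;
                    entry.Loaded = t.Status == TaskStatus.RanToCompletion
                        ? new WeakReference<T>(t.Result)
                        : null;
                }
            }, TaskScheduler.Default);
            return task;
        }
    }
}
```

One consequence worth noting: if `T` wrapped the native CRL handle, a shared object could no longer be released by a per-caller `using` block and would instead be freed by finalization once unreferenced, which is part of the complexity and longer memory lifetime mentioned above.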
Description
The problem appears when a .NET client connects over TCP with TLS 1.2 (also 1.3) to a server that does not support OCSP stapling. Each client TLS authentication attempt leads to high native memory usage, and this happens for every newly established TLS session. In memory-limited environments (AKS, Minikube, VMs), this leads to out-of-memory exceptions. The issue occurs only on Linux.
With the CRL at http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crl (around 16 MB) and a multithreaded client, native memory usage easily reaches 1 GB or more within seconds.
Reproduction Steps
Expected behavior
Consistent, stable memory usage, of a size that is reasonable relative to the size of the CRL.
Actual behavior
Out-of-memory exceptions and steady growth of native memory to unreasonable sizes.
Regression?
No response
Known Workarounds
A mitigation rather than a workaround: disable the CRL check with
SslClientAuthenticationOptions.CertificateRevocationCheckMode = X509RevocationMode.NoCheck
or
HttpClientHandler.CheckCertificateRevocationList = false
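The two settings above can be applied in code roughly as follows. This is a sketch; `RevocationWorkaround` and its method names are hypothetical, and the host name is a placeholder. Keep in mind that it turns revocation checking off entirely, so a revoked server certificate will be accepted.

```csharp
using System.Net.Http;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

public static class RevocationWorkaround
{
    // SslStream-based clients: skip revocation checking during the handshake.
    public static SslClientAuthenticationOptions CreateSslOptions(string targetHost) =>
        new SslClientAuthenticationOptions
        {
            TargetHost = targetHost,
            CertificateRevocationCheckMode = X509RevocationMode.NoCheck,
        };

    // HttpClient-based clients: the same effect via the handler.
    public static HttpClientHandler CreateHandler() =>
        new HttpClientHandler
        {
            CheckCertificateRevocationList = false,
        };
}
```

The options object would be passed to `SslStream.AuthenticateAsClientAsync(options)`, and the handler to `new HttpClient(handler)`.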
Configuration
Debian GNU/Linux v11.3, v12
.NET 8.0: v8.0.42, v8.0.8
mcr.microsoft.com/dotnet/sdk:8.0
Other information
Related issues:
#52577
#101552