Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High native memory usage during CRL check for the server not featuring OCSP stapling #108557

Open
sfc-gh-mhofman opened this issue Oct 4, 2024 · 11 comments

Comments

@sfc-gh-mhofman
Copy link

sfc-gh-mhofman commented Oct 4, 2024

Description

The problem appears when the .NET client connects using TCP with TLS v1.2 (also 1.3) to a server without support for OCSP stapling. Client attempts at SSL authentication lead to high native memory usage. In memory-limited environments (AKS, Minikube, VM), this leads to out-of-memory exceptions. The issue occurs only on Linux operating systems. This happens for each newly established TLS session.
In the case of the CRL: http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crl and multithreaded client native memory usage hits easily 1GB and more in seconds where CRL size is around 16MB.

Reproduction Steps

using System.Diagnostics;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Cryptography.X509Certificates;

namespace DotNetWebRequestCertMemLeak
{
    class Program
    {
        static async Task Main()
        {
            const X509RevocationMode RevocationCheckMode = X509RevocationMode.Online;
            Console.WriteLine(RevocationCheckMode);
            for (int i = 1; i <= 10000; i++) 
            {
                using var socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };
                try
                {
                    const string Host = "server-not-supporting-ocsp-stapling.com";

                    await socket.ConnectAsync(Host, 443);

                    using var networkStream = new NetworkStream(socket);
                    using var sslStream = new SslStream(networkStream);

                    await sslStream.AuthenticateAsClientAsync(new SslClientAuthenticationOptions
                    {
                        TargetHost = Host,
                        CertificateRevocationCheckMode = RevocationCheckMode
                    });
                }
                catch
                {
                    Console.WriteLine($"Exception caught: {e.Message}");
                }
                PrintStatus(i);
            }
        }

        static void PrintStatus(int i) {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            Process.GetCurrentProcess().Refresh();
            long privateMB = Process.GetCurrentProcess().PrivateMemorySize64 / 1024 / 1024;
            long managedKB = GC.GetTotalMemory(false) / 1024;
            Console.WriteLine($"{DateTime.Now:O} [{i,2}]: {privateMB} MB private, {managedKB} KB managed");
        }
    }
}

Expected behavior

Consistent and stable memory usage with reasonable size according to the size of CRL.

Actual behavior

Out-of-memory exceptions and consistent growth of native memory to unreasonable sizes.

Regression?

No response

Known Workarounds

Mitigation and not a workaround - disabled CRL check:
SslClientAuthenticationOptions.CertificateRevocationCheckMode = X509RevocationMode.NoCheck
or
HttpClientHandler.CheckCertificateRevocationList = false

Configuration

Debian GNU/Linux v11.3, v12
.NET 8.0: v8.0.42, v8.0.8
mcr.microsoft.com/dotnet/sdk:8.0

Other information

Related issues:
#52577
#101552

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 4, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Oct 4, 2024
@sfc-gh-dlisouski
Copy link

I'm pretty sure that the memory is allocated at this line: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/X509Certificates/OpenSslCrlCache.cs#L114

And never is released after that. What I see is that pem_read_bio_x509_crl is called, memory used by the contained increases a lot. There are no other places where with spikes in memory utilization that I could find. It could be some issue with disposing the memory.

@vcsjones vcsjones added area-System.Security and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 4, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

@sfc-gh-dlisouski
Copy link

sfc-gh-dlisouski commented Oct 4, 2024

Possibly related: #7144

Another possibly related thing is https://openssl-users.openssl.narkive.com/K08kHmQg/memory-leaks-in-d2i-x509-crl-and-x509-crl-free that is talking about memory leak for large CRL after calling X509_CRL_free (my understanding it is what used in C# to free up memory in this case)

@bartonjs
Copy link
Member

bartonjs commented Oct 5, 2024

What I see is that pem_read_bio_x509_crl is called, memory used by the contained increases a lot.

If you're measuring the RES value, then that sounds consistent with what has been seen in the past: for each serial number present in the CRL, OpenSSL calls malloc to save that value into their ASN1 integer type; and a bunch of small mallocs cause the RES value to spike. They are correctly freed, but glibc decides to hold on to the small values rather than return them to the OS; and so doing the work in a loop doesn't cause a second spike.

That's not really something we can control; it's just how OpenSSL and CRL behave.

It could be some issue with disposing the memory.

The result from PEM_read_bio_x509_crl gets put in a SafeHandle, which is used in a using statement; when the using block ends we call X509_CRL_free, which then calls a bunch of calls to glibc free... which has... behaviors.

@sfc-gh-dlisouski
Copy link

@bartonjs I found openssl/openssl#5931. This ticket suggested that calling malloc_trim(0); can trigger memory return to OS. I ran the following code:

#include <malloc.h>

#include <openssl/x509.h>
#include <openssl/pem.h>

#include <string.h>
#include <string>

std::string get_from_proc(std::string what) {
    if (FILE* f = fopen("/proc/self/status", "r")) {
        what += ':';
        char line[128];
        while (fgets(line, sizeof(line), f)) {
            if (!strncmp(line, what.c_str(), what.size())) {
                fclose(f);
                return line;
            }
        }
        fclose(f);
    }
    return "";
}

int main(int argc, char* argv[]) {
    printf("After start: %s", get_from_proc("VmRSS").c_str());

    X509_CRL* crl = NULL;
    BIO* in = NULL;

    in = BIO_new_file("DigiCertGlobalG2TLSRSASHA2562020CA1-1.pem", "r");
    if (!in) {
        printf("BIO_new_file failed\n");
        return 1;
    }
    printf("After BIO_new_file: %s", get_from_proc("VmRSS").c_str());

    crl = PEM_read_bio_X509_CRL(in, NULL, NULL, NULL);
    if (!crl) {
        printf("PEM_read_bio_X509_CRL failed\n");
        return 2;
    }
    printf("After PEM_read_bio_X509_CRL: %s", get_from_proc("VmRSS").c_str());

    BIO_free(in);
    printf("After BIO_free: %s", get_from_proc("VmRSS").c_str());

    if (crl) {
        X509_CRL_free(crl);
        printf("After X509_CRL_free: %s", get_from_proc("VmRSS").c_str());

        malloc_trim(0);
        printf("After malloc_trim: %s", get_from_proc("VmRSS").c_str());
    }
    
    return 0;
}

Output:

After start: VmRSS:	    4352 kB
After BIO_new_file: VmRSS:	    5120 kB
After PEM_read_bio_X509_CRL: VmRSS:	  140564 kB
After BIO_free: VmRSS:	  140564 kB
After X509_CRL_free: VmRSS:	  140564 kB
After malloc_trim: VmRSS:	    6984 kB

As you can see memory is much smaller after calling malloc_trim.

It would be great if we can consider adding malloc_trim here and check if it fixes the issue. Do you think it could be possible to do it?

CRL that I used for testing: http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crl

@jeffhandley jeffhandley added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Oct 8, 2024
@jeffhandley jeffhandley added this to the Future milestone Oct 8, 2024
@sfc-gh-dlisouski
Copy link

Somebody will need to double-check the following, but I was trying to understand what X509_CRL_free() does. My understanding is that it eventually calls free(), which may explain why memory is not being returned to the OS. Most (if not all) implementations of free() simply mark the memory as available for reuse without attempting to return it to the kernel. This could also explain why calling malloc_trim(0) reduces memory usage.

There are multiple ways to approach this information, but I think I will leave it to the .NET team to decide on the next steps.

@vcsjones
Copy link
Member

It would be great if we can consider adding malloc_trim here and check if it fixes the issue. Do you think it could be possible to do it?

This is a very expensive call because it forces the allocator to walk every heap block with a lock in the entire process. This call can take seconds or even minutes to complete in real workload scenarios. It would cause very erratic performance behaviors for everyone.

You may consider using glibc tunables to see if you can ask the libc allocator to more aggressively return memory:

https://www.gnu.org/software/libc/manual/html_node/Tunables.html

Alternatively, you can also call malloc_trim yourself with pinvoke as needed from a background thread.

@sfc-gh-dlisouski
Copy link

This is a very expensive call because it forces the allocator to walk every heap block with a lock in the entire process. This call can take seconds or even minutes to complete in real workload scenarios. It would cause very erratic performance behaviors for everyone.

It makes sense, and to be honest, I anticipated this concern being raised. I'm not necessarily advocating for this specific fix. In my view, what truly matters is ensuring that memory usage remains stable after consecutive TLS handshakes. This should happen automatically, similar to how it behaves on Windows. Even if the first CRL check leads to a memory usage increase, as long as subsequent checks reuse the already allocated memory, I believe that would be an acceptable outcome.

@bartonjs
Copy link
Member

as long as subsequent checks reuse the already allocated memory, I believe that would be an acceptable outcome

That's certainly what we've seen in the past. The first one makes the RSS spike, but after that it reuses memory from the same pool.

Parallelization can, of course, make the peak be bigger... if thread A is building a chain using CRL B and thread C is building one against CRL D, then both B and D allocs happened in parallel, then they both free in parallel, making for a very large malloc pool. But just building the same chain in a loop should show stable memory.

@sfc-gh-jgarciadesoria
Copy link

I understand (I might be mistaken) that some kind of on-disk cache for CRLs is used underneath, but each concurrent connection itself re-parses the CRL into memory. I wonder if LRU-caching and sharing the in-memory representation too (assuming it's safe to share it across threads/contexts), even keeping it alive for a short time while not referenced to avoid reparsing when a single thread opens and closes connections, would be a possible optimization/mitigation here.

@bartonjs
Copy link
Member

It's probably worth an experiment to put CRLs into a static that uses MRU and WeakReference semantics. It would definitely help reduce peak memory if building two chains against the same CRL in parallel. It will cost complexity, and the memory impact of each CRL will last longer.

Something like a ConcurrentDictionary mapping a URL to a mutable union of Task (download in progress) and WeakReference (when the task finishes)... which would also help remove our parallel downloads "problem".

@bartonjs bartonjs removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants