Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SslStream hangs on client when SslStream server fails on protocol mismatch #914

Closed
the-black-wolf opened this issue Mar 28, 2019 · 52 comments · Fixed by #35549
Closed

SslStream hangs on client when SslStream server fails on protocol mismatch #914

the-black-wolf opened this issue Mar 28, 2019 · 52 comments · Fixed by #35549

Comments

@the-black-wolf
Copy link

Ok, the new preview and my bug caused the public/private key issue. However, there is still one issue left. On protocol mismatch, the server side throws an SSL error (SSL Handshake failed with OpenSSL error - SSL_ERROR_SSL). However, the client side is apparently unaware of the protocol failure and hangs (presumably waiting for server handshake to continue) . The only way to kick the client out of hang is to kill the underlaying connection on server, but then the exception showin in client is Authentication failed because the remote party has closed the transport stream. which is pretty useless for resolution.
Once again, this is done and tested on Linux, I have not tried the same on Win/Mac.

using System;
using System.Net;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Authentication;
using System.Security.Cryptography.X509Certificates;
using System.Threading;
using System.Threading.Tasks;

namespace SslFail
{
	internal static class Program
	{
		private static async Task StartClientAndAuthenticate(ClientServerState state, IPEndPoint endPoint)
		{
			try
			{
				state.Client.Connect(endPoint.Address, endPoint.Port);
				state.ClientStream = new SslStream(state.Client.GetStream(), false,
					(a1, a2, a3, a4) => true,
					(a1, a2, a3, a4, a5) => null);
				await state.ClientStream.AuthenticateAsClientAsync("localhost", null, SslProtocols.Tls12, false);
				state.ClientAuthenticated.Set();
			}
			catch (Exception e)
			{
				Console.WriteLine($"Exception in Client: {e.Message} {e.InnerException?.Message}");
			}
		}

		private static async Task StartServerAndAuthenticate(ClientServerState state)
		{
			try
			{
				state.Listener.Start();

				var s = new X509Store(StoreName.My, StoreLocation.CurrentUser, OpenFlags.ReadOnly);
				var serverCert = s.Certificates[0];

				state.ServerClient = await state.Listener.AcceptTcpClientAsync();
				state.ServerStream = new SslStream(state.ServerClient.GetStream(), false,
					(a1, a2, a3, a4) => true,
					(a1, a2, a3, a4, a5) => serverCert, EncryptionPolicy.RequireEncryption);
				await state.ServerStream.AuthenticateAsServerAsync(serverCert, false, SslProtocols.Tls13, false);
				state.ServerAuthenticated.Set();
			}
			catch (Exception e)
			{
				Console.WriteLine($"Exception in Server: {e.Message} {e.InnerException?.Message}");
			}
			finally
			{
				// if I do not do this, the client is hang (presumably until connection dies)
				// even then, the error is dropped connection and not a protocol mismatch, so there is no
				// way to course correct. 
				// state.ServerClient?.Close();  	
			}
		}

		private class ClientServerState
		{
			public TcpListener Listener { get; set; }
			public TcpClient Client { get; } = new TcpClient();
			public TcpClient ServerClient { get; set; }
			public SslStream ServerStream { get; set; }
			public SslStream ClientStream { get; set; }
			public AutoResetEvent ServerAuthenticated { get; } = new AutoResetEvent(false);
			public AutoResetEvent ClientAuthenticated { get; } = new AutoResetEvent(false);
		}

		private static void Main(string[] args)
		{
			var endPoint = new IPEndPoint(IPAddress.Loopback, 35454); 
			var state = new ClientServerState { Listener = new TcpListener(endPoint) };
			try
			{
				#pragma warning disable CS4014
				StartServerAndAuthenticate (state);
				StartClientAndAuthenticate (state, endPoint);
				#pragma warning enable CS4014
				state.ServerAuthenticated.WaitOne();
				state.ClientAuthenticated.WaitOne();
				Console.WriteLine("All is well, no bug detected!");
			} finally {
				state.ClientStream?.Dispose ();
				state.Client.Close ();
				state.ServerStream?.Dispose ();
				state.ServerClient?.Close ();
				state.Listener.Stop ();
			}
		}
	}
}
@stephentoub
Copy link
Member

Can you share a standalone repro?

@the-black-wolf
Copy link
Author

Sure. Let me just recreate it as I moved on to other testing. I actually stumbled on another issue (also Linux related). When my client's protocol is set to Tls12 and server to Tls13 it also hangs in exactly the same manner (instead of throwing protocol mismatch)

@the-black-wolf
Copy link
Author

Apologies for delay on this, I was busy last few days. I'll catch up weekend and provide a full example of both issues. I also noticed there is a newer preview of 3.0 so will also install the latest 3.0 preview to make sure it's still applicable.

@stephentoub
Copy link
Member

Thanks.

@the-black-wolf the-black-wolf changed the title SslStream hangs on server cert with no private key SslStream hangs on client when SslStream server fails on protocol mismatch Mar 31, 2019
@the-black-wolf
Copy link
Author

I updated the question. The public/private key issue has disappeared, so either it was fixed in last preview or I did something wrong. However, the protocol mismatch hang on client is still present. I added code to issue.

@wfurt
Copy link
Member

wfurt commented Apr 1, 2019

What version of Linux/OpenSSL do you use @the-black-wolf ? I can try to reproducer it and collect more info.

@the-black-wolf
Copy link
Author

mmix@BlackWolf:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.10
Release: 18.10
Codename: cosmic
mmix@BlackWolf:~$ openssl version
OpenSSL 1.1.1 11 Sep 2018
mmix@BlackWolf:~$ dotnet --version
3.0.100-preview3-010431
mmix@BlackWolf:~$ dotnet --list-sdks
3.0.100-preview3-010431 [/opt/dotnet/sdk]

@wfurt
Copy link
Member

wfurt commented Apr 1, 2019

I've seen some test failures with 1.1.1 and it may be related. There seems to be some behavior change we did not trace down yet. It may be interesting to try this on Ubuntu 18.04 if possible @the-black-wolf as that is LTS and has 1.1.0.

@the-black-wolf
Copy link
Author

Nope, no joy. I installed Ubuntu Server LTS on vmware box and setup dotnet runtime and My store and it behaves the same (sorry for screenshot, I did not find a way to copy/paste text out of linux guest terminal). After the server dies on protocol missmatch, the client stays stuck.

image

@the-black-wolf
Copy link
Author

Hmmm, I just tried this on Windows, using the same preview, and it behaves exactly the same:

c:\projects\test>dotnet --list-sdks
2.1.502 [C:\Program Files\dotnet\sdk]
2.1.503 [C:\Program Files\dotnet\sdk]
2.1.504 [C:\Program Files\dotnet\sdk]
3.0.100-preview3-010431 [C:\Program Files\dotnet\sdk]

c:\projects\test>dotnet build
Microsoft (R) Build Engine version 16.0.443+g5775d0d6bb for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.

  Persisting no-op dg to c:\projects\test\obj\SslFail.csproj.nuget.dgspec.json
  Restore completed in 23.4 ms for c:\projects\test\SslFail.csproj.
C:\Program Files\dotnet\sdk\3.0.100-preview3-010431\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.RuntimeIdentifierInference.targets(151,5): message NETSDK1057: You are using a preview version of .NET Core. See: https://aka.ms/dotnet-core-preview [c:\projects\test\SslFail.csproj]
  SslFail -> c:\projects\test\bin\Debug\netcoreapp3.0\SslFail.dll

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:01.04

c:\projects\test>dotnet run
Exception in Server: The client and server cannot communicate, because they do not possess a common algorithm.

So either this is broken across the board, or my sample is faulty in some way. Which is of course possible, but I am not seeing it. :(

@wfurt
Copy link
Member

wfurt commented Apr 3, 2019

thanks for trying @the-black-wolf I was planning to give it try on Ubuntu 18.04 but I'm getting distracted with other issues.

@davidsh
Copy link
Contributor

davidsh commented Apr 3, 2019

What happens if you reduce your repro to using .NET Core for the client (which you indicate hangs on protocol mismatch error) and use another endpoint for the server?

// Requires TLS 1.2.
"https://www.ssllabs.com:10303/"

@the-black-wolf
Copy link
Author

Ok, I have replaced endPoint with:

var endPoint = new IPEndPoint(Dns.GetHostEntry("www.ssllabs.com").AddressList[0], 10303); 

and removed server and server AutoResetEvent.

When I request Tls1.2 it negotiates ok (All is well, no bug detected!), when I demand Tls1.1 or 1.3, I get a protocol mismatch exception on client, as is supposed to

Exception in Client: Authentication failed, see inner exception. SSL Handshake failed with OpenSSL error - SSL_ERROR_SSL.

So, apparently the .net core client only hangs when talking to .net core server.

@the-black-wolf
Copy link
Author

I am not familiar with binary exchange technical details on Tls, but this all looks like AuthenticateAsServer fails to send an error condition back to client, who keeps on waiting for server to talk.

@davidsh
Copy link
Contributor

davidsh commented Apr 3, 2019

So, apparently the .net core client only hangs when talking to .net core server.

I think you might have introduced a deadlock in your code above when you use your own loopback server. There is a lot of "Wait" type patterns in that code. It's possible when the server is throwing an exception that it is doing it on the wrong thread (perhaps the main() thread) etc.

In fact lines like this in code:

await state.ServerStream.AuthenticateAsServerAsync(serverCert, false, SslProtocols.Tls13, false);

need to use .ConfigureAwait(false) to avoid these kinds of deadlocks.

@davidsh
Copy link
Contributor

davidsh commented Apr 3, 2019

but this all looks like AuthenticateAsServer fails to send an error condition back to client, who keeps on waiting for server to talk.

And if you think there is a bug in SslStream.AuthenticatAsServer, then it probably would be best to construct a repro using .NET Core for server-side but use .NET Framework for client-side. Again, this is to isolate the repro better. I still think there is a deadlock pattern in your code above.

@the-black-wolf
Copy link
Author

The .ConfigureAwait(false) did nothing. But, I don't expect it though, I do catch the server side exception, so its not throwing outside and server does reach the end of its task. I don't think its a deadlock, but I'll give one quick test with old style threads and then try with framework client later.

@stephentoub
Copy link
Member

In what situation would a server not close a connection after a failed SSL authentication? Isn't the connection basically unusable at that point?

@the-black-wolf
Copy link
Author

the-black-wolf commented Apr 3, 2019

not really, right? SSL/Tls is a tunneling protocol, you can still use high level connection if you like outside negotiated tunnel (STARTTLS works that way). Still, it would again be a dropped connection and client would not know it was rejected.

@davidsh
Copy link
Contributor

davidsh commented Apr 3, 2019

In what situation would a server not close a connection after a failed SSL authentication? Isn't the connection basically unusable at that point?

The server should close the socket after a failed TLS handshake. Or at the very least, it is supposed to send back a TLS alert.

The "hang" you are seeing is because the client is expecting data from the server. But the server has basically stopping talking with the client (because of a mismatch detected on the server-side). I think this hang is artificial and is basically a test bug due to the implementation of this test loopback server.

@the-black-wolf
Copy link
Author

SslStream should never drop a connection by closing underlaying stream if leaveInnerStreamOpen is true. But again, even on false, it should notify client before dying (the www.ssllabs.com did).
If the server has unilaterally stopped talking to client on mismatch then its a real bug.

@stephentoub
Copy link
Member

stephentoub commented Apr 3, 2019

SslStream should never drop a connection by closing underlaying stream if leaveInnerStreamOpen is true

I'm not saying SslStream would itself close it, but generally a server (the code using SslStream) would.

the www.ssllabs.com did

How? What did it send on the wire? How does that map or not map to the SSL spec?

@the-black-wolf
Copy link
Author

the-black-wolf commented Apr 3, 2019

I'm not saying SslStream would itself close it, but generally a server (the code using SslStream) would.

That will happen eventually, but Tls has to complete its cycle so that both client and server exit the "secure" stage. Keeping one side in the dark is never a good practice nor is dropping connection without explanation. Thats what douche firewalls do :)
As I said before, I am 100% certain that failed Tls negotiation is not mandatory grounds for connection termination. If both client and server fail secure communication and are both aware of it, the network streams on both sides should be empty and they could continue to communicate if they want to, maybe even try to renegotiate under different conditions. STARTTLS/Opportunistic TLS is an example, but implementation should allow you to build your own variants if needed.

How? What did it send on the wire? How does that map or not map to the SSL spec?

No idea, as I said I am not familiar with the Tls protocol, so even WireSharking wouldn't mean much to me. Its just common logic. Server is obviously the one who decides that mismatch is occurring, client has no idea that things have gone bad on server side, for it to properly conclude (and return appropriate exception) it is obviously required that server sends something back before it dies.

@the-black-wolf
Copy link
Author

the-black-wolf commented Apr 3, 2019

@davidsh OK, I completely removed async from code, I added both client and server as independent threads and the behavior is the same. I will try framework client tomorrow

Source is here: https://pastebin.com/8BN23Auu

@stephentoub
Copy link
Member

The server should close the socket after a failed TLS handshake. Or at the very least, it is supposed to send back a TLS alert.

@davidsh, it would be SslStream that would send a TLS alert, right? So does this mean we're missing that?

@davidsh
Copy link
Contributor

davidsh commented Apr 3, 2019

@davidsh, it would be SslStream that would send a TLS alert, right? So does this mean we're missing that?

Historically, SslStream has had problems in .NET Framework. For example, we wouldn't send TLS CloseNotify alerts when closing the TLS handshake due to error. And we had problems on the receiving side of SslStream in dealing with received TLS alerts. Some of these problems were fixed in servicing updates of .NET Framework.

I don't remember where we ended up in .NET Core SslStream. There might be some things we need to fix on the server-side and/or the client-side of SslStream regarding TLS alerts.

We need to tease apart the sending vs. receiving side of this repro and look at Wireshark traces to see what side might be at fault.

It is standard practice for most servers, for example, to close the Socket after a TLS handshake fails. But there might be some server scenarios that choose to only send a TLS alert on a failed handshake. And then resume normal TCP processing on that socket. I don't remember what the TLS RFC says about that in practice. I would need to investigate that.

@the-black-wolf
Copy link
Author

the-black-wolf commented Apr 4, 2019

That sounds like a probable cause of this. I'll try the framework client now, see what happens.

As for debate, as I remember from the old days of implementing some RFCs, there was always a rule to not overdo things and be tolerant in implementation where it does not break it. imho, unless Tls protocol by its nature leaves trash on sockets, there is no real need to force close connection on error, if C/S want to, they can do it themselves. One of the reasons I suspect this is the case is that whoever wrote SslStream on Framework introduced leaveInnerStreamOpen parameter on SslStream Constructor. Mandatory closure of connection would make this parameter faulty.

@wfurt
Copy link
Member

wfurt commented Apr 4, 2019

You may try to get packet captures @the-black-wolf .
That would definitely show what side is doing what.

I think if TLS handshake fails the method should throw so caller can deal with underlying stream. (and close as good citizen would do)

@the-black-wolf
Copy link
Author

the-black-wolf commented Apr 4, 2019

Well, things are getting more interesting. While testing two-process connect on Windows (when testing core client <=> core server) requesting the same protocol (Tls1.3) the client (!) declared mismatch (?). When I switch them both to Tls 1.2, they negotiate properly, so its not my code.
Do not know what to think of this, this sounds like a separate issue with Tls1.3 on Windows or SslStream issue with Tls1.3 on Windows. No idea why client and server would not connect on Tls1.3 on the same box, presumably the same cipher suits are available to both. When I let them auto-negotiate protocol they choose Tls1.2, so maybe Windows is not Tls1.3 ready? (I am using Windows Server 2019)

On Linux both attempts with matching Tls negotiated properly. Auto-negotiation goes to Tls1.3.

Continuing testing.

@wfurt wfurt self-assigned this Sep 9, 2019
@karelz
Copy link
Member

karelz commented Oct 5, 2019

@wfurt are you working on it? If not, please unassign yourself. Thanks!

@wfurt
Copy link
Member

wfurt commented Oct 6, 2019

yes, I do @karelz. I have fix for OpenSSL but schanel gives me some grief.

@maryamariyan maryamariyan transferred this issue from dotnet/corefx Dec 16, 2019
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net.Security untriaged New issue has not been triaged by the area owner labels Dec 16, 2019
@maryamariyan maryamariyan added this to the 5.0 milestone Dec 16, 2019
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Feb 13, 2020
@cheenamalhotra
Copy link
Member

I'm wondering if this is what causes hangs in SqlClient when connecting in Linux images that do not support TLS 1.0 and 1.1 protocols while servers do.

Refer issue dotnet/SqlClient#126 and dotnet/SqlClient#201

The driver does not seem to have any control over which protocol version is negotiated with server, which leads to come down to Sockets and SslStream classes.

Our implementation of TCP Handle can be found in SNITCPHandle.cs. Please let us know if there's a workaround we can use or if this issue is going to be addressed sometime.

@TechnologyAnimal
Copy link

TechnologyAnimal commented Mar 15, 2020

Hello everyone. I recently ran into an issue with System.Net.Security.SslStream and it's really ruining my day. I asked a question on reddit, and some very nice person pointed me to this thread. Details can be found in that thread, but will summarize here as well.

I wrote code that attempts to establishes ssl connections with remote systems to output what tls protocols a client can successfully establish with a server. My client is configured to use SSLv2, SSLv3, TLS1, TLS11 and TLS12, but my client hangs when testing a very small subset of the servers I am testing against. If I understand the bug correctly (protocol mismatch), considering my client has all protocols enabled except for TLS13, does that mean when my client hangs, the remote systems must be using TLS13 only?

This code is used to successfully scan hundreds of thousands of systems, but I continue to run into edge cases that cause my process thread to hang and I have no choice but to terminate the process.

In those cases, I have no problems creating a socket connection, creating the net stream and the ssl stream. The problem occurs when I attempt to use the ssl stream to authenticate as client. It doesn't matter if I use .NET Core 3.1 on Mac OSX or Windows Server 2012. It also doesn't matter if I use .NET Framework 4.8.

I am working in a very large environment without admin rights on these remote systems, but started to fingerprint these edge case services that are causing the thread to hang. Here are three example cases that I can consistently reproduce the failures:

Full source code was provided above, but here is a code snippet, with the final line being the problem:

$Socket = [System.Net.Sockets.Socket]::new([System.Net.Sockets.SocketType]::Stream, [System.Net.Sockets.ProtocolType]::Tcp)
$Socket.Connect($fqdn, $Port)
$NetStream = [System.Net.Sockets.NetworkStream]::new($Socket, $true)
$SslStream = [System.Net.Security.SslStream]::new($NetStream, $true, { $true })
$SslStream.AuthenticateAsClient($fqdn, $null, $ProtocolName, $false)

I would be more than happy to provide any additional information, test a fix, or anything else that might be helpful to bring resolution to the issue.

Any suggested workarounds would be greatly appreciated.

@TechnologyAnimal
Copy link

Bump?!

@pbsis
Copy link

pbsis commented Apr 5, 2020

I'm getting the same issue on both .NET 4.8 and Core 3.1. I've written a simple SMTP/IMAP proxy to implement some extra security and my STARTTTLS implementation won't work with either facebook or office365. When they try and send us emails I get a hang when calling AuthenticateAsServer. Tried all the various TLS versions but can't get it to work. I suspect it's a cipher mismatch but without an exception bubbling up it's difficult to debug. Just adding my +1 that it would be good to get this fixed. Thanks!

@TechnologyAnimal
Copy link

Can someone at least acknowledge this issue and offer some sort of assurance that this is being looked at? My concern is the untriaged tag being removed. I hope that doesn't mean someone is under the impression this issue has actually been fixed... Thank you.

@karelz karelz

@davidsh
Copy link
Contributor

davidsh commented Apr 14, 2020

Can someone at least acknowledge this issue and offer some sort of assurance that this is being looked at? My concern is the untriaged tag being removed. I hope that doesn't mean someone is under the impression this issue has actually been fixed... Thank you.

The issue is still Active and not Closed. It is assigned to @wfurt and will be investigated for fixing during our current .NET 5 milestone.

@TechnologyAnimal
Copy link

Can someone at least acknowledge this issue and offer some sort of assurance that this is being looked at? My concern is the untriaged tag being removed. I hope that doesn't mean someone is under the impression this issue has actually been fixed... Thank you.

The issue is still Active and not Closed. It is assigned to @wfurt and will be investigated for fixing during our current .NET 5 milestone.

Thanks for the reply! I am using this daily to scan hundreds of thousands of servers for tls configuration. Due to no tls alert and no error messages, I've had to put less than ideal workarounds in place to workaround the bug. I am eager to get that mess out of the code base, so if there is anything I can help test or clarify, I would be happy to do so. Thanks again.

@wfurt
Copy link
Member

wfurt commented Apr 14, 2020

Some improvements were maid. Part of the problem is that Schannel does not generate alert in all cases - version mismatch is one of them. However, the AuthenticateAs*() will throw and in case of server, that would typically close connection and client would see that as IOException.

If you have sample code or packet captures let me know @TechnologyAnimal. I left this one open to look at macOS

@wfurt
Copy link
Member

wfurt commented Apr 14, 2020

one more note @TechnologyAnimal is that the alert was not generated on the server side. It seems like your scenario is client. With that, if server does not sent alert and does not close connection, there is really nothing we can do besides timeout.

@TechnologyAnimal
Copy link

one more note @TechnologyAnimal is that the alert was not generated on the server side. It seems like your scenario is client. With that, if server does not sent alert and does not close connection, there is really nothing we can do besides timeout.

Hi @wfurt . I replied above with a detailed breakdown of what I've been experiencing, along with the code I have been using. Unfortunately, it's written in PowerShell, which I assume is less than ideal, but it's all that I have.

My goal is to scan a few hundred thousand servers regularly to identify what TLS protocols are supported by a server. In a handful of edge cases, there is no TLS alert sent from the server to the client. The end result is that those requests completely hang. No exception is thrown. No timeout occurs. The process hangs forever, which I assume is not the expected behavior.

I was able to workaround the issue by wrapping these calls up inside jobs and assigning a timeout value, but I find it to be really messy.

Any suggestions or advice would be greatly appreciated.

@wfurt
Copy link
Member

wfurt commented Apr 14, 2020

In that case this GH issue is really irrelevant. The servers you connecting to may or may not have updated .NET or may not run .NET at all.

I'm not familiar with PowerShell and their APIs so I'll outline few strategies.
You generally deal with situation when you expect response from server but it is not coming - neither error nor success. I think in that case (short?) timeout is really only option IMHO.

From .NET API prospective, SslStream really only creates and decodes TLS frames. It depends on underlying Stream given to it to handle IO and timeout. In typical case, the sequence would be Socket -> NetworkStream -> SslStream. You can create wrapper stream to throw after some time or you can use select/poll. You can probably use HttpCient with timeout. That could work for servers others then HTTP. You would get OperationCanceled for timeout, AuthenticationException for TLS issues and anything else for protocol mismatch.

If I would do that in shell my self, would probably spawn a process - like openssl s_client or other app to do the handshake. With that I can kill it easily if that does not finish on time as well as I could do more in parallel/background. That is more expensive than doing everything synchronously inline but given the conditions, I don't think there is other easy way unless you find way how to control timeout.

@wfurt
Copy link
Member

wfurt commented Apr 14, 2020

reading through the thread, it seems like your use case is different @pbsis. Do you limit ciphers or TLS versions when you do AuthenticaterAsServer()? What OS you running on? It may be worth of trying daily builds and posting packet capture. (from side between client and your proxy e.g. .NET plays server)

@ghost ghost locked as resolved and limited conversation to collaborators Dec 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants