SqlConnection.Open() does not connect to the specified Failover Partner on Linux. #583
Comments
Hi @alexanderinochkin , Thank you for the discovery. I appreciate the detailed description and explanation for this problem. I will go through the reference article you attached and test it locally. |
Hi @alexanderinochkin, I am trying to reproduce this issue locally, but my Ubuntu 18.10 machine can still connect to the failover partner after I stop the SQL Server service on the Data Source server. For my testing, I have ServerA and ServerB, which are configured as database mirroring partners of each other.
It always connects to the failover partner ServerB for me... Another thing I would like to check with you is whether you are connecting to a SQL Server availability group listener or a SQL Server Failover Cluster Instance instead of database mirroring servers. And do you have the
Socket.Connect() establishes a network connection synchronously, while Socket.BeginConnect() does it asynchronously. Could you explain in more detail why we should use the asynchronous connection for general synchronous cases? |
Hi @karinazhou,
Stopping the SQL Server service is not enough.
Judging from the code of the "private static Socket Connect(..)" function, a call to "void Cancel()" is expected to terminate the sockets[i].Connect(ipAddresses[i], port) call. Otherwise there would be no need to register it with the cancellation token via "cts.Token.Register(Cancel)". Am I right? In reality, however, the "sockets[i].Dispose()" call does not return immediately and does not terminate the "sockets[i].Connect(ipAddresses[i], port)" call until the described timeout (130 sec) has elapsed. |
Hi @alexanderinochkin, I tried your suggestion to block the ServerA IP on Ubuntu and I can now reproduce the hanging behavior. If the connection timeout is increased to more than 130 s, for example 180 s, it does connect to the failover partner in the end. In fact, even without a failover partner in the connection string, the connection fails after about 130 s, as you have discovered. Another thing I found is that if you try something like When testing the managed SNI on Windows by setting the following two lines of code in your application:
the |
Hi @alexanderinochkin , I tried a standalone application to simulate the behavior we have in driver's SNITCPHandle: Connect(). Instead of calling the Socket.Connect, it calls BeginConnect() and EndConnect() which are for Async request. However, when I test this standalone app on Ubuntu, it still needs about 130s to kill the TCP request. From the debug message I added, it is hanging at EndConnect() which is the same as Dispose() before.
You can find my testing program attached here. Since I am new to Async coding, please feel free to point out any mistakes I may have in the code. |
While investigating another issue related to SNITCPHandle.cs, we found the following change, which dates back to 2018: In this PR, the original ConnectAsync() implementation was replaced by sequential Socket.Connect() logic. That change fixed an issue that occurs when multiple concurrent users connect to the server. The detailed background can be found here: |
Hi @karinazhou, I think this implementation would be optimal. It does not open multiple concurrent connections, but it allows the connect timeout to be measured accurately. Nothing in this code blocks the thread unexpectedly.
|
Hi @alexanderinochkin, I tried your proposed code snippet and it did not hang the application after the pre-defined timeout, which is good. However, I still have some concerns about this potential fix. After Socket.BeginConnect() is called, no corresponding EndConnect() is called afterwards. According to the MSDN documentation for Socket.BeginConnect and Socket.EndConnect, EndConnect is supposed to be called after BeginConnect. Although Socket.Close() is called when the socket is not connected, are there any unexpected consequences of not calling EndConnect? |
Hi @alexanderinochkin, I tested your suggested change against one of our old issues involving multiple concurrent users: If we change the socket connection to async calls (Socket.BeginConnect), the old SqlException comes back. I have attached the repro application, which is originally from To test it, you can replace 'ConnectionString' with your accessible server IP and login credentials in /home/yourUserName/.nuget/packages/microsoft.data.sqlclient After copying the whole project to your Ubuntu machine, go to
If you change the MDS version to 2.0.0 in |
Hi @karinazhou, |
Hi @alexanderinochkin , We will keep this in our backlog for further investigation until we find a proper way to fix it. For now, I am afraid that we have to keep using Socket.Connect in the driver. |
This issue is no longer reproducible in the current version of the driver. |
The main problem is:
We use a SQL connection string with a Failover Partner server and Connect Timeout = 30.
The program runs on Linux (Ubuntu 19.10).
When the Data Source server is not available (the server is turned off or blocked by a firewall),
SqlConnection.Open() does not connect to the Failover Partner server, even though it is available and acts as the SQL principal.
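For illustration, a connection string of the kind described above might look like the sketch below. The server names, database name, and credentials are placeholders assumed for this example, not values from the actual setup. Only the Failover Partner and Connect Timeout = 30 settings come from the description.

```csharp
using System;

static class ConnectionStringExample
{
    // Placeholders (assumptions for illustration only): ServerA, ServerB,
    // MyDatabase, appUser and <password> are not taken from the real setup.
    public const string Value =
        "Data Source=ServerA;" +        // principal server that becomes unreachable
        "Failover Partner=ServerB;" +   // mirror the driver should fall back to
        "Initial Catalog=MyDatabase;" + // mirroring is configured per database
        "User ID=appUser;Password=<password>;" +
        "Connect Timeout=30";           // seconds, as in the report above
}
```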
Description of the cause:
The problem turned out to be in the SNITCPHandle.Connect(...) method: it does not abort the connection attempt when the specified timeout expires.
https://github.com/dotnet/SqlClient/blob/3ca39848735143055d9d7d4864d5f1bfd1976b99/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SNI/SNITcpHandle.cs
It uses a CancellationTokenSource with a Cancel() callback, but the callback does not actually cancel the socket.Connect() operation.
Here is the relevant part of the code with additional debug logging:
Below is the console output when this method is run with timeout = TimeSpan.FromSeconds(10):
As you can see, although the Cancel() callback fires at the specified timeout, Socket.Connect() keeps running and only finishes after the 130 sec timeout.
The reason for this behavior is well described in Marek Majkowski's article "When TCP sockets refuse to die": https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
To solve the problem:
As a result: Socket.Dispose() is not a proper way to cancel a Socket.Connect() call.
To be able to finish the connect within the specified timeout, this should be refactored to use a Socket.BeginConnect() call.
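A minimal sketch of the idea, assuming a hypothetical helper named ConnectWithTimeout (this is not the driver's code): start the connect with Socket.BeginConnect(), wait on the returned wait handle for at most the timeout, and give up if it has not completed. How disposing a socket with a pending connect behaves on Linux, and how this interacts with many concurrent connections, is not verified here.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class TimedConnect
{
    // Hypothetical helper for illustration only; not part of SNITCPHandle.
    public static Socket ConnectWithTimeout(IPAddress address, int port, TimeSpan timeout)
    {
        var socket = new Socket(address.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
        try
        {
            // Start the connect without blocking the calling thread.
            IAsyncResult pending = socket.BeginConnect(address, port, null, null);

            // Wait for completion, but no longer than the caller's timeout.
            if (!pending.AsyncWaitHandle.WaitOne(timeout))
            {
                throw new TimeoutException(
                    $"Connect to {address}:{port} did not complete within {timeout}.");
            }

            // Completed in time: EndConnect rethrows any connect error.
            socket.EndConnect(pending);
            return socket;
        }
        catch
        {
            // On timeout or failure, release the socket and let the caller
            // decide what to do next (for example, try the failover partner).
            socket.Dispose();
            throw;
        }
    }
}
```

The caller could use it like `TimedConnect.ConnectWithTimeout(ip, 1433, TimeSpan.FromSeconds(10))` and handle TimeoutException by moving on to the next address.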
Where to reproduce:
It reproduces with all versions of System.Data.SqlClient.SqlConnection and Microsoft.Data.SqlClient.SqlConnection.