Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate use of LongRunning continuation option in SqlClient async reads #21739

Closed
corivera opened this issue May 16, 2017 · 12 comments · Fixed by dotnet/corefx#28627
Closed
Assignees
Milestone

Comments

@corivera
Copy link
Member

A change was made here https://github.com/dotnet/corefx/blob/master/src/System.Data.SqlClient/src/System/Data/SqlClient/SNI/SNIPacket.cs#L270 to fix timeouts in Entity Framework tests due to blocked async reads. Using the LongRunning option means a new thread will be created on every async read. We need to investigate if this LongRunning option will cause other side effects, and if it can be replaced with a better threading solution.

CC @saurabh500

@corivera corivera self-assigned this May 16, 2017
@corivera
Copy link
Member Author

corivera commented May 16, 2017

CC @stephentoub @divega

@divega
Copy link
Contributor

divega commented May 17, 2017

Does this really belong in 2.1 as opposed to 2.0? @stephentoub mentioned before:

(Just note that this particular usage is potentially going to add a ton of overhead... you're spinning up a new thread for every read on the stream 😦 )

@stephentoub
Copy link
Member

I suggest that at least understanding the root cause should be in 2.0; it's possible we'll discover the fix is too complicated to make 2.0, but the investigation should at least make it.

@saurabh500
Copy link
Contributor

Cory and I looked into this issue.

The basic issue is that SqlClient tries to start a new thread for every packet read which is long running and this has scope for improvement, however the changes would be a bit involved to get into 2.0.0. We should strive to improve the thread creation.

The basic idea is to accumulate the network bytes and send them back when we have a buffer full or packet or if there are no more packets available on network. This is the strategy used by Native SNI. We needn't invoke a callback into the SqlClient processing layer for every read.

Doing some testing we figured out that the worst case and best case thread count with or without the LongRunning option remains the same. So the fix should stay in 2.0. However the issue should be solved with re-design for Async reads/writes in upcoming releases.

I am moving this issue to 2.1.0

@stephentoub
Copy link
Member

Why was the LongRunning needed in the first place though? The comment was that a bunch of threads were blocked causing long delays, and LongRunning works around that by adding additional threads, but do we know what the blocking was caused by? What calls were taking so much time synchronously on thread pool threads?

@stephentoub
Copy link
Member

Oh, I just saw your comment "the issue should be solved with re-design for Async reads/writes in upcoming releases"... are you saying that the network reads/writes under the covers are actually being done synchronously?

@saurabh500
Copy link
Contributor

Native SNI depends on IO completion ports and OVERLAPPED structs to manage the network reads. So they can be asynchronous. In case network packets are not available, a IO_Pending status is returned. We will have to model Managed SNI on similar lines.
Retrieval of network packets is asynchronous and is done very quickly in Native SNI where a thread is spun up to query the IOCompletion port and fill in the buffer passed by the managed layers.
It's not synchronous.
The difference is that Native SNI doesn't callback into the Managed parts of SqlClient for every read,

@saurabh500
Copy link
Contributor

By the re-design I meant that the completion of network read should be followed by a very quick continuation task instead of sending the partially read packet back to SqlClient to process it.

@corivera corivera removed their assignment Jul 5, 2017
@sebastienros
Copy link
Member

This https://github.com/dotnet/corefx/issues/24480 is probably the cause of the initial issue and we should look into it and hopefully revert the LogRunning flag. /cc @geleems

@stephentoub
Copy link
Member

@saurabh500, @corivera, can this be reverted now? Or if it's still an issue, investigated to determine the root cause rather than working around it in this expensive way?

@saurabh500
Copy link
Contributor

@stephentoub This cannot be reverted yet. This was added for connections using MultipleActiveResultSet however the fix applies to all async operations in SqlClient.
We are going to change this flag to be applied only in case of MARS in short term till we have a solution for fixing MARS.

MARS data is always sent asynchronously even when the sync APIs of SqlClient are called. In case of multi threaded environment, it looks like data transfer on connections using MARS start suffering from thread starvation as well.

@stephentoub
Copy link
Member

Ok. Thanks.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants