Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix connection resiliency check #310

Merged
merged 2 commits into from
Nov 15, 2019
Merged

Conversation

David-Engel
Copy link
Contributor

@David-Engel David-Engel commented Nov 14, 2019

Fix for issue #304.

The connection resiliency check was broken in corefx PR 34047. Specifically, this change:
https://github.com/dotnet/corefx/pull/34047/files#diff-78c5752d4a8ee85fdc3593a8851c86e9L914

After my previous PR failed, I looked at other suspect changes in corefx PR 34047 and tried reverting them. This one line revert fixes my non-debug repro described in the issue. (I have not been able to repro this while debugging.) The original/working Task.Run call and the Task.Factory.StartNew call are almost functionally equivalent. After sprinkling some console messages throughout the reconnect path, I found that in SqlUtil.WaitForCompletion, task.Wait does not actually wait for the reconnect task from Task.Factory.StartNew ( a System.Threading.Tasks.Task``1[System.Threading.Tasks.Task]) to complete. So execution continues back up to the reader which hits a closed connection before the connection is actually reconnected. task.Wait does actually wait for the Task.Run generated task (a System.Threading.Tasks.UnwrapPromise``1[System.Threading.Tasks.VoidTaskResult]) to complete.

Update: The subtlety was the fact that Task.Run() is really an Unwrap()'ed Task.Factory.StartNew(). So simply adding .Unwrap() to the line also fixes the issue and preserves the previous perf benefit.

@cheenamalhotra
Copy link
Member

Could it be possible that the earlier change also works in conjunction with this one and there are more than 1 line changes that impact connection resiliency in order to consistently reproduce and produce a fix? If you could try to revert all considerable changes from the PR together, do you see a different behavior?

@David-Engel
Copy link
Contributor Author

David-Engel commented Nov 15, 2019

I updated the original description to note that a Task.Run() is really an Unwrap()'ed Task.Factory.StartNew(). So simply adding .Unwrap() to the line also fixes the issue and preserves the previous perf benefit. I've updated the PR with this change.

Thanks @JRahnama for pointing me to the difference.

@Wraith2
Copy link
Contributor

Wraith2 commented Nov 15, 2019

There are a number of other place that I made a similar change to avoid the implicit closure allocations of Task.Run, it's probably worth checking all instances of Task.StartNew to see if they should all be unwrapped.

Copy link
Contributor

@saurabh500 saurabh500 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch David.

@saurabh500
Copy link
Contributor

Thanks for the explanation

@David-Engel
Copy link
Contributor Author

There are a number of other place that I made a similar change to avoid the implicit closure allocations of Task.Run, it's probably worth checking all instances of Task.StartNew to see if they should all be unwrapped.

I did check them. All the other instances of that change do not preserve the returned object for later reference and so would not be affected by this behavior.

@Wraith2
Copy link
Contributor

Wraith2 commented Nov 15, 2019

Excellent news. Do all the other calls do the unwrap or do they not need them? If the unwrap is needed it might be useful to introduce a helper for the StartNew call so any other instances don't hit this subtle behaviour.

@David-Engel
Copy link
Contributor Author

Could it be possible that the earlier change also works in conjunction with this one and there are more than 1 line changes that impact connection resiliency in order to consistently reproduce and produce a fix? If you could try to revert all considerable changes from the PR together, do you see a different behavior?

After the conversation in the other PR, I looked closer at ReadEmptyPacket and it really looked like a no-op in both native and managed code paths so I don't think its removal affected anything. However, I did not analyze that code path for possible effects on functionality if its removal actually affected the returned result and how to test them.

Excellent news. Do all the other calls do the unwrap or do they not need them? If the unwrap is needed it might be useful to introduce a helper for the StartNew call so any other instances don't hit this sutble behaviour.

You did not change any other Task.Runs to StartNew in that PR. That was the only one. But I did look at the couple other instances of StartNew in the code. They do not do the unwrap and it looks like they do not need them. Because either we aren't saving the returned object for reference, or the returned type does not make a difference for how we are referencing the object later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants