-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 659: sync and async timer state rework #906
Conversation
HI @Wraith2 This is good progress! It does improve things, I've extended my repro in last comment in #659 further so we can now consistently reproduce this issue. But I see exceptions where it's not expected: Source: https://gist.github.com/cheenamalhotra/5f4b35603c491484e51be38abf15548a#file-659_repro2-cs |
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
Outdated
Show resolved
Hide resolved
I'm not suggesting this is a full fix, at best it's just a mitigation for the timing problem and as you point out it still needs further testing and dev. It is however minimally invasive and builds on the existing timeout code as well as eliminating the problem in the original test case. It's a promising avenue. |
Caught it. The timeout on the 3rd entry is a queued callback from the previous one. I forced the timer to be disposed each run through and set the state value on the timer when it was created. When the 3rd test is run and the callback runs the state value does not match the one it should have been, it's the value from before. Sadly we can't use this to detect the fault because the only way to set the timer on the state is in the constructor. It does mean that we need to identify each callback and link it to the individual request that it was generated for so we can ignore ones that don't match the current query.
|
Yes because this StateObj is reused, we need to also associate it with something that identifies the query/command that's executed. |
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
Outdated
Show resolved
Hide resolved
I've updated this to include the changes I described in #907 (comment) The timer must be disposed each time, this is unavoidable. I've added the state value, source and timeout state class as well as a spare to allow safe re-use when the timer is behaving (which should be most of the time). The CI tests just aren't working correctly. It's failing on all sorts of things unreliably. Locally the tests pass. Debug and Release run through the original 4000 thread without encountering any invalid data, up to 400,000 requests in release mode at time or writing. The 2 repros you linked above also behave as expected now @cheenamalhotra and I've added a note and ifdef around the thread sleep so even if it got merged it wouldn't make it to production. The first commit made timeouts more reliable under load. This second commit adds the fix for late timeout callbacks that @David-Engel @cheenamalhotra and me have converged on by investigation. I think this is a decent candidate for the fix with both pieces of work included even though we strictly only need the second.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that your implementation improves the reliability of timeout to throw under load by calling OnTimeoutSync() directly from ReadSni instead of relying solely on the timer. But I think we need to address the rollover of _timeoutIdentitySource. I'll probably close my PR in favor of moving yours forward.
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
Show resolved
Hide resolved
I've removed the spare. I realised that it would only be useful when a timeout occurs which should be rare. As such it wasn't worth the complexity. |
* Add Async API Tests for Timeout
Tests merged. CI is running now as i've changed status to open. Do you want to try and do netfx in this same PR or a separate one? I'd prefer a second one but let me know what you think. |
Let's do it here, I see pipelines are failing in NetFx due to same reason. |
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SNI/SNIPacket.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
Show resolved
Hide resolved
Ok Done. I was worried that the netfx version was going to be very different but it looks like the relevant pieces of code were highly conserved between the two. The only thing of note is that netfx creates the timer without suppressing execution context flow. I didn't change the behaviour but it might be a perf win on netfx to look at in future if the same thing can be done. |
c6cb20e
to
4e4a7f8
Compare
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/netfx/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/netfx/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: David Engel <[email protected]>
… as per discussions. This reverts commit 4e4a7f8.
Adds in finer grained timerout state and allows checking of timeouts synchronously in case async timers are behaving unreliably. Running the original repro I can no longer get invalid results. @cheenamalhotra @David-Engel