-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Don't try to access if thread is not running #3174
Conversation
I think @chillitom is right and there might be races here. but it already improve things... |
tcs.TrySetCanceled(cancellationToken) |> ignore | ||
| exn -> | ||
disposeReg() | ||
if tcs.Task.Status = TaskStatus.Running then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see cancelling the task when it is not still running is a problem. But shouldn't we call set exception whether the task is running or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thema actually issue issue issue setexception. I only added these other cases for symmetry
Can you take a look at this please. The code is on the face of it all right, it's just I can't help thinking that it is treating the symptoms rather than the issue. I wonder if there is something wrong with the implementation. Getting an ExecutionEngineException out of a normal use of an API seems a bit excessive. |
Yes it issue definetly only symptomatic
|
@KevinRansom @forki This looks like a sound fix to me - not just symptomatic. My understanding is that the cancellation registration has triggered, putting the task in "cancelled" state, just prior to the exception continuation being executed. There should be no race condition since only one continuation will be called (and we have already disposed of the cancellation registration at this point) I agree that calling TrySetException should no cause such a catastrophic error when a task has already been cancelled, it seems quite wrong. Perhaps there is something else lurking there. |
@stephentoub is taking a look. |
A few things:
|
I think everyone agrees on that, but not the runtime :)
The task can only finish via a) By the CancellationRegistration -> We already disposed that at this place (before doing the check) We don't leak the
Yes ultimately it looks like a runtime bug which needs a real fix. But the workaround looks good.
@forki posted a way to reproduce here. Maybe you can try to reproduce? The message only really contains some generic text and some address pointers I don't think it will help. |
@matthid I will add it to the long list of things I need to look at. |
@stephentoub that's what I thought as well. But something in TrySetException seems to create that crash. |
If something in TrySetException is crashing, most likely it actually has little to do with it, and, for example, something somewhere else has corrupted memory that TrySetException touches (e.g. something gets JIT'd during the call and that experiences a problem). If the Task is already completed, TrySetException just checks a flag and returns false. You can see the code in the reference source. What is the full stack of the exception, including the hidden "external" frames? |
@stephentoub #3171 (comment) is all I have |
I was able to repro before the change yes. But I'm not in work office this week so can't do that. But you should be able to repro with the steps I listed. |
I've followed all the steps but now can't get the repro to trigger! Frustrating. @forki do you remember which commit exactly you tested against? Is there a way to be sure I'm running my custom built version of the F# tools? Checking in Tools and Extensions shows a data from 4 days ago. |
Thanks, @chillitom. That stack shows that:
|
|
@stephentoub so what would be the proper fix? |
I have no idea... it's unrelated to Tasks as far as I can tell (Tasks here are just the infrastructure that's invoking the delegates that are then issuing the fail fast) and likely specific to the VS environment, the project system, Roslyn, etc. The fail fast being generated by VS should contain some details on what the problem was... if it's not getting output to the debugger or the console, you might look in the event log. Others like @KevinRansom will likely have a better idea. |
here are the last few lines of debug console output:
|
please remember:
^^ this is the code that "fixed" it. |
@forki Do you know what |
@chillitom can you please check? |
In particular, for example, the "fix" changed it to complete the Task only if it was running, but it may have been, for example, WaitingForActivation, in which case it would have skipped completing the task and the continuations wouldn't have run at all. |
no it does not. It's just a bit differently aligned so that it might look like this if you are not used to F# |
Yes, I misspoke. But regardless, it changed it to complete the task only if it was Running... it may have been another non-completed status, like WaitingForActivation, in which case the change simply didn't complete the task at all, preventing any of its continuations from running, and it's the body of the continuations that are issuing the fail fast. |
@chillitom there is an easy trick: |
|
Thanks. Which means the "fix" really just preventing the continuations from running, by never completing the task. That'd be like fixing a bug by simply not calling the method that throws the exception or issues the fail fast. Understanding the right fix requires understanding why that code is throwing / fail fasting. |
I must admit I find Roslyn's predilection for using FailFast a bit disturbing, though they look like odd corner cases where it is used. We hit this in one other case (unrelated to this - see #3071) |
There are only two calls to ReportFatalError in Roslyn - they are related to whacky corner cases in task settings which I don't understand but they do seem to be plausibly related Why Roslyn has to turn these into catastrophic fatal errors I don't know. |
Just took a quick look. Two things:
|
@stephentoub Thanks for the explanation. I think I'll need to try to repro this.
I have to admit that explanation makes me more and more convinced that the F# async design philosophy of "compose task generators, not tasks" is fundamentally the right one :) |
@forki I've put "WIP" on this since the current fix definitely isn't right (as @stephentoub mentions it is just suppressing the completion of the task in some cases) |
Well, I can repro where the first exception is being raised. However I still need to understand why this has become more catastrophic after the change to how we start tasks Here's a related stack which shows that the F# compiler routine GetSymbolUseAtLocation is raising an exception, and that this is causing an F# async related to the " at Microsoft.FSharp.Core.Operators.Raise[T](Exception exn) in C:\GitHub\dsyme\visualfsharp\src\fsharp\FSharp.Core\prim-types.fs:line 3803\r\n at Microsoft.FSharp.Compiler.Import.ImportTypeRefData(ImportMap env, range m, ILScopeRef scoref, String[] path, String typeName) in C:\GitHub\dsyme\visualfsharp\src\fsharp\import.fs:line 96\r\n at Microsoft.FSharp.Compiler.Import.ImportILTypeRef(ImportMap env, range m, ILTypeRef tref) in C:\GitHub\dsyme\visualfsharp\src\fsharp\import.fs:line 141\r\n at Microsoft.FSharp.Compiler.Import.ImportILType(ImportMap env, range m, FSharpList |
Ugh, previously we just returned I'll submit a PR to simply do what we used to do for exceptions. |
@dsyme Thanks mate ... |
Closing in favour of #3182 That fix still needs to be verified however |
fixes #3171