-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a way to capture runtime fatal errors #3375
Comments
@sdmaclea can you take a look? I was chatting with @davidfowl and there is an important web scenario that we should consider. cc @jkotas |
There are 2 parts to this:
Which part is more important for ASP.NET scenarios? The first can probably be done without much hassle, the second one is much more tricky. |
Our main goal is to get exception information to customers. The problem that we are hitting is that in addition to that we need to prevent the hosting process from crashing. To accomplish that we catch exceptions coming from the thread we run All we really want is to get CLR to log Event Log message without crashing the process. |
And what do you want CLR to do with the unhandled exception? Swallow it and hope for the best? |
+1 to Jan's comment. Or in more detail: |
That's what we do now anyway. Catch it and hope for the best. |
You are basically asking for The problem with swallowing unhandled exceptions is that they tend gradually corrupt the process state and the process will crash anyway in much harder to diagnose way. This was explained in https://docs.microsoft.com/en-us/dotnet/standard/threading/exceptions-in-managed-threads . If you really want to use |
For context, here's what the user experience is like today dotnet/aspnetcore#5153. The main thing we need to handle is exceptions in Main. Exceptions on background threads can keep crashing as usual (those also log to the event log on windows which we want). |
Can you wrap Main in that case - and just add try/catch around it - that feels like the safest solution. |
No, because we don't want to change user code to get this behavior. We need to try/catch, then set some state in native code so that we can show an error until there is a change. The flow is that the user makes a request, then the CLR fails to boot (because of the error in main), then we set state to make future requests fail until there's a app/configuration change. We want the CLR to log errors as normal in this case but we want to get control back so that we can set the state to make future requests fail. What do you suggest? |
What about this:
|
We don't want to write the code to find Main ourselves and call into it. We've basically been down this style of road before where we end up re-implementing parts of the CLR that already work to tweak 0.000001% of the logic, I'd like to avoid doing that here again. Doing this would require new methods in hostfxr that doesn't run main but instead returns a CLR host we can mutate and further configure before calling into the app. We need a simple way to pass your exception handler enough state to avoid the code that aborts the process. I think we should look for solutions in that line of thinking. |
@davidfowl have you considered using the startup_hook and providing an unhandled exception filter: https://docs.microsoft.com/en-us/dotnet/api/system.appdomain.unhandledexception?view=netcore-2.2. That could do the necessary handling as the exception occured, though the main thread would still crash. |
Yes and yes. Using the startup hook I believe doesn't give a way to stop the crash which is what we need. I did consider this though. Ideally in a hook, we'd be able to wrap Main with a try/catch. |
@davidfowl What would be the API or config setting you would propose to solve this problem that works for you? |
There would be a config setting/environment variable that would set a thread local on the thread that executes the Main entrypoint. That would flow to the exception handler logic and would disable the abort call. COMPlus_DisableAbortOnExceptionsOnMainThread=1 |
Could you please elaborate? Today, we have a global unhandled exception handler. It writes to event log, invokes Watson and aborts the process. We can remove the "invokes Watson and aborts the process" part and let the thread exit without crashing the process, but I am not sure whether it is actually what you want. |
Yes, we want to avoid crashing the process on the main thread. Everything else is fine. |
Hmm, there does not seem to be a way for us to exit the thread cleanly once the exception once it becomes unhandled. We can block the thread indefinitely that would prevent it from crashing the process. Also, the environment variable would be inherited by all child processes... |
This won't work as host process won't be able to distinguish exception in main from server never starting and display the correct message. |
I think something like @vitek-karas idea is probably the solution. @davidfowl is right having this in ASP.NET is bad from a coupling perspective. Is there any reason we couldn't just implement this in the runtime? The try catch could write to a native buffer for tracing and return an error code corresponding to uncaught exception and disable the Watson + Abort. |
Right, its mostly there we just need to disable the abort on the main thread. We're ok with background threads crashing the process. |
I assume that we would add new methods on coreclr / hostfxr to do this. Correct? |
That would be the plan I think. Possibly in an extensible way taking a options struct with a old school struct size. |
This needs to be part of the 3.0 milestone, what we have today would have been a ship stopper had all the right parties been notified |
Sorry to coming to this thread late. Do we have a proposal? I will suggest to separate two things -
Do we have a way to separate them? To be honest to you, in 2, continue really does not buy anything to us. |
@davidfowl ping |
We had a meeting to discuss this. We briefly discussed the requirements and the proposed solution. The next step is to document the proposed design. I have been focusing on infrastructure, but will be able to make progress next week. |
We added dotnet/coreclr#23054. I was thinking that was all that was remaining. Can this be closed or do we have another action item? |
Fixed by dotnet/coreclr#23054 |
As discussed in https://github.com/dotnet/core-setup/issues/4455#issuecomment-422214003 it would be nice for the host to be able to capture runtime fatal errors (including unhandled exception).
Original motivation: In ANCM we are trying to prevent the worker process from crashing by means of catching all native exceptions coming from
hostfxr_main
but doing so overrides coreclr unhandled exception handler that is logging to event log making diagnostics of unhandle exception very hard for customers.@vitek-karas @jkotas
The text was updated successfully, but these errors were encountered: