FATAL_GC_ERROR produces hard to diagnose hangs or crashes #112599

jkotas · 2025-02-15T16:18:28Z

The crashes caused by FATAL_GC_ERROR are very hard to diagnose. I helped somebody diagnose one of these, and it took more than a day to find out what was causing the problem.

Repro

Add unconditional FATAL_GC_ERROR(); to gc_heap::verify_free_lists
set DOTNET_HeapVerify=1
Run a simple test that just calls GC.Collect on x64 checked runtime

Actual behavior

This is one of the possible failure modes. I have also seen other asserts or hangs with more complex tests.

Assert failure(PID 24832 [0x00006100], Thread: 16656 [0x4110]): unbreakableLockCount == m_pThread->GetUnbreakableLockCount() || (!m_pThread->HasUnbreakableLock() && !m_pThread->HasThreadStateNC(Thread::TSNC_OwnsSpinLock))

CORECLR! FCallCheck::~FCallCheck + 0x40 (0x00007ffa`68c11370)
CORECLR! CallSettingFrameEncoded + 0x2A (0x00007ffa`69185c8a)
CORECLR! _FrameHandler4::FrameUnwindToState + 0x28E (0x00007ffa`691844ce)
CORECLR! _FrameHandler4::FrameUnwindToEmptyState + 0x4B (0x00007ffa`6917c1eb)
CORECLR! _InternalCxxFrameHandler<__FrameHandler4> + 0x283 (0x00007ffa`691829c3)
CORECLR! _InternalCxxFrameHandlerWrapper<__FrameHandler4> + 0x6A (0x00007ffa`69182cba)
CORECLR! _CxxFrameHandler4 + 0xFB (0x00007ffa`6917d04b)
CORECLR! _GSHandlerCheck_EH4 + 0x90 (0x00007ffa`69179070)
NTDLL! chkstk + 0x11F (0x00007ffb`04a43f8f)
NTDLL! RtlUnwindEx + 0x352 (0x00007ffb`048f4d22)
    File: C:\runtime\src\coreclr\vm\fcall.cpp:196
    Image: C:\runtime\artifacts\bin\coreclr\windows.x64.Checked\corerun.exe

Expected behavior

An error message that clearly indicates a fatal GC error. No hangs or unrelated crashes.


dotnet-policy-service · 2025-02-15T16:19:00Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

jkotas · 2025-02-15T16:20:10Z

cc @janvorli

Can we get the exception handling out of the way when we hit breakpoints in the GC, so that these fatal errors crash cleanly?

The new exception handling doesn't work well with DebugBreak in some cases, e.g. when it is invoked from FATAL_GC_ERROR. The new EH attempts to handle the STATUS_BREAKPOINT raised by DebugBreak, tries to allocate a managed exception object, and hangs because it cannot do that while the GC is running. The cause is a missing check for the breakpoint exception in ProcessCLRExceptionNew that is present in the old ProcessCLRException. To fix it, I've copied that code to ProcessCLRExceptionNew. Close dotnet#112599
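The check described above can be illustrated with a minimal, self-contained sketch. It is not the CoreCLR code: `Disposition`, `processException`, and the constant names are hypothetical stand-ins, and the real handlers take the full exception record and dispatcher context. It only shows the idea of filtering out the breakpoint exception before doing any work that might allocate.

```cpp
#include <cstdint>

// Hypothetical stand-in for the result of an exception handler;
// this is not the type used by CoreCLR.
enum class Disposition { ContinueSearch, HandledByManagedEH };

// Windows NTSTATUS codes used in this sketch.
constexpr std::uint32_t kStatusBreakpoint      = 0x80000003u; // STATUS_BREAKPOINT
constexpr std::uint32_t kStatusAccessViolation = 0xC0000005u; // STATUS_ACCESS_VIOLATION

// Simplified handler (assumption: the real one does far more).
// A breakpoint is passed on untouched, so nothing on this path tries to
// create a managed exception object while the GC may be running.
constexpr Disposition processException(std::uint32_t exceptionCode) {
    return exceptionCode == kStatusBreakpoint
               ? Disposition::ContinueSearch      // leave it to the debugger / OS
               : Disposition::HandledByManagedEH; // normal managed EH path
}

// Compile-time checks of the behavior sketched above.
static_assert(processException(kStatusBreakpoint) == Disposition::ContinueSearch,
              "breakpoints must bypass managed exception handling");
static_assert(processException(kStatusAccessViolation) == Disposition::HandledByManagedEH,
              "other exceptions still take the managed path");
```

With a filter like this in place, a DebugBreak raised from FATAL_GC_ERROR would reach the debugger or the default crash handling instead of entering managed exception dispatch.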

* Fix new EH hang on DebugBreak
* Never process breakpoints via the new EH

dotnet-issue-labeler bot added the area-GC-coreclr label Feb 15, 2025

dotnet-policy-service bot added the untriaged label (New issue has not been triaged by the area owner) Feb 15, 2025

jkotas added the area-ExceptionHandling-coreclr and tenet-reliability (Reliability/stability related issue: stress, load problems, etc.) labels and removed the area-GC-coreclr and untriaged labels Feb 15, 2025

jkotas added this to the 10.0.0 milestone Feb 15, 2025

mangod9 assigned janvorli Feb 15, 2025

janvorli mentioned this issue Feb 17, 2025

Fix new EH hang on DebugBreak #112640

Merged

dotnet-policy-service bot added the in-pr label (There is an active PR which will close this issue when it is merged) Feb 17, 2025

jkotas closed this as completed in #112640 Feb 19, 2025
