Hang during shutdown with Native AOT on IReferenceTrackerHost::ReleaseDisconnectedReferenceSources waiting for finalizers #109538

Sergio0694 · 2024-11-05T08:22:59Z

Description

We're hitting a 100% consistent hang during application shutdown, only on Native AOT. It seems that the finalizer thread and the UI thread (ASTA) are possibly in a deadlock, resulting in the application process remaining alive after closing the main window. After a few seconds, Windows proceeds to kill the process, which shows up in WER as a hang (which is expected). Only repros with Native AOT, whereas CoreCLR seems to work fine.

Reproduction Steps

I don't have a minimal repro. Please ping me on Teams for instructions on how to deploy the Store locally to repro. Alternatively, I can also share an MSIX package for sideloading, with instructions on how to install it for testing (and how to restore the retail Store after that).

Here is a memory dump on the process during the hang (process was paused from WinDbg on the presumed deadlock).

Expected behavior

The application should shutdown correctly when closing the window.

Actual behavior

Here's the two relevant stacktraces I see in WinDbg.

Finalizer thread (!FinalizerStart) (click to expand)

[0x0]   ntdll!ZwWaitForMultipleObjects+0x14
[0x1]   KERNELBASE!WaitForMultipleObjectsEx+0xe9
[0x2]   combase!MTAThreadWaitForCall+0xfb
[0x3]   combase!MTAThreadDispatchCrossApartmentCall+0x2bc
[0x4]   combase!CSyncClientCall::SwitchAptAndDispatchCall+0x707   (Inline Function)   (Inline Function)   
[0x5]   combase!CSyncClientCall::SendReceive2+0x825
[0x6]   combase!SyncClientCallRetryContext::SendReceiveWithRetry+0x2f   (Inline Function)   (Inline Function)   
[0x7]   combase!CSyncClientCall::SendReceiveInRetryContext+0x2f   (Inline Function)   (Inline Function)   
[0x8]   combase!DefaultSendReceive+0x6e
[0x9]   combase!CSyncClientCall::SendReceive+0x300
[0xa]   combase!CClientChannel::SendReceive+0x98
[0xb]   combase!NdrExtpProxySendReceive+0x58
[0xc]   RPCRT4!Ndr64pSendReceive+0x39   (Inline Function)   (Inline Function)   
[0xd]   RPCRT4!NdrpClientCall3+0x3de
[0xe]   combase!ObjectStublessClient+0x14c
[0xf]   combase!ObjectStubless+0x42
[0x10]   combase!CObjectContext::InternalContextCallback+0x2fd
[0x11]   combase!CObjectContext::ContextCallback+0x902   
[0x12]   <MICROSOFT_STORE>!WinRT_Runtime_ABI_WinRT_Interop_IContextCallbackVftbl__ContextCallback+0x102
[0x13]   <MICROSOFT_STORE>!WinRT_Runtime_WinRT_Context__CallInContext+0x87
[0x14]   <MICROSOFT_STORE>!WinRT_Runtime_WinRT_ObjectReferenceWithContext_1<WinRT_Runtime_WinRT_Interop_IUnknownVftbl>__Release+0x64
[0x15]   <MICROSOFT_STORE>!WinRT_Runtime_WinRT_IObjectReference__Dispose+0x5b
[0x16]   <MICROSOFT_STORE>!WinRT_Runtime_WinRT_IObjectReference__Finalize+0x17
[0x17]   <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime___Finalizer__DrainQueue+0x7a
[0x18]   <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime___Finalizer__ProcessFinalizers+0x47
[0x19]   <MICROSOFT_STORE>!FinalizerStart+0x56
[0x1a]   KERNEL32!BaseThreadInitThunk+0x1d
[0x1b]   ntdll!RtlUserThreadStart+0x28

UI thread (shcore!_WrapperThreadProc ApplicationView ASTA) (click to expand)

[0x0]   win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14
[...]
[0x5]   combase!CoWaitForMultipleHandles+0xc2
[0x6]   <MICROSOFT_STORE>!PalCompatibleWaitAny+0x63
[0x7]   <MICROSOFT_STORE>!CLREventStatic::Wait+0xc6
[0x8]   <MICROSOFT_STORE>!RhWaitForPendingFinalizers+0x90
[0x9]   <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime_RuntimeImports__RhWaitForPendingFinalizers+0x32
[0xa]   <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime_RuntimeImports__RhWaitForPendingFinalizers_0+0x21
[0xb]   <MICROSOFT_STORE>!S_P_CoreLib_System_GC__WaitForPendingFinalizers+0x1b
[0xc]   <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime_InteropServices_ComWrappers__IReferenceTrackerHost_ReleaseDisconnectedReferenceSources+0x24
[0xd]   Windows_UI_Xaml!DirectUI::ReferenceTrackerManager::TriggerFinalization+0x34
[...]

It seems that:

The window is closed
The lifecycle manager starts suspending the app
XAML decides it should finalize objects
ComWrappers::IReferenceTrackerHost::ReleaseDisconnectedReferenceSources is called
That in turn blocks on GC::WaitForPendingFinalizers
The finalizer thread gets to finalizing some ObjectReferenceWithContext<T> object
That object is from another context so it calls CallInContext (here)
That passes the callback function and invokes it in the target context (here)
That eventually just ends stuck on WaitForMultipleObjectsEx
The whole app just hangs until the OS just forcibly kills it

Some potentially relevant differences we noticed in the finalizer logic across CoreCLR (which works fine) and NativeAOT:

CoreCLR:

runtime/src/coreclr/vm/finalizerthread.cpp

Lines 493 to 548 in 302e0d4

    
           void FinalizerThread::FinalizerThreadWait() 
        
           { 
        
               ASSERT(hEventFinalizerDone->IsValid()); 
        
               ASSERT(hEventFinalizer->IsValid()); 
        
               ASSERT(GetFinalizerThread()); 
        
               // Can't call this from within a finalized method. 
        
               if (!IsCurrentThreadFinalizer()) 
        
               { 
        
                   // We may see a completion of finalization cycle that might not see objects that became 
        
                   // F-reachable in recent GCs. In such case we want to wait for a completion of another cycle. 
        
                   // However, since an object cannot be prevented from promoting, one can only rely on Full GCs 
        
                   // to collect unreferenced objects deterministically. Thus we only care about Full GCs here. 
        
                   int desiredFullGcCount = 
        
                       GCHeapUtilities::GetGCHeap()->CollectionCount(GCHeapUtilities::GetGCHeap()->GetMaxGeneration()); 
        
                   GCX_PREEMP(); 
        
           #ifdef FEATURE_COMINTEROP 
        
                   // To help combat finalizer thread starvation, we check to see if there are any wrappers 
        
                   // scheduled to be cleaned up for our context.  If so, we'll do them here to avoid making 
        
                   // the finalizer thread do a transition. 
        
                   if (g_pRCWCleanupList != NULL) 
        
                       g_pRCWCleanupList->CleanupWrappersInCurrentCtxThread(); 
        
           #endif // FEATURE_COMINTEROP 
        
               tryAgain: 
        
                   hEventFinalizerDone->Reset(); 
        
                   EnableFinalization(); 
        
                   // Under GC stress the finalizer queue may never go empty as frequent 
        
                   // GCs will keep filling up the queue with items. 
        
                   // We will disable GC stress to make sure the current thread is not permanently blocked on that. 
        
                   GCStressPolicy::InhibitHolder iholder; 
        
                   //---------------------------------------------------- 
        
                   // Do appropriate wait and pump messages if necessary 
        
                   //---------------------------------------------------- 
        
                   DWORD status; 
        
                   status = hEventFinalizerDone->Wait(INFINITE,TRUE); 
        
                   // we use unsigned math here as the collection counts, which are size_t internally, 
        
                   // can in theory overflow an int and wrap around. 
        
                   // unsigned math would have more defined/portable behavior in such case 
        
                   if ((int)((unsigned int)desiredFullGcCount - (unsigned int)g_fullGcCountSeenByFinalization) > 0) 
        
                   { 
        
                       // There were some Full GCs happening before we started waiting and possibly not seen by the 
        
                       // last finalization cycle. This is rare, but we need to be sure we have seen those, 
        
                       // so we try one more time. 
        
                       goto tryAgain; 
        
                   } 
        
                   _ASSERTE(status == WAIT_OBJECT_0); 
        
               } 
        
           }

Native AOT:

runtime/src/coreclr/nativeaot/Runtime/FinalizerHelpers.cpp

Lines 115 to 151 in 302e0d4

    
           EXTERN_C void QCALLTYPE RhWaitForPendingFinalizers(UInt32_BOOL allowReentrantWait) 
        
           { 
        
               // This must be called via p/invoke rather than RuntimeImport since it blocks and could starve the GC if 
        
               // called in cooperative mode. 
        
               ASSERT(!ThreadStore::GetCurrentThread()->IsCurrentThreadInCooperativeMode()); 
        
               // Can't call this from the finalizer thread itself. 
        
               if (ThreadStore::GetCurrentThread() != g_pFinalizerThread) 
        
               { 
        
                   // We may see a completion of finalization cycle that might not see objects that became 
        
                   // F-reachable in recent GCs. In such case we want to wait for a completion of another cycle. 
        
                   // However, since an object cannot be prevented from promoting, one can only rely on Full GCs 
        
                   // to collect unreferenced objects deterministically. Thus we only care about Full GCs here. 
        
                   int desiredFullGcCount = 
        
                       GCHeapUtilities::GetGCHeap()->CollectionCount(GCHeapUtilities::GetGCHeap()->GetMaxGeneration()); 
        
               tryAgain: 
        
                   // Clear any current indication that a finalization pass is finished and wake the finalizer thread up 
        
                   // (if there's no work to do it'll set the done event immediately). 
        
                   g_FinalizerDoneEvent.Reset(); 
        
                   g_FinalizerEvent.Set(); 
        
                   // Wait for the finalizer thread to get back to us. 
        
                   g_FinalizerDoneEvent.Wait(INFINITE, false, allowReentrantWait); 
        
                   // we use unsigned math here as the collection counts, which are size_t internally, 
        
                   // can in theory overflow an int and wrap around. 
        
                   // unsigned math would have more defined/portable behavior in such case 
        
                   if ((int)((unsigned int)desiredFullGcCount - (unsigned int)g_fullGcCountSeenByFinalization) > 0) 
        
                   { 
        
                       // There were some Full GCs happening before we started waiting and possibly not seen by the 
        
                       // last finalization cycle. This is rare, but we need to be sure we have seen those, 
        
                       // so we try one more time. 
        
                       goto tryAgain; 
        
                   } 
        
               } 
        
           }

It seems that they're similar, however:

CoreCLR passes alertable: TRUE
Native AOT passes alertable: FALSE
NativeAOT also passes the allowReentrantWait: TRUE param, which makes it call CoWaitForMultipleHandles here

Not sure whether that's intentional (why?) and whether it's related to the issue, just something we noticed.

Regression?

No.

Known Workarounds

None, this is a blocker 😅

Configuration

VS 17.12 P5
.NET 9 RC2

The text was updated successfully, but these errors were encountered:

dotnet-policy-service · 2024-11-05T08:23:29Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Sergio0694 · 2024-11-05T08:24:26Z

cc. @AaronRobinsonMSFT @jkoritzinsky perhaps you might also have thoughts on this or some knowledge to share? 😄

agocke · 2024-11-05T19:36:40Z

Also + @VSadov in case this could be related to suspension

MichalStrehovsky · 2024-11-06T14:25:19Z

Also + @VSadov in case this could be related to suspension

It's not related to suspension, but definitely an area where expertise from @VSadov would help. Looks like native AOT will just wait on an event using a OS API and that's all there's to it. CoreCLR does a lot more to figure out how exactly to wait (Does the thread have a SynchronizationContext? Do we need to pump window messages with MsgWaitForMultipleObjects? etc.). As usual, this is a place that is a lot more complicated in CoreCLR VM and it's hard to say which part is relevant and how much of it we need to implement.

runtime/src/coreclr/vm/threads.cpp

Lines 3201 to 3484 in 3116db9

    
           DWORD Thread::DoAppropriateWaitWorker(int countHandles, HANDLE *handles, BOOL waitAll, 
        
                                                 DWORD millis, WaitMode mode, void *associatedObjectForMonitorWait) 
        
           { 
        
               CONTRACTL { 
        
                   THROWS; 
        
                   GC_TRIGGERS; 
        
               } 
        
               CONTRACTL_END; 
        
               DWORD ret = 0; 
        
               BOOL alertable = (mode & WaitMode_Alertable) != 0; 
        
               // Waits from SynchronizationContext.WaitHelper are always just WaitMode_IgnoreSyncCtx. 
        
               // So if we defer to a sync ctx, we will lose any extra bits.  We must therefore not 
        
               // defer to a sync ctx if doing any non-default wait. 
        
               // If you're doing a default wait, but want to ignore sync ctx, specify WaitMode_IgnoreSyncCtx 
        
               // which will make mode != WaitMode_Alertable. 
        
               BOOL ignoreSyncCtx = (mode != WaitMode_Alertable); 
        
               if (AppDomain::GetCurrentDomain()->MustForceTrivialWaitOperations()) 
        
                   ignoreSyncCtx = TRUE; 
        
               // Unless the ignoreSyncCtx flag is set, first check to see if there is a synchronization 
        
               // context on the current thread and if there is, dispatch to it to do the wait. 
        
               // If  the wait is non alertable we cannot forward the call to the sync context 
        
               // since fundamental parts of the system (such as the GC) rely on non alertable 
        
               // waits not running any managed code. Also if we are past the point in shutdown were we 
        
               // are allowed to run managed code then we can't forward the call to the sync context. 
        
               if (!ignoreSyncCtx 
        
                   && alertable 
        
                   && !HasThreadStateNC(Thread::TSNC_BlockedForShutdown)) 
        
               { 
        
                   GCX_COOP(); 
        
                   BOOL fSyncCtxPresent = FALSE; 
        
                   OBJECTREF SyncCtxObj = NULL; 
        
                   GCPROTECT_BEGIN(SyncCtxObj) 
        
                   { 
        
                       GetSynchronizationContext(&SyncCtxObj); 
        
                       if (SyncCtxObj != NULL) 
        
                       { 
        
                           SYNCHRONIZATIONCONTEXTREF syncRef = (SYNCHRONIZATIONCONTEXTREF)SyncCtxObj; 
        
                           if (syncRef->IsWaitNotificationRequired()) 
        
                           { 
        
                               fSyncCtxPresent = TRUE; 
        
                               ret = DoSyncContextWait(&SyncCtxObj, countHandles, handles, waitAll, millis); 
        
                           } 
        
                       } 
        
                   } 
        
                   GCPROTECT_END(); 
        
                   if (fSyncCtxPresent) 
        
                       return ret; 
        
               } 
        
               // Before going to pre-emptive mode the thread needs to be flagged as waiting for 
        
               // the debugger. This used to be accomplished by the TS_Interruptible flag but that 
        
               // doesn't work reliably, see DevDiv Bugs 699245. Some methods call in here already in 
        
               // COOP mode so we set the bit before the transition. For the calls that are already 
        
               // in pre-emptive mode those are still buggy. This is only a partial fix. 
        
               BOOL isCoop = PreemptiveGCDisabled(); 
        
               ThreadStateNCStackHolder tsNC(isCoop && alertable, TSNC_DebuggerSleepWaitJoin); 
        
               GCX_PREEMP(); 
        
               if (alertable) 
        
               { 
        
                   DoAppropriateWaitWorkerAlertableHelper(mode); 
        
               } 
        
               StateHolder<MarkOSAlertableWait,UnMarkOSAlertableWait> OSAlertableWait(alertable); 
        
               ThreadStateHolder tsh(alertable, TS_Interruptible | TS_Interrupted); 
        
               bool sendWaitEvents = 
        
                   millis != 0 && 
        
                   (mode & WaitMode_Alertable) != 0 && 
        
                   ETW_TRACING_CATEGORY_ENABLED( 
        
                       MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context, 
        
                       TRACE_LEVEL_VERBOSE, 
        
                       CLR_WAITHANDLE_KEYWORD); 
        
               // Monitor.Wait is typically a blocking wait. For other waits, when sending the wait events try a nonblocking wait first 
        
               // such that the events sent are more likely to represent blocking waits. 
        
               bool tryNonblockingWaitFirst = sendWaitEvents && associatedObjectForMonitorWait == NULL; 
        
               if (tryNonblockingWaitFirst) 
        
               { 
        
                   ret = DoAppropriateAptStateWait(countHandles, handles, waitAll, 0 /* timeout */, mode); 
        
                   if (ret == WAIT_TIMEOUT) 
        
                   { 
        
                       // Do a full wait and send the wait events 
        
                       tryNonblockingWaitFirst = false; 
        
                   } 
        
                   else if (ret != WAIT_IO_COMPLETION && ret != WAIT_FAILED) 
        
                   { 
        
                       // The nonblocking wait was successful, don't send the wait events 
        
                       sendWaitEvents = false; 
        
                   } 
        
               } 
        
               if (sendWaitEvents) 
        
               { 
        
                   if (associatedObjectForMonitorWait != NULL) 
        
                   { 
        
                       FireEtwWaitHandleWaitStart( 
        
                           ETW::WaitHandleLog::WaitHandleStructs::MonitorWait, 
        
                           associatedObjectForMonitorWait, 
        
                           GetClrInstanceId()); 
        
                   } 
        
                   else 
        
                   { 
        
                       FireEtwWaitHandleWaitStart(ETW::WaitHandleLog::WaitHandleStructs::Unknown, NULL, GetClrInstanceId()); 
        
                   } 
        
               } 
        
               ULONGLONG dwStart = 0, dwEnd; 
        
           retry: 
        
               if (millis != INFINITE) 
        
               { 
        
                   dwStart = CLRGetTickCount64(); 
        
               } 
        
               if (tryNonblockingWaitFirst) 
        
               { 
        
                   // We have a final wait result from the nonblocking wait above 
        
                   tryNonblockingWaitFirst = false; 
        
               } 
        
               else 
        
               { 
        
                   ret = DoAppropriateAptStateWait(countHandles, handles, waitAll, millis, mode); 
        
               } 
        
               if (ret == WAIT_IO_COMPLETION) 
        
               { 
        
                   _ASSERTE (alertable); 
        
                   if (m_State & TS_Interrupted) 
        
                   { 
        
                       HandleThreadInterrupt(); 
        
                   } 
        
                   // We could be woken by some spurious APC or an EE APC queued to 
        
                   // interrupt us. In the latter case the TS_Interrupted bit will be set 
        
                   // in the thread state bits. Otherwise we just go back to sleep again. 
        
                   if (millis != INFINITE) 
        
                   { 
        
                       dwEnd = CLRGetTickCount64(); 
        
                       if (dwEnd - dwStart >= millis) 
        
                       { 
        
                           ret = WAIT_TIMEOUT; 
        
                           goto WaitCompleted; 
        
                       } 
        
                       else 
        
                       { 
        
                           millis -= (DWORD)(dwEnd - dwStart); 
        
                       } 
        
                   } 
        
                   goto retry; 
        
               } 
        
               _ASSERTE((ret >= WAIT_OBJECT_0  && ret < (WAIT_OBJECT_0  + (DWORD)countHandles)) || 
        
                        (ret >= WAIT_ABANDONED && ret < (WAIT_ABANDONED + (DWORD)countHandles)) || 
        
                        (ret == WAIT_TIMEOUT) || (ret == WAIT_FAILED)); 
        
               // countHandles is used as an unsigned -- it should never be negative. 
        
               _ASSERTE(countHandles >= 0); 
        
               // We support precisely one WAIT_FAILED case, where we attempt to wait on a 
        
               // thread handle and the thread is in the process of dying we might get a 
        
               // invalid handle substatus. Turn this into a successful wait. 
        
               // There are three cases to consider: 
        
               //  1)  Only waiting on one handle: return success right away. 
        
               //  2)  Waiting for all handles to be signalled: retry the wait without the 
        
               //      affected handle. 
        
               //  3)  Waiting for one of multiple handles to be signalled: return with the 
        
               //      first handle that is either signalled or has become invalid. 
        
               if (ret == WAIT_FAILED) 
        
               { 
        
                   DWORD errorCode = ::GetLastError(); 
        
                   if (errorCode == ERROR_INVALID_PARAMETER) 
        
                   { 
        
                       if (CheckForDuplicateHandles(countHandles, handles)) 
        
                           COMPlusThrow(kDuplicateWaitObjectException); 
        
                       else 
        
                           COMPlusThrowHR(HRESULT_FROM_WIN32(errorCode)); 
        
                   } 
        
                   else if (errorCode == ERROR_ACCESS_DENIED) 
        
                   { 
        
                       // A Win32 ACL could prevent us from waiting on the handle. 
        
                       COMPlusThrow(kUnauthorizedAccessException); 
        
                   } 
        
                   else if (errorCode == ERROR_NOT_ENOUGH_MEMORY) 
        
                   { 
        
                       ThrowOutOfMemory(); 
        
                   } 
        
           #ifdef TARGET_UNIX 
        
                   else if (errorCode == ERROR_NOT_SUPPORTED) 
        
                   { 
        
                       // "Wait for any" and "wait for all" operations on multiple wait handles are not supported when a cross-process sync 
        
                       // object is included in the array 
        
                       COMPlusThrow(kPlatformNotSupportedException, W("PlatformNotSupported_NamedSyncObjectWaitAnyWaitAll")); 
        
                   } 
        
           #endif 
        
                   else if (errorCode != ERROR_INVALID_HANDLE) 
        
                   { 
        
                       ThrowWin32(errorCode); 
        
                   } 
        
                   if (countHandles == 1) 
        
                       ret = WAIT_OBJECT_0; 
        
                   else if (waitAll) 
        
                   { 
        
                       // Probe all handles with a timeout of zero. When we find one that's 
        
                       // invalid, move it out of the list and retry the wait. 
        
                       for (int i = 0; i < countHandles; i++) 
        
                       { 
        
                           // WaitForSingleObject won't pump memssage; we already probe enough space 
        
                           // before calling this function and we don't want to fail here, so we don't 
        
                           // do a transition to tolerant code here 
        
                           DWORD subRet = WaitForSingleObject (handles[i], 0); 
        
                           if (subRet != WAIT_FAILED) 
        
                               continue; 
        
                           _ASSERTE(::GetLastError() == ERROR_INVALID_HANDLE); 
        
                           if ((countHandles - i - 1) > 0) 
        
                               memmove(&handles[i], &handles[i+1], (countHandles - i - 1) * sizeof(HANDLE)); 
        
                           countHandles--; 
        
                           break; 
        
                       } 
        
                       // Compute the new timeout value by assume that the timeout 
        
                       // is not large enough for more than one wrap 
        
                       dwEnd = CLRGetTickCount64(); 
        
                       if (millis != INFINITE) 
        
                       { 
        
                           if (dwEnd - dwStart >= millis) 
        
                           { 
        
                               ret = WAIT_TIMEOUT; 
        
                               goto WaitCompleted; 
        
                           } 
        
                           else 
        
                           { 
        
                               millis -= (DWORD)(dwEnd - dwStart); 
        
                           } 
        
                       } 
        
                       goto retry; 
        
                   } 
        
                   else 
        
                   { 
        
                       // Probe all handles with a timeout as zero, succeed with the first 
        
                       // handle that doesn't timeout. 
        
                       ret = WAIT_OBJECT_0; 
        
                       int i; 
        
                       for (i = 0; i < countHandles; i++) 
        
                       { 
        
                       TryAgain: 
        
                           // WaitForSingleObject won't pump memssage; we already probe enough space 
        
                           // before calling this function and we don't want to fail here, so we don't 
        
                           // do a transition to tolerant code here 
        
                           DWORD subRet = WaitForSingleObject (handles[i], 0); 
        
                           if ((subRet == WAIT_OBJECT_0) || (subRet == WAIT_FAILED)) 
        
                               break; 
        
                           if (subRet == WAIT_ABANDONED) 
        
                           { 
        
                               ret = (ret - WAIT_OBJECT_0) + WAIT_ABANDONED; 
        
                               break; 
        
                           } 
        
                           // If we get alerted it just masks the real state of the current 
        
                           // handle, so retry the wait. 
        
                           if (subRet == WAIT_IO_COMPLETION) 
        
                               goto TryAgain; 
        
                           _ASSERTE(subRet == WAIT_TIMEOUT); 
        
                           ret++; 
        
                       } 
        
                   } 
        
               } 
        
           WaitCompleted: 
        
               _ASSERTE((ret != WAIT_TIMEOUT) || (millis != INFINITE)); 
        
               if (sendWaitEvents) 
        
               { 
        
                   FireEtwWaitHandleWaitStop(GetClrInstanceId()); 
        
               } 
        
               return ret; 
        
           }

VSadov · 2024-11-06T20:39:05Z

I think waiting with "alertable" is to allow a sleeping/waiting thread to react to Thread.Interrupt().
However the Thread.Interrupt() is not supported on NativeAOT

runtime/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Thread.NativeAot.Windows.cs

Line 402 in 6d23ef4

public void Interrupt() { throw new PlatformNotSupportedException(); }

I suspect the culprit is something with COM and message pumping, but it looks like we call CoWaitForMultipleHandles, which should do that.

jkotas · 2024-11-06T21:11:32Z

CoreCLR may be doing the wait via user installed synchronization context. It would be useful to find out the callstack of the wait in CoreCLR.

manodasanW · 2024-11-06T21:27:30Z

I did have a chat with some COM folks. On ASTA threads such as this one, CoWaitForMultipleHandles doesn't pump COM messages unless COWAIT_DISPATCH_CALLS is passed as the flag. But that isn't passed in the CoreClr case here either. So that might explain why it is waiting but doesn't explain why CoreClr doesn't hit this issue.

My initial guess was the CleanupWrappersInCurrentCtxThread call was doing some cleanup for the ASTA scenarios before it waited, but doing some debugging of the CoreCLR scenario, it seems it isn't as there is a conditional that makes it used in certain scenarios which isn't set here. And from my debugging, it seems we are doing the same wait on CoreClr similar to AOT, but with alertable set. Not sure if that is somehow making us get lucky to not hitting this issue due to other APC calls happening which is what that seems to control.

jkotas · 2024-11-06T21:35:29Z

Not sure if that is somehow making us get lucky to not hitting this issue due to other APC calls happening

You can build a local native AOT package that changes this to alertable=true and try to repro it with that to prove or disprove this hypothesis.

MichalStrehovsky · 2024-11-06T23:02:04Z

Not sure if that is somehow making us get lucky to not hitting this issue due to other APC calls happening

You can build a local native AOT package that changes this to alertable=true and try to repro it with that to prove or disprove this hypothesis.

If there's something you'd like to try, here are the accelerated steps:

Clone the runtime repo, make sure you have the build prerequisites (CMake, VS with C++ workload, Python)
CD to the clone of the repo
Assuming you're targeting 9.0 with an RC2 SDK: git checkout release/9.0-rc2
build.cmd clr.aot -c Release

Then publish your project with native AOT as usual (might want to delete all of bin/obj before since this is not incremental), but set IlcSdkPath property like this <IlcSdkPath>{REPO_PATH}\artifacts\bin\coreclr\windows.x64.Release\aotsdk\</IlcSdkPath>, replacing {REPO_PATH} with where you cloned the runtime repo. This will pick up your build of runtime/corelib. The nice thing about this is that once you do this, you can set breakpoints and debug within the code. You should also be able to pass -c Debug to the build.cmd invocation to build the debug version of the runtime (it will be dropped to a similar path under artifacts) and use that instead - it's easier to debug.

MichalStrehovsky · 2024-11-22T05:50:35Z

This is not native AOT specific.

I stepped through CoreCLR VM version of this. We end up taking a path where we wait like this:

>	coreclr.dll!MsgWaitHelper(int numWaiters, void * * phEvent, int bWaitAll, unsigned long millis, int bAlertable) Line 3140	C++
 	coreclr.dll!Thread::DoAppropriateAptStateWait(int numWaiters, void * * pHandles, int bWaitAll, unsigned long timeout, WaitMode mode) Line 3178	C++
 	coreclr.dll!Thread::DoAppropriateWaitWorker(int countHandles, void * * handles, int waitAll, unsigned long millis, WaitMode mode, void * associatedObjectForMonitorWait) Line 3363	C++
 	coreclr.dll!Thread::DoAppropriateWait(int countHandles, void * * handles, int waitAll, unsigned long millis, WaitMode mode, PendingSync * syncState) Line 3032	C++
 	[Inline Frame] coreclr.dll!CLREventBase::WaitEx(unsigned long) Line 459	C++
 	coreclr.dll!CLREventBase::Wait(unsigned long dwMilliseconds, int alertable, PendingSync * syncState) Line 413	C++
 	coreclr.dll!FinalizerThread::FinalizerThreadWait() Line 599	C++
 	coreclr.dll!InteropLibImports::WaitForRuntimeFinalizerForExternal() Line 1148	C++
 	[Inline Frame] Windows.UI.Xaml.dll!DirectUI::ReferenceTrackerManager::TriggerFinalization() Line 350	C++
 	Windows.UI.Xaml.dll!DirectUI::DXamlCore::OnAfterAppSuspend() Line 4155	C++
 	Windows.UI.Xaml.dll!XAML::PLM::PLMHandler::InvokeAfterAppSuspendCallback() Line 431	C++
 	Windows.UI.Xaml.dll!XAML::PLM::PLMHandler::DecrementAppSuspendActivityCount() Line 226	C++

There's a difference between native AOT and CoreCLR - on CoreCLR we pass COWAIT_ALERTABLE, on native AOT we don't.

In the end it doesn't make a difference because they both deadlock.

agocke · 2024-12-05T19:17:31Z

Who has the next step here? If this is possible to hit in CoreCLR as well, is there a latent bug in this code? Do we know why we're hitting it more frequently in Native AOT?

Sergio0694 · 2024-12-05T19:28:53Z

"If this is possible to hit in CoreCLR as well, is there a latent bug in this code?"

Yeah @manodasanW could consistently repro the same issue in CoreCLR as well, it seems this is not unique to NativeAOT. They're both broken in ASTA scenarios it would seem. Unclear why this is more consistent with NativeAOT. My understanding is the next steps are to debug the app shutdown with either .NETCore.Uwp or .NET Native (likely .NET Native) and compare traces and see if it's doing anything significantly different, and if so, figure out why that was changed. One hypothesis is that perhaps old .NET and .NET Native were not running finalizers at all upon shutdown (that would seem unlikely though), or perhaps doing that but not doing context switches in that scenario.

There's also the additional caveat that this is technically not even a "shutdown" scenario, it's just PLM notifying the app that it's being suspended. This may or may not mean the app will then be closed, but it's not guaranteed. So not sure if there's even a way to tell. Checking what .NET Native was doing in an equivalent scenario should hopefully give us more information though 🤔

jkotas · 2024-12-05T20:31:02Z

My hypothesis is that this is a race condition that was always there, but that was made more likely to be hit due to some unrelated changes (or by things being faster in general).

MichalStrehovsky · 2024-12-09T08:25:54Z

The .NET Native side doesn't seem to be calling WaitForPendingFinalizers (the necessary condition for the hang). The closest I found is this: https://github.com/dotnet/corert/blob/a8830fe5158c499a75b19239524a6d1092a679fe/src/System.Private.Interop/src/Shared/RCWWalker.cs#L361-L369

Sergio0694 · 2024-12-09T17:15:20Z

That's quite surprising 👀
Two questions:

If .NET Native is actually doing that, I assume we could do the same in this scenario for now? Presumably, this does not cause leaks, but rather will just be lazier in waiting for finalizers upon suspension, meaning in the worst case scenario your app will just have a slightly higher memory use when suspended, and will only finish releasing things when the next GC runs the next time it is resumed? Do I have that right? Not even sure if this applies to Win32 apps given they don't support suspension at all anyway.
As a follow up, what's the right way to actually fix this? That comment makes me think that the .NET Native folks did think there was a way to properly implement this, and just didn't have time to do that for v1. What would that look like? 🤔

jkotas · 2024-12-09T18:01:56Z

As a follow up, what's the right way to actually fix this?

Do not block main app thread by waiting for finalizers. Wait for the finalizers on a dedicated thread so that the rest of the app can run undisturbed.

Sergio0694 · 2024-12-09T18:15:19Z

But isn't the issue that PLM is requesting the app to suspend and that's what's triggered that reference tracking callback, and we have to block to ensure the suspension event handler doesn't complete before we're done? Basically if we used a different thread to wait, would that be doing anything at all vs. just letting things finalize on their own? We wouldn't be able to signal back to Jupiter when that's done anyway since we'd no longer be inside that wait callback anymore, right? Am I missing something? 😅

jkotas · 2024-12-09T18:29:43Z

You can run a message pumping loop on the app thread that allows the remote calls through and ends when the finalization is finished. I am not sure whether the reference tracking callback is compatible with that. Is the reference tracking callback the only way for the app to get notified about pending suspend or is there a lower-level API that the app can subscribe to get notified? If there is a lower-level API like that, it may be a better option.

Sergio0694 · 2024-12-09T18:58:09Z

Mmh good question. @AaronRobinsonMSFT @jkoritzinsky @manodasanW any thoughts on this?
I can also ping some XAML/WASDK folks, they might be more familiar with this (eg. @jonwis @kmahone) 😄

Related question for Jon and Keith: the docs for IReferenceTrackerHost::ReleaseDisconnectedReferenceSources just say that the method is expected to get the host to " call IUnknown::Release on any reference tracker objects that have been disconnected by a reference source", yet .NET Native clearly doesn't seem to be doing that at all (since it no-ops). Was this way purely supposed to be an optimization to reclaim as much memory as possible before suspension, or are there any gotchas here?

Eg. would it be fine and/or better than .NET Native's approach to at least eg. call a blocking full GC here to try to reclaim as much memory as possible as a best effort, and then any pending finalizers would just finish running when/if the process is potentially resumed?

Sergio0694 · 2024-12-09T19:19:41Z

@jkotas would it be reasonable to do this:

PR into 9.0.x to comment out GC.WaitForPendingFinalizers() and match .NET Native
For .NET 10, we can look into a proper fix. Two possible scenarios:
- We figure out a good way to fix the issue entirely for everyone, and we just do that
- We add a feature switch to enable/disable waiting for finalizers there. Then we can just disable it in the UWP SDK.

Sergio0694 · 2024-12-09T19:34:23Z

Useful links:

ReferenceTrackerManager.cpp in WinUI (same impl as the OS). This calls that API.
This is the only actual use of this

_Check_return_ HRESULT DXamlCore::OnAfterAppSuspend()
{
    HRESULT hr = S_OK;

    // We request additional suspend time only for managed apps because
    // GC is an asynchronous operation that can be slow.
    if (ReferenceTrackerManager::HaveHost())
    {
        // Do a GC and wait for finalizers.
        IGNOREHR(ReferenceTrackerManager::TriggerCollectionForSuspend());
        IGNOREHR(ReferenceTrackerManager::TriggerFinalization());
    }

    // Flush the release queue
    // We make this call for both native and managed apps because the queue
    // is not guaranteed to be empty for native apps (for example, if some
    // objects were recently released off thread).
    ReleaseQueuedObjects();

    if (m_pControl)
    {
        IFC(m_pControl->GetCoreServices()->OnSuspend(false /* isTriggeredByResourceTimer */, true /* allowOfferResources */));
    }

Cleanup:
    return hr;
}

So it feels like (and the comments also say that), waiting for finalizers is just "recommended", not strictly required.

jkotas · 2024-12-09T20:54:29Z

PR into 9.0.x to comment out GC.WaitForPendingFinalizers() and match .NET Native

Sounds reasonable to me.

For .NET 10, we can look into a proper fix. Two possible scenarios:

I think that the proper fix may need to be in CsWinRT or Xaml (partially at least).

…tedReferenceSources' (#110551) This PR updates the IReferenceTrackerHost::ReleaseDisconnectedReferenceSources implementation for CoreCLR and NativeAOT to match what .NET Native was doing, and not wait for finalizers, to avoid deadlocks in ASTA scenarios (UWP). Finalizers will just continue running normally, and if the process is suspended (which can only happen on UWP anyway), they'll resume when the process is resumed later on. Worst case scenario this would only cause a non-optimal memory use cleanup before suspension (which is better than a deadlock anyway). Can be revisited and improved for .NET 10 if needed. Contributes to #109538

…tedReferenceSources' (dotnet#110551) This PR updates the IReferenceTrackerHost::ReleaseDisconnectedReferenceSources implementation for CoreCLR and NativeAOT to match what .NET Native was doing, and not wait for finalizers, to avoid deadlocks in ASTA scenarios (UWP). Finalizers will just continue running normally, and if the process is suspended (which can only happen on UWP anyway), they'll resume when the process is resumed later on. Worst case scenario this would only cause a non-optimal memory use cleanup before suspension (which is better than a deadlock anyway). Can be revisited and improved for .NET 10 if needed. Contributes to dotnet#109538

MichalStrehovsky · 2025-01-02T05:54:24Z

I'm going to close this since #110551 fixed this to work same as .NET Native and given:

I think that the proper fix may need to be in CsWinRT or Xaml (partially at least).

we don't know how the actual fix is supposed to look like (i.e. XAML/CsWinRT might need to define how this is actually supposed to work; we might even need new APIs or flags - who knows). There is no current known work on this within dotnet/runtime.

…tedReferenceSources' (dotnet#110551) This PR updates the IReferenceTrackerHost::ReleaseDisconnectedReferenceSources implementation for CoreCLR and NativeAOT to match what .NET Native was doing, and not wait for finalizers, to avoid deadlocks in ASTA scenarios (UWP). Finalizers will just continue running normally, and if the process is suspended (which can only happen on UWP anyway), they'll resume when the process is resumed later on. Worst case scenario this would only cause a non-optimal memory use cleanup before suspension (which is better than a deadlock anyway). Can be revisited and improved for .NET 10 if needed. Contributes to dotnet#109538

Sergio0694 added area-NativeAOT-coreclr partner-impact This issue impacts a partner who needs to be kept updated labels Nov 5, 2024

github-project-automation bot added this to AppModel Nov 5, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Nov 5, 2024

agocke assigned MichalStrehovsky Nov 5, 2024

agocke added this to the 10.0.0 milestone Nov 5, 2024

agocke removed the untriaged New issue has not been triaged by the area owner label Nov 5, 2024

agocke modified the milestones: 10.0.0, 9.0.x Nov 5, 2024

am11 mentioned this issue Nov 8, 2024

Linking Native AOT for android complains about -fPIC #109341

Closed

Sergio0694 mentioned this issue Dec 9, 2024

Don't wait for finalizers in 'IReferenceTrackerHost::ReleaseDisconnectedReferenceSources' #110551

Merged

This was referenced Dec 16, 2024

Hang during AddRef/Release calls from GC callbacks for IReferenceTracker support #110747

Closed

Hang during 'ComWrappers.TryGetOrCreateObjectForComInstanceInternal()' on Native AOT #110823

Closed

MichalStrehovsky closed this as completed Jan 2, 2025

github-actions bot locked and limited conversation to collaborators Feb 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hang during shutdown with Native AOT on IReferenceTrackerHost::ReleaseDisconnectedReferenceSources waiting for finalizers #109538

Hang during shutdown with Native AOT on IReferenceTrackerHost::ReleaseDisconnectedReferenceSources waiting for finalizers #109538

Sergio0694 commented Nov 5, 2024 •

edited

Loading

dotnet-policy-service bot commented Nov 5, 2024

Sergio0694 commented Nov 5, 2024

agocke commented Nov 5, 2024

MichalStrehovsky commented Nov 6, 2024

VSadov commented Nov 6, 2024

jkotas commented Nov 6, 2024

manodasanW commented Nov 6, 2024

jkotas commented Nov 6, 2024 •

edited

Loading

MichalStrehovsky commented Nov 6, 2024

MichalStrehovsky commented Nov 22, 2024

agocke commented Dec 5, 2024

Sergio0694 commented Dec 5, 2024

jkotas commented Dec 5, 2024

MichalStrehovsky commented Dec 9, 2024

Sergio0694 commented Dec 9, 2024

jkotas commented Dec 9, 2024

Sergio0694 commented Dec 9, 2024

jkotas commented Dec 9, 2024 •

edited

Loading

Sergio0694 commented Dec 9, 2024

Sergio0694 commented Dec 9, 2024

Sergio0694 commented Dec 9, 2024

jkotas commented Dec 9, 2024

MichalStrehovsky commented Jan 2, 2025

Hang during shutdown with Native AOT on IReferenceTrackerHost::ReleaseDisconnectedReferenceSources waiting for finalizers #109538

Hang during shutdown with Native AOT on IReferenceTrackerHost::ReleaseDisconnectedReferenceSources waiting for finalizers #109538

Comments

Sergio0694 commented Nov 5, 2024 • edited Loading

Description

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

Configuration

dotnet-policy-service bot commented Nov 5, 2024

Sergio0694 commented Nov 5, 2024

agocke commented Nov 5, 2024

MichalStrehovsky commented Nov 6, 2024

VSadov commented Nov 6, 2024

jkotas commented Nov 6, 2024

manodasanW commented Nov 6, 2024

jkotas commented Nov 6, 2024 • edited Loading

MichalStrehovsky commented Nov 6, 2024

MichalStrehovsky commented Nov 22, 2024

agocke commented Dec 5, 2024

Sergio0694 commented Dec 5, 2024

jkotas commented Dec 5, 2024

MichalStrehovsky commented Dec 9, 2024

Sergio0694 commented Dec 9, 2024

jkotas commented Dec 9, 2024

Sergio0694 commented Dec 9, 2024

jkotas commented Dec 9, 2024 • edited Loading

Sergio0694 commented Dec 9, 2024

Sergio0694 commented Dec 9, 2024

Sergio0694 commented Dec 9, 2024

jkotas commented Dec 9, 2024

MichalStrehovsky commented Jan 2, 2025

Sergio0694 commented Nov 5, 2024 •

edited

Loading

jkotas commented Nov 6, 2024 •

edited

Loading

jkotas commented Dec 9, 2024 •

edited

Loading