Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hang during shutdown with Native AOT on IReferenceTrackerHost::ReleaseDisconnectedReferenceSources waiting for finalizers #109538

Closed
Sergio0694 opened this issue Nov 5, 2024 · 23 comments
Assignees
Labels
area-NativeAOT-coreclr partner-impact This issue impacts a partner who needs to be kept updated
Milestone

Comments

@Sergio0694
Copy link
Contributor

Sergio0694 commented Nov 5, 2024

Description

We're hitting a 100% consistent hang during application shutdown, only on Native AOT. It seems that the finalizer thread and the UI thread (ASTA) are possibly in a deadlock, resulting in the application process remaining alive after closing the main window. After a few seconds, Windows proceeds to kill the process, which shows up in WER as a hang (which is expected). Only repros with Native AOT, whereas CoreCLR seems to work fine.

Reproduction Steps

I don't have a minimal repro. Please ping me on Teams for instructions on how to deploy the Store locally to repro. Alternatively, I can also share an MSIX package for sideloading, with instructions on how to install it for testing (and how to restore the retail Store after that).

Here is a memory dump on the process during the hang (process was paused from WinDbg on the presumed deadlock).

Expected behavior

The application should shutdown correctly when closing the window.

Actual behavior

Here's the two relevant stacktraces I see in WinDbg.

Finalizer thread (!FinalizerStart) (click to expand)
[0x0]   ntdll!ZwWaitForMultipleObjects+0x14
[0x1]   KERNELBASE!WaitForMultipleObjectsEx+0xe9
[0x2]   combase!MTAThreadWaitForCall+0xfb
[0x3]   combase!MTAThreadDispatchCrossApartmentCall+0x2bc
[0x4]   combase!CSyncClientCall::SwitchAptAndDispatchCall+0x707   (Inline Function)   (Inline Function)   
[0x5]   combase!CSyncClientCall::SendReceive2+0x825
[0x6]   combase!SyncClientCallRetryContext::SendReceiveWithRetry+0x2f   (Inline Function)   (Inline Function)   
[0x7]   combase!CSyncClientCall::SendReceiveInRetryContext+0x2f   (Inline Function)   (Inline Function)   
[0x8]   combase!DefaultSendReceive+0x6e
[0x9]   combase!CSyncClientCall::SendReceive+0x300
[0xa]   combase!CClientChannel::SendReceive+0x98
[0xb]   combase!NdrExtpProxySendReceive+0x58
[0xc]   RPCRT4!Ndr64pSendReceive+0x39   (Inline Function)   (Inline Function)   
[0xd]   RPCRT4!NdrpClientCall3+0x3de
[0xe]   combase!ObjectStublessClient+0x14c
[0xf]   combase!ObjectStubless+0x42
[0x10]   combase!CObjectContext::InternalContextCallback+0x2fd
[0x11]   combase!CObjectContext::ContextCallback+0x902   
[0x12]   <MICROSOFT_STORE>!WinRT_Runtime_ABI_WinRT_Interop_IContextCallbackVftbl__ContextCallback+0x102
[0x13]   <MICROSOFT_STORE>!WinRT_Runtime_WinRT_Context__CallInContext+0x87
[0x14]   <MICROSOFT_STORE>!WinRT_Runtime_WinRT_ObjectReferenceWithContext_1<WinRT_Runtime_WinRT_Interop_IUnknownVftbl>__Release+0x64
[0x15]   <MICROSOFT_STORE>!WinRT_Runtime_WinRT_IObjectReference__Dispose+0x5b
[0x16]   <MICROSOFT_STORE>!WinRT_Runtime_WinRT_IObjectReference__Finalize+0x17
[0x17]   <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime___Finalizer__DrainQueue+0x7a
[0x18]   <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime___Finalizer__ProcessFinalizers+0x47
[0x19]   <MICROSOFT_STORE>!FinalizerStart+0x56
[0x1a]   KERNEL32!BaseThreadInitThunk+0x1d
[0x1b]   ntdll!RtlUserThreadStart+0x28
UI thread (shcore!_WrapperThreadProc ApplicationView ASTA) (click to expand)
[0x0]   win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14
[...]
[0x5]   combase!CoWaitForMultipleHandles+0xc2
[0x6]   <MICROSOFT_STORE>!PalCompatibleWaitAny+0x63
[0x7]   <MICROSOFT_STORE>!CLREventStatic::Wait+0xc6
[0x8]   <MICROSOFT_STORE>!RhWaitForPendingFinalizers+0x90
[0x9]   <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime_RuntimeImports__RhWaitForPendingFinalizers+0x32
[0xa]   <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime_RuntimeImports__RhWaitForPendingFinalizers_0+0x21
[0xb]   <MICROSOFT_STORE>!S_P_CoreLib_System_GC__WaitForPendingFinalizers+0x1b
[0xc]   <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime_InteropServices_ComWrappers__IReferenceTrackerHost_ReleaseDisconnectedReferenceSources+0x24
[0xd]   Windows_UI_Xaml!DirectUI::ReferenceTrackerManager::TriggerFinalization+0x34
[...]

It seems that:

  • The window is closed
  • The lifecycle manager starts suspending the app
  • XAML decides it should finalize objects
  • ComWrappers::IReferenceTrackerHost::ReleaseDisconnectedReferenceSources is called
  • That in turn blocks on GC::WaitForPendingFinalizers
  • The finalizer thread gets to finalizing some ObjectReferenceWithContext<T> object
  • That object is from another context so it calls CallInContext (here)
  • That passes the callback function and invokes it in the target context (here)
  • That eventually just ends stuck on WaitForMultipleObjectsEx
  • The whole app just hangs until the OS just forcibly kills it

Some potentially relevant differences we noticed in the finalizer logic across CoreCLR (which works fine) and NativeAOT:

CoreCLR:

void FinalizerThread::FinalizerThreadWait()
{
ASSERT(hEventFinalizerDone->IsValid());
ASSERT(hEventFinalizer->IsValid());
ASSERT(GetFinalizerThread());
// Can't call this from within a finalized method.
if (!IsCurrentThreadFinalizer())
{
// We may see a completion of finalization cycle that might not see objects that became
// F-reachable in recent GCs. In such case we want to wait for a completion of another cycle.
// However, since an object cannot be prevented from promoting, one can only rely on Full GCs
// to collect unreferenced objects deterministically. Thus we only care about Full GCs here.
int desiredFullGcCount =
GCHeapUtilities::GetGCHeap()->CollectionCount(GCHeapUtilities::GetGCHeap()->GetMaxGeneration());
GCX_PREEMP();
#ifdef FEATURE_COMINTEROP
// To help combat finalizer thread starvation, we check to see if there are any wrappers
// scheduled to be cleaned up for our context. If so, we'll do them here to avoid making
// the finalizer thread do a transition.
if (g_pRCWCleanupList != NULL)
g_pRCWCleanupList->CleanupWrappersInCurrentCtxThread();
#endif // FEATURE_COMINTEROP
tryAgain:
hEventFinalizerDone->Reset();
EnableFinalization();
// Under GC stress the finalizer queue may never go empty as frequent
// GCs will keep filling up the queue with items.
// We will disable GC stress to make sure the current thread is not permanently blocked on that.
GCStressPolicy::InhibitHolder iholder;
//----------------------------------------------------
// Do appropriate wait and pump messages if necessary
//----------------------------------------------------
DWORD status;
status = hEventFinalizerDone->Wait(INFINITE,TRUE);
// we use unsigned math here as the collection counts, which are size_t internally,
// can in theory overflow an int and wrap around.
// unsigned math would have more defined/portable behavior in such case
if ((int)((unsigned int)desiredFullGcCount - (unsigned int)g_fullGcCountSeenByFinalization) > 0)
{
// There were some Full GCs happening before we started waiting and possibly not seen by the
// last finalization cycle. This is rare, but we need to be sure we have seen those,
// so we try one more time.
goto tryAgain;
}
_ASSERTE(status == WAIT_OBJECT_0);
}
}

Native AOT:

EXTERN_C void QCALLTYPE RhWaitForPendingFinalizers(UInt32_BOOL allowReentrantWait)
{
// This must be called via p/invoke rather than RuntimeImport since it blocks and could starve the GC if
// called in cooperative mode.
ASSERT(!ThreadStore::GetCurrentThread()->IsCurrentThreadInCooperativeMode());
// Can't call this from the finalizer thread itself.
if (ThreadStore::GetCurrentThread() != g_pFinalizerThread)
{
// We may see a completion of finalization cycle that might not see objects that became
// F-reachable in recent GCs. In such case we want to wait for a completion of another cycle.
// However, since an object cannot be prevented from promoting, one can only rely on Full GCs
// to collect unreferenced objects deterministically. Thus we only care about Full GCs here.
int desiredFullGcCount =
GCHeapUtilities::GetGCHeap()->CollectionCount(GCHeapUtilities::GetGCHeap()->GetMaxGeneration());
tryAgain:
// Clear any current indication that a finalization pass is finished and wake the finalizer thread up
// (if there's no work to do it'll set the done event immediately).
g_FinalizerDoneEvent.Reset();
g_FinalizerEvent.Set();
// Wait for the finalizer thread to get back to us.
g_FinalizerDoneEvent.Wait(INFINITE, false, allowReentrantWait);
// we use unsigned math here as the collection counts, which are size_t internally,
// can in theory overflow an int and wrap around.
// unsigned math would have more defined/portable behavior in such case
if ((int)((unsigned int)desiredFullGcCount - (unsigned int)g_fullGcCountSeenByFinalization) > 0)
{
// There were some Full GCs happening before we started waiting and possibly not seen by the
// last finalization cycle. This is rare, but we need to be sure we have seen those,
// so we try one more time.
goto tryAgain;
}
}
}

It seems that they're similar, however:

  • CoreCLR passes alertable: TRUE
  • Native AOT passes alertable: FALSE
  • NativeAOT also passes the allowReentrantWait: TRUE param, which makes it call CoWaitForMultipleHandles here

Not sure whether that's intentional (why?) and whether it's related to the issue, just something we noticed.

Regression?

No.

Known Workarounds

None, this is a blocker 😅

Configuration

  • VS 17.12 P5
  • .NET 9 RC2
@Sergio0694 Sergio0694 added area-NativeAOT-coreclr partner-impact This issue impacts a partner who needs to be kept updated labels Nov 5, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Nov 5, 2024
Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@Sergio0694
Copy link
Contributor Author

cc. @AaronRobinsonMSFT @jkoritzinsky perhaps you might also have thoughts on this or some knowledge to share? 😄

@agocke
Copy link
Member

agocke commented Nov 5, 2024

Also + @VSadov in case this could be related to suspension

@agocke agocke added this to the 10.0.0 milestone Nov 5, 2024
@agocke agocke removed the untriaged New issue has not been triaged by the area owner label Nov 5, 2024
@agocke agocke modified the milestones: 10.0.0, 9.0.x Nov 5, 2024
@MichalStrehovsky
Copy link
Member

Also + @VSadov in case this could be related to suspension

It's not related to suspension, but definitely an area where expertise from @VSadov would help. Looks like native AOT will just wait on an event using a OS API and that's all there's to it. CoreCLR does a lot more to figure out how exactly to wait (Does the thread have a SynchronizationContext? Do we need to pump window messages with MsgWaitForMultipleObjects? etc.). As usual, this is a place that is a lot more complicated in CoreCLR VM and it's hard to say which part is relevant and how much of it we need to implement.

DWORD Thread::DoAppropriateWaitWorker(int countHandles, HANDLE *handles, BOOL waitAll,
DWORD millis, WaitMode mode, void *associatedObjectForMonitorWait)
{
CONTRACTL {
THROWS;
GC_TRIGGERS;
}
CONTRACTL_END;
DWORD ret = 0;
BOOL alertable = (mode & WaitMode_Alertable) != 0;
// Waits from SynchronizationContext.WaitHelper are always just WaitMode_IgnoreSyncCtx.
// So if we defer to a sync ctx, we will lose any extra bits. We must therefore not
// defer to a sync ctx if doing any non-default wait.
// If you're doing a default wait, but want to ignore sync ctx, specify WaitMode_IgnoreSyncCtx
// which will make mode != WaitMode_Alertable.
BOOL ignoreSyncCtx = (mode != WaitMode_Alertable);
if (AppDomain::GetCurrentDomain()->MustForceTrivialWaitOperations())
ignoreSyncCtx = TRUE;
// Unless the ignoreSyncCtx flag is set, first check to see if there is a synchronization
// context on the current thread and if there is, dispatch to it to do the wait.
// If the wait is non alertable we cannot forward the call to the sync context
// since fundamental parts of the system (such as the GC) rely on non alertable
// waits not running any managed code. Also if we are past the point in shutdown were we
// are allowed to run managed code then we can't forward the call to the sync context.
if (!ignoreSyncCtx
&& alertable
&& !HasThreadStateNC(Thread::TSNC_BlockedForShutdown))
{
GCX_COOP();
BOOL fSyncCtxPresent = FALSE;
OBJECTREF SyncCtxObj = NULL;
GCPROTECT_BEGIN(SyncCtxObj)
{
GetSynchronizationContext(&SyncCtxObj);
if (SyncCtxObj != NULL)
{
SYNCHRONIZATIONCONTEXTREF syncRef = (SYNCHRONIZATIONCONTEXTREF)SyncCtxObj;
if (syncRef->IsWaitNotificationRequired())
{
fSyncCtxPresent = TRUE;
ret = DoSyncContextWait(&SyncCtxObj, countHandles, handles, waitAll, millis);
}
}
}
GCPROTECT_END();
if (fSyncCtxPresent)
return ret;
}
// Before going to pre-emptive mode the thread needs to be flagged as waiting for
// the debugger. This used to be accomplished by the TS_Interruptible flag but that
// doesn't work reliably, see DevDiv Bugs 699245. Some methods call in here already in
// COOP mode so we set the bit before the transition. For the calls that are already
// in pre-emptive mode those are still buggy. This is only a partial fix.
BOOL isCoop = PreemptiveGCDisabled();
ThreadStateNCStackHolder tsNC(isCoop && alertable, TSNC_DebuggerSleepWaitJoin);
GCX_PREEMP();
if (alertable)
{
DoAppropriateWaitWorkerAlertableHelper(mode);
}
StateHolder<MarkOSAlertableWait,UnMarkOSAlertableWait> OSAlertableWait(alertable);
ThreadStateHolder tsh(alertable, TS_Interruptible | TS_Interrupted);
bool sendWaitEvents =
millis != 0 &&
(mode & WaitMode_Alertable) != 0 &&
ETW_TRACING_CATEGORY_ENABLED(
MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context,
TRACE_LEVEL_VERBOSE,
CLR_WAITHANDLE_KEYWORD);
// Monitor.Wait is typically a blocking wait. For other waits, when sending the wait events try a nonblocking wait first
// such that the events sent are more likely to represent blocking waits.
bool tryNonblockingWaitFirst = sendWaitEvents && associatedObjectForMonitorWait == NULL;
if (tryNonblockingWaitFirst)
{
ret = DoAppropriateAptStateWait(countHandles, handles, waitAll, 0 /* timeout */, mode);
if (ret == WAIT_TIMEOUT)
{
// Do a full wait and send the wait events
tryNonblockingWaitFirst = false;
}
else if (ret != WAIT_IO_COMPLETION && ret != WAIT_FAILED)
{
// The nonblocking wait was successful, don't send the wait events
sendWaitEvents = false;
}
}
if (sendWaitEvents)
{
if (associatedObjectForMonitorWait != NULL)
{
FireEtwWaitHandleWaitStart(
ETW::WaitHandleLog::WaitHandleStructs::MonitorWait,
associatedObjectForMonitorWait,
GetClrInstanceId());
}
else
{
FireEtwWaitHandleWaitStart(ETW::WaitHandleLog::WaitHandleStructs::Unknown, NULL, GetClrInstanceId());
}
}
ULONGLONG dwStart = 0, dwEnd;
retry:
if (millis != INFINITE)
{
dwStart = CLRGetTickCount64();
}
if (tryNonblockingWaitFirst)
{
// We have a final wait result from the nonblocking wait above
tryNonblockingWaitFirst = false;
}
else
{
ret = DoAppropriateAptStateWait(countHandles, handles, waitAll, millis, mode);
}
if (ret == WAIT_IO_COMPLETION)
{
_ASSERTE (alertable);
if (m_State & TS_Interrupted)
{
HandleThreadInterrupt();
}
// We could be woken by some spurious APC or an EE APC queued to
// interrupt us. In the latter case the TS_Interrupted bit will be set
// in the thread state bits. Otherwise we just go back to sleep again.
if (millis != INFINITE)
{
dwEnd = CLRGetTickCount64();
if (dwEnd - dwStart >= millis)
{
ret = WAIT_TIMEOUT;
goto WaitCompleted;
}
else
{
millis -= (DWORD)(dwEnd - dwStart);
}
}
goto retry;
}
_ASSERTE((ret >= WAIT_OBJECT_0 && ret < (WAIT_OBJECT_0 + (DWORD)countHandles)) ||
(ret >= WAIT_ABANDONED && ret < (WAIT_ABANDONED + (DWORD)countHandles)) ||
(ret == WAIT_TIMEOUT) || (ret == WAIT_FAILED));
// countHandles is used as an unsigned -- it should never be negative.
_ASSERTE(countHandles >= 0);
// We support precisely one WAIT_FAILED case, where we attempt to wait on a
// thread handle and the thread is in the process of dying we might get a
// invalid handle substatus. Turn this into a successful wait.
// There are three cases to consider:
// 1) Only waiting on one handle: return success right away.
// 2) Waiting for all handles to be signalled: retry the wait without the
// affected handle.
// 3) Waiting for one of multiple handles to be signalled: return with the
// first handle that is either signalled or has become invalid.
if (ret == WAIT_FAILED)
{
DWORD errorCode = ::GetLastError();
if (errorCode == ERROR_INVALID_PARAMETER)
{
if (CheckForDuplicateHandles(countHandles, handles))
COMPlusThrow(kDuplicateWaitObjectException);
else
COMPlusThrowHR(HRESULT_FROM_WIN32(errorCode));
}
else if (errorCode == ERROR_ACCESS_DENIED)
{
// A Win32 ACL could prevent us from waiting on the handle.
COMPlusThrow(kUnauthorizedAccessException);
}
else if (errorCode == ERROR_NOT_ENOUGH_MEMORY)
{
ThrowOutOfMemory();
}
#ifdef TARGET_UNIX
else if (errorCode == ERROR_NOT_SUPPORTED)
{
// "Wait for any" and "wait for all" operations on multiple wait handles are not supported when a cross-process sync
// object is included in the array
COMPlusThrow(kPlatformNotSupportedException, W("PlatformNotSupported_NamedSyncObjectWaitAnyWaitAll"));
}
#endif
else if (errorCode != ERROR_INVALID_HANDLE)
{
ThrowWin32(errorCode);
}
if (countHandles == 1)
ret = WAIT_OBJECT_0;
else if (waitAll)
{
// Probe all handles with a timeout of zero. When we find one that's
// invalid, move it out of the list and retry the wait.
for (int i = 0; i < countHandles; i++)
{
// WaitForSingleObject won't pump memssage; we already probe enough space
// before calling this function and we don't want to fail here, so we don't
// do a transition to tolerant code here
DWORD subRet = WaitForSingleObject (handles[i], 0);
if (subRet != WAIT_FAILED)
continue;
_ASSERTE(::GetLastError() == ERROR_INVALID_HANDLE);
if ((countHandles - i - 1) > 0)
memmove(&handles[i], &handles[i+1], (countHandles - i - 1) * sizeof(HANDLE));
countHandles--;
break;
}
// Compute the new timeout value by assume that the timeout
// is not large enough for more than one wrap
dwEnd = CLRGetTickCount64();
if (millis != INFINITE)
{
if (dwEnd - dwStart >= millis)
{
ret = WAIT_TIMEOUT;
goto WaitCompleted;
}
else
{
millis -= (DWORD)(dwEnd - dwStart);
}
}
goto retry;
}
else
{
// Probe all handles with a timeout as zero, succeed with the first
// handle that doesn't timeout.
ret = WAIT_OBJECT_0;
int i;
for (i = 0; i < countHandles; i++)
{
TryAgain:
// WaitForSingleObject won't pump memssage; we already probe enough space
// before calling this function and we don't want to fail here, so we don't
// do a transition to tolerant code here
DWORD subRet = WaitForSingleObject (handles[i], 0);
if ((subRet == WAIT_OBJECT_0) || (subRet == WAIT_FAILED))
break;
if (subRet == WAIT_ABANDONED)
{
ret = (ret - WAIT_OBJECT_0) + WAIT_ABANDONED;
break;
}
// If we get alerted it just masks the real state of the current
// handle, so retry the wait.
if (subRet == WAIT_IO_COMPLETION)
goto TryAgain;
_ASSERTE(subRet == WAIT_TIMEOUT);
ret++;
}
}
}
WaitCompleted:
_ASSERTE((ret != WAIT_TIMEOUT) || (millis != INFINITE));
if (sendWaitEvents)
{
FireEtwWaitHandleWaitStop(GetClrInstanceId());
}
return ret;
}

@VSadov
Copy link
Member

VSadov commented Nov 6, 2024

I think waiting with "alertable" is to allow a sleeping/waiting thread to react to Thread.Interrupt().
However the Thread.Interrupt() is not supported on NativeAOT

public void Interrupt() { throw new PlatformNotSupportedException(); }

I suspect the culprit is something with COM and message pumping, but it looks like we call CoWaitForMultipleHandles, which should do that.

@jkotas
Copy link
Member

jkotas commented Nov 6, 2024

CoreCLR may be doing the wait via user installed synchronization context. It would be useful to find out the callstack of the wait in CoreCLR.

@manodasanW
Copy link
Contributor

I did have a chat with some COM folks. On ASTA threads such as this one, CoWaitForMultipleHandles doesn't pump COM messages unless COWAIT_DISPATCH_CALLS is passed as the flag. But that isn't passed in the CoreClr case here either. So that might explain why it is waiting but doesn't explain why CoreClr doesn't hit this issue.

My initial guess was the CleanupWrappersInCurrentCtxThread call was doing some cleanup for the ASTA scenarios before it waited, but doing some debugging of the CoreCLR scenario, it seems it isn't as there is a conditional that makes it used in certain scenarios which isn't set here. And from my debugging, it seems we are doing the same wait on CoreClr similar to AOT, but with alertable set. Not sure if that is somehow making us get lucky to not hitting this issue due to other APC calls happening which is what that seems to control.

@jkotas
Copy link
Member

jkotas commented Nov 6, 2024

Not sure if that is somehow making us get lucky to not hitting this issue due to other APC calls happening

You can build a local native AOT package that changes this to alertable=true and try to repro it with that to prove or disprove this hypothesis.

@MichalStrehovsky
Copy link
Member

Not sure if that is somehow making us get lucky to not hitting this issue due to other APC calls happening

You can build a local native AOT package that changes this to alertable=true and try to repro it with that to prove or disprove this hypothesis.

If there's something you'd like to try, here are the accelerated steps:

  • Clone the runtime repo, make sure you have the build prerequisites (CMake, VS with C++ workload, Python)
  • CD to the clone of the repo
  • Assuming you're targeting 9.0 with an RC2 SDK: git checkout release/9.0-rc2
  • build.cmd clr.aot -c Release

Then publish your project with native AOT as usual (might want to delete all of bin/obj before since this is not incremental), but set IlcSdkPath property like this <IlcSdkPath>{REPO_PATH}\artifacts\bin\coreclr\windows.x64.Release\aotsdk\</IlcSdkPath>, replacing {REPO_PATH} with where you cloned the runtime repo. This will pick up your build of runtime/corelib. The nice thing about this is that once you do this, you can set breakpoints and debug within the code. You should also be able to pass -c Debug to the build.cmd invocation to build the debug version of the runtime (it will be dropped to a similar path under artifacts) and use that instead - it's easier to debug.

@MichalStrehovsky
Copy link
Member

This is not native AOT specific.

I stepped through CoreCLR VM version of this. We end up taking a path where we wait like this:

>	coreclr.dll!MsgWaitHelper(int numWaiters, void * * phEvent, int bWaitAll, unsigned long millis, int bAlertable) Line 3140	C++
 	coreclr.dll!Thread::DoAppropriateAptStateWait(int numWaiters, void * * pHandles, int bWaitAll, unsigned long timeout, WaitMode mode) Line 3178	C++
 	coreclr.dll!Thread::DoAppropriateWaitWorker(int countHandles, void * * handles, int waitAll, unsigned long millis, WaitMode mode, void * associatedObjectForMonitorWait) Line 3363	C++
 	coreclr.dll!Thread::DoAppropriateWait(int countHandles, void * * handles, int waitAll, unsigned long millis, WaitMode mode, PendingSync * syncState) Line 3032	C++
 	[Inline Frame] coreclr.dll!CLREventBase::WaitEx(unsigned long) Line 459	C++
 	coreclr.dll!CLREventBase::Wait(unsigned long dwMilliseconds, int alertable, PendingSync * syncState) Line 413	C++
 	coreclr.dll!FinalizerThread::FinalizerThreadWait() Line 599	C++
 	coreclr.dll!InteropLibImports::WaitForRuntimeFinalizerForExternal() Line 1148	C++
 	[Inline Frame] Windows.UI.Xaml.dll!DirectUI::ReferenceTrackerManager::TriggerFinalization() Line 350	C++
 	Windows.UI.Xaml.dll!DirectUI::DXamlCore::OnAfterAppSuspend() Line 4155	C++
 	Windows.UI.Xaml.dll!XAML::PLM::PLMHandler::InvokeAfterAppSuspendCallback() Line 431	C++
 	Windows.UI.Xaml.dll!XAML::PLM::PLMHandler::DecrementAppSuspendActivityCount() Line 226	C++

There's a difference between native AOT and CoreCLR - on CoreCLR we pass COWAIT_ALERTABLE, on native AOT we don't.

In the end it doesn't make a difference because they both deadlock.

@agocke
Copy link
Member

agocke commented Dec 5, 2024

Who has the next step here? If this is possible to hit in CoreCLR as well, is there a latent bug in this code? Do we know why we're hitting it more frequently in Native AOT?

@Sergio0694
Copy link
Contributor Author

"If this is possible to hit in CoreCLR as well, is there a latent bug in this code?"

Yeah @manodasanW could consistently repro the same issue in CoreCLR as well, it seems this is not unique to NativeAOT. They're both broken in ASTA scenarios it would seem. Unclear why this is more consistent with NativeAOT. My understanding is the next steps are to debug the app shutdown with either .NETCore.Uwp or .NET Native (likely .NET Native) and compare traces and see if it's doing anything significantly different, and if so, figure out why that was changed. One hypothesis is that perhaps old .NET and .NET Native were not running finalizers at all upon shutdown (that would seem unlikely though), or perhaps doing that but not doing context switches in that scenario.

There's also the additional caveat that this is technically not even a "shutdown" scenario, it's just PLM notifying the app that it's being suspended. This may or may not mean the app will then be closed, but it's not guaranteed. So not sure if there's even a way to tell. Checking what .NET Native was doing in an equivalent scenario should hopefully give us more information though 🤔

@jkotas
Copy link
Member

jkotas commented Dec 5, 2024

My hypothesis is that this is a race condition that was always there, but that was made more likely to be hit due to some unrelated changes (or by things being faster in general).

@MichalStrehovsky
Copy link
Member

The .NET Native side doesn't seem to be calling WaitForPendingFinalizers (the necessary condition for the hang). The closest I found is this: https://github.com/dotnet/corert/blob/a8830fe5158c499a75b19239524a6d1092a679fe/src/System.Private.Interop/src/Shared/RCWWalker.cs#L361-L369

@Sergio0694
Copy link
Contributor Author

That's quite surprising 👀
Two questions:

  • If .NET Native is actually doing that, I assume we could do the same in this scenario for now? Presumably, this does not cause leaks, but rather will just be lazier in waiting for finalizers upon suspension, meaning in the worst case scenario your app will just have a slightly higher memory use when suspended, and will only finish releasing things when the next GC runs the next time it is resumed? Do I have that right? Not even sure if this applies to Win32 apps given they don't support suspension at all anyway.
  • As a follow up, what's the right way to actually fix this? That comment makes me think that the .NET Native folks did think there was a way to properly implement this, and just didn't have time to do that for v1. What would that look like? 🤔

@jkotas
Copy link
Member

jkotas commented Dec 9, 2024

As a follow up, what's the right way to actually fix this?

Do not block main app thread by waiting for finalizers. Wait for the finalizers on a dedicated thread so that the rest of the app can run undisturbed.

@Sergio0694
Copy link
Contributor Author

But isn't the issue that PLM is requesting the app to suspend and that's what's triggered that reference tracking callback, and we have to block to ensure the suspension event handler doesn't complete before we're done? Basically if we used a different thread to wait, would that be doing anything at all vs. just letting things finalize on their own? We wouldn't be able to signal back to Jupiter when that's done anyway since we'd no longer be inside that wait callback anymore, right? Am I missing something? 😅

@jkotas
Copy link
Member

jkotas commented Dec 9, 2024

You can run a message pumping loop on the app thread that allows the remote calls through and ends when the finalization is finished. I am not sure whether the reference tracking callback is compatible with that. Is the reference tracking callback the only way for the app to get notified about pending suspend or is there a lower-level API that the app can subscribe to get notified? If there is a lower-level API like that, it may be a better option.

@Sergio0694
Copy link
Contributor Author

Mmh good question. @AaronRobinsonMSFT @jkoritzinsky @manodasanW any thoughts on this?
I can also ping some XAML/WASDK folks, they might be more familiar with this (eg. @jonwis @kmahone) 😄

Related question for Jon and Keith: the docs for IReferenceTrackerHost::ReleaseDisconnectedReferenceSources just say that the method is expected to get the host to " call IUnknown::Release on any reference tracker objects that have been disconnected by a reference source", yet .NET Native clearly doesn't seem to be doing that at all (since it no-ops). Was this way purely supposed to be an optimization to reclaim as much memory as possible before suspension, or are there any gotchas here?

Eg. would it be fine and/or better than .NET Native's approach to at least eg. call a blocking full GC here to try to reclaim as much memory as possible as a best effort, and then any pending finalizers would just finish running when/if the process is potentially resumed?

@Sergio0694
Copy link
Contributor Author

@jkotas would it be reasonable to do this:

  • PR into 9.0.x to comment out GC.WaitForPendingFinalizers() and match .NET Native
  • For .NET 10, we can look into a proper fix. Two possible scenarios:
    • We figure out a good way to fix the issue entirely for everyone, and we just do that
    • We add a feature switch to enable/disable waiting for finalizers there. Then we can just disable it in the UWP SDK.

@Sergio0694
Copy link
Contributor Author

Useful links:

_Check_return_ HRESULT DXamlCore::OnAfterAppSuspend()
{
    HRESULT hr = S_OK;

    // We request additional suspend time only for managed apps because
    // GC is an asynchronous operation that can be slow.
    if (ReferenceTrackerManager::HaveHost())
    {
        // Do a GC and wait for finalizers.
        IGNOREHR(ReferenceTrackerManager::TriggerCollectionForSuspend());
        IGNOREHR(ReferenceTrackerManager::TriggerFinalization());
    }

    // Flush the release queue
    // We make this call for both native and managed apps because the queue
    // is not guaranteed to be empty for native apps (for example, if some
    // objects were recently released off thread).
    ReleaseQueuedObjects();

    if (m_pControl)
    {
        IFC(m_pControl->GetCoreServices()->OnSuspend(false /* isTriggeredByResourceTimer */, true /* allowOfferResources */));
    }

Cleanup:
    return hr;
}

So it feels like (and the comments also say that), waiting for finalizers is just "recommended", not strictly required.

@jkotas
Copy link
Member

jkotas commented Dec 9, 2024

PR into 9.0.x to comment out GC.WaitForPendingFinalizers() and match .NET Native

Sounds reasonable to me.

For .NET 10, we can look into a proper fix. Two possible scenarios:

I think that the proper fix may need to be in CsWinRT or Xaml (partially at least).

jkotas pushed a commit that referenced this issue Dec 10, 2024
…tedReferenceSources' (#110551)

This PR updates the IReferenceTrackerHost::ReleaseDisconnectedReferenceSources implementation for CoreCLR and NativeAOT to match what .NET Native was doing, and not wait for finalizers, to avoid deadlocks in ASTA scenarios (UWP). Finalizers will just continue running normally, and if the process is suspended (which can only happen on UWP anyway), they'll resume when the process is resumed later on. Worst case scenario this would only cause a non-optimal memory use cleanup before suspension (which is better than a deadlock anyway). Can be revisited and improved for .NET 10 if needed.

Contributes to #109538
hez2010 pushed a commit to hez2010/runtime that referenced this issue Dec 14, 2024
…tedReferenceSources' (dotnet#110551)

This PR updates the IReferenceTrackerHost::ReleaseDisconnectedReferenceSources implementation for CoreCLR and NativeAOT to match what .NET Native was doing, and not wait for finalizers, to avoid deadlocks in ASTA scenarios (UWP). Finalizers will just continue running normally, and if the process is suspended (which can only happen on UWP anyway), they'll resume when the process is resumed later on. Worst case scenario this would only cause a non-optimal memory use cleanup before suspension (which is better than a deadlock anyway). Can be revisited and improved for .NET 10 if needed.

Contributes to dotnet#109538
@MichalStrehovsky
Copy link
Member

I'm going to close this since #110551 fixed this to work same as .NET Native and given:

I think that the proper fix may need to be in CsWinRT or Xaml (partially at least).

we don't know how the actual fix is supposed to look like (i.e. XAML/CsWinRT might need to define how this is actually supposed to work; we might even need new APIs or flags - who knows). There is no current known work on this within dotnet/runtime.

Sergio0694 added a commit to Sergio0694/runtime that referenced this issue Jan 7, 2025
…tedReferenceSources' (dotnet#110551)

This PR updates the IReferenceTrackerHost::ReleaseDisconnectedReferenceSources implementation for CoreCLR and NativeAOT to match what .NET Native was doing, and not wait for finalizers, to avoid deadlocks in ASTA scenarios (UWP). Finalizers will just continue running normally, and if the process is suspended (which can only happen on UWP anyway), they'll resume when the process is resumed later on. Worst case scenario this would only cause a non-optimal memory use cleanup before suspension (which is better than a deadlock anyway). Can be revisited and improved for .NET 10 if needed.

Contributes to dotnet#109538
@github-actions github-actions bot locked and limited conversation to collaborators Feb 1, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-NativeAOT-coreclr partner-impact This issue impacts a partner who needs to be kept updated
Projects
Archived in project
Development

No branches or pull requests

6 participants