tracing\eventpipe\providervalidation\providervalidation\providervalidation.cmd fails in gcstress3, Windows x86 #2230
Comments
Taking a look today
Took a look at the dump that got generated and picked out a few things. I think the linked list of exception handlers might be corrupted. I took a look at runtime/src/coreclr/src/vm/i386/excepx86.cpp Line 153 in 2cc926e
I'm trying to run this test locally so I can figure out what is setting this top @BruceForstall, any idea what might cause the exception handler list to get corrupted inside P_Invoke code? It looks like The P_Invoke code ended up in
I don't have an easy idea for why this corruption would occur. My first suggestion would be to run with @janvorli Any advice here?
Could this be related to #2215?
I'll take a look.
@josalem I cannot find the dump - could you please tell me where to get it?
It got uploaded as part of the Helix workitem. Here's a direct link. The build artifacts can be downloaded from AzDO for symbols.
I have figured out what's going on. It is not a corruption. The check that fails is actually wrong in this state, where we have just returned from a PInvoke and the stack walk was triggered from the JIT_PInvokeEndRarePath. The problem is actually quite interesting and it has to do with the calling convention of the function that was pinvoked, namely the fact that the callee pops the arguments from the stack. So here is what happens:
Now the question is why we were not hitting this before. I think the likely reason is that the JIT was not pushing the pinvoke arguments to the stack before calling the PINVOKE_BEGIN. cc: @jkotas
I do not think that this is the problem. The contract for
I think the problem is https://github.com/dotnet/runtime/blob/master/src/coreclr/src/vm/i386/cgenx86.cpp#L617 in combination with dotnet/coreclr#22560 (comment). The ReadyToRun helper does not set the MethodDesc for the PInvoke frame. It means that the unwinder is not able to adjust the stack for popped arguments and we get this problem.
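A minimal sketch of the adjustment described above, under stated assumptions: it is a toy model, not the runtime's code. The struct, its field names, and `UnwoundSP` are hypothetical, and it assumes the recorded call-site ESP points at the pushed outgoing arguments, so the caller's real ESP after a callee-pops call returns is that value plus the size of the stack arguments.

```cpp
#include <cstdint>

// Hypothetical model of the state recorded for an inlined P/Invoke call.
// Field names are illustrative, not the runtime's InlinedCallFrame layout.
struct PInvokeFrameModel {
    std::uint32_t callSiteSP;     // ESP captured at the call, with the outgoing arguments pushed
    bool          hasSignature;   // false models "no MethodDesc available" for the frame
    std::uint32_t stackArgBytes;  // bytes the callee pops; meaningful only if hasSignature
};

// ESP the stack walker would report for the managed caller once the
// callee-pops target has returned.
std::uint32_t UnwoundSP(const PInvokeFrameModel& frame) {
    // Without the signature the walker cannot know how much was popped,
    // so this model falls back to zero and reports an ESP that is too low.
    const std::uint32_t popped = frame.hasSignature ? frame.stackArgBytes : 0;
    return frame.callSiteSP + popped;
}
```

With made-up numbers, a frame recorded at `0x1000` with 28 bytes of stack arguments unwinds to `0x101C` when the size is known and stays at `0x1000` when it is not. A walker that then looks for the first FS:0 record above the reported ESP would start from the wrong place in the second case.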
cc @fadimounir
@jkotas thank you, I missed the comment on the m_pCallSiteSP.
Thoughts about how to fix this?
Can we make some changes to the JIT to make it pass the PInvoke MethodDesc to the JIT_PInvokeBegin helper? The MethodDesc can be loaded by an R2R helper call
Yes, that may be one option. Another option is to make the JIT pass the size of the stack arguments to
#if !defined(_TARGET_64BIT_)
// On 32-bit targets, indirect calls need the size of the stack args in InlinedCallFrame.m_Datum.
const unsigned numStkArgBytes = call->fgArgInfo->GetNextSlotNum() * TARGET_POINTER_SIZE;
src = comp->gtNewIconNode(numStkArgBytes, TYP_INT);
#else
Is that applicable also to arm32?
I do not see why this would be needed for anything but Windows x86.
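For reference, the byte count in the quoted JIT snippet is just the number of outgoing stack slots times the pointer size. A small sketch of that arithmetic, assuming 4-byte slots as on 32-bit x86; `kPointerSize` and `StackArgBytes` are hypothetical names standing in for `TARGET_POINTER_SIZE` and the expression in the snippet:

```cpp
#include <cstdint>

// Assumption: 4-byte stack slots, as on 32-bit x86.
constexpr std::uint32_t kPointerSize = 4;

// Size in bytes of the outgoing stack arguments for a call that uses
// slotCount stack slots (an 8-byte argument occupies two slots).
constexpr std::uint32_t StackArgBytes(std::uint32_t slotCount) {
    return slotCount * kPointerSize;
}
```

For example, a call that uses seven stack slots would have `StackArgBytes(7) == 28` bytes popped by a callee-pops target.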
Interesting... probably legacy code before the time of arm/arm64 then. @BruceForstall @josalem did you try to repro this with complus_gcstress=0xf? If so, did it start to fail deterministically?
Same test Error message: Stack trace:
@fadimounir Looks like #26834 fixed this? Can we close this? The "new failure" above is crossgen2, and is unrelated.
@BruceForstall The fix has been reverted due to #31809. I'll close the issue once I merge the new PR
This test has failed in the last 3 CI runs of Windows x86 GCStress=0x3:
Assert failure(PID 4216 [0x00001078], Thread: 7224 [0x1c38]): Consistency check failed: Invalid transition into managed code!

We're walking this thread's stack and we've reached a managed frame at Esp=0x00F7D758. (The method is Advapi32::EventWriteTransfer_PInvoke) The very next FS:0 record (0x00F7D760) up from this point on the stack should be one of our 'unmanaged to managed SEH handlers', but its not... its something else, and that's very bad. It indicates that someone managed to call into managed code without setting up the proper exception handling.

Get a good unmanaged stack trace for this thread. All FS:0 records are on the stack, so you can see who installed the last handler. Somewhere between that function and where the thread is now is where the bad transition occurred.

A little extra info: FS:0 = 0x00F7D370, pEHR->Handler = 0x72E381BD
FAILED: IsUnmanagedToManagedSEHHandler(pEHR)

CORECLR! CHECK::Trigger + 0x2FB (0x727675be)
CORECLR! VerifyValidTransitionFromManagedCode + 0x165 (0x7283af20)
CORECLR! StackFrameIterator::NextRaw + 0x887 (0x72806395)
CORECLR! StackFrameIterator::Next + 0x46 (0x72805aca)
CORECLR! Thread::StackWalkFramesEx + 0x180 (0x7280705c)
CORECLR! Thread::StackWalkFrames + 0x159 (0x72806e5c)
CORECLR! ScanStackRoots + 0x198 (0x72cbc639)
CORECLR! GCToEEInterface::GcScanRoots + 0x104 (0x72cbb9b6)
CORECLR! WKS::gc_heap::mark_phase + 0x1AD (0x72c71931)
CORECLR! WKS::gc_heap::gc1 + 0x167 (0x72c6bb08)
 File: F:\workspace\_work\1\s\src\coreclr\src\vm\i386\excepx86.cpp Line: 329
 Image: C:\h\w\A86D08D2\p\CoreRun.exe


Return code: 1
Raw output file: C:\h\w\A86D08D2\w\9D9C0906\e\tracing\eventpipe\Reports\tracing.eventpipe\providervalidation\providervalidation\providervalidation.output.txt
Raw output:
BEGIN EXECUTION
 "C:\h\w\A86D08D2\p\corerun.exe" providervalidation.dll
 0.0s: ==TEST STARTING==
 5.6s: Started sending sentinel events...
 6.0s: Connecting to EventPipe...
 10.5s: Connected to EventPipe with sessionID '0x7633a78'
 10.5s: Creating EventPipeEventSource...
 22.7s: EventPipeEventSource created
 26.4s: Dynamic.All callback registered
 26.4s: Starting stream processing...
 33.4s: Saw new provider 'Microsoft-DotNETCore-SampleProfiler'
 50.9s: Saw sentinel event
 51.0s: Stopped sending sentinel events
 51.3s: Starting event generating action...
 66.5s: Fired MyEvent 0/100,000 times...
 70.3s: Saw new provider 'MyEventSource'
 74.1s: Fired MyEvent 10,000/100,000 times...
 75.0s: Fired MyEvent 20,000/100,000 times...
Expected: 100
Actual: -1073740286
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=C:\h\w\A86D08D2\p
> C:\h\w\A86D08D2\w\9D9C0906\e\tracing\eventpipe\providervalidation\providervalidation\providervalidation.cmd
Expected: True
Actual: False
https://dev.azure.com/dnceng/public/_build/results?buildId=497294&view=ms.vss-test-web.build-test-results-tab&runId=15824298&resultId=110450&paneView=debug
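As a reading aid for the `Actual: -1073740286` line in the test output: Windows process exit codes of this kind are easier to recognize as unsigned 32-bit NTSTATUS values. The helper below is a hypothetical illustration, not part of the test harness; it only reinterprets the signed value.

```cpp
#include <cstdint>

// Reinterpret a signed process exit code as an unsigned 32-bit status value.
// The conversion is modulo 2^32, so the bit pattern is preserved.
std::uint32_t ExitCodeAsNtStatus(std::int32_t exitCode) {
    return static_cast<std::uint32_t>(exitCode);
}
```

For the value above this gives `0xC0000602`, which to my knowledge is `STATUS_FAIL_FAST_EXCEPTION`. That would be consistent with the process being torn down by the assert rather than returning a test result.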