Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix calling convention mismatch in GC callouts #110685

Merged
merged 2 commits into from
Dec 16, 2024

Conversation

MichalStrehovsky
Copy link
Member

@MichalStrehovsky MichalStrehovsky commented Dec 13, 2024

Fixes #110607.

The native side expected fastcall, managed side wasn't fastcall.

Filed #110684 on the test hole. We would have caught it during native AOT x86 bringup if this had any tests since this is a guaranteed stack corruption and crash.

Cc @dotnet/interop-contrib

Fixes dotnet#110607.

The native side expects fastcall.

Filed dotnet#110684 on the test hole. We would have caught it during x86 bringup if this had _any_ tests since this is a guaranteed stack corruption and crash.
Copy link
Member

@jkoritzinsky jkoritzinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the JIT support fastcall here? I'm pretty sure this will just fail with another exception. I don't think we ever brought Fastcall support online.

Can we add a test or run some existing test suite on the PR?

Copy link
Member

@jkoritzinsky jkoritzinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I went looking through the code and found something very interesting. It looks like when I did the unmanaged calling-convention support rewrite in the JIT, I implemented fastcall correctly.

However, as we have never supported it for P/Invokes, our exceptions that block fastcall are P/Invoke-only. Additionally, the exceptions are only thrown in CoreCLR (NativeAOT likely fails in a more inscrutible way).

In any case, looks like Fastcall + UnmanagedCallersOnly should work here just fine.

@MichalStrehovsky
Copy link
Member Author

Does the JIT support fastcall here? I'm pretty sure this will just fail with another exception. I don't think we ever brought Fastcall support online.

I did test this and it worked.

Looking through our test tree, I don't see any test exercising Fastcall though. Do we have any testing for this? Should we have testing? It doesn't look like CoreCLR VM blocks UnmanagedCallersOnly with fastcall either.

Either way, I made this use not-fastcall per Jan's request.

@MichalStrehovsky
Copy link
Member Author

FCall callbacks probably are implicitly tested assuming tests exercise this code:

Do any of those call into managed code? We're not discussing test coverage of fastcall in the Visual C++ compiler but UnmanagedCallersOnly methods with CallConvs = typeof(CallConvFastcall).

@MichalStrehovsky MichalStrehovsky merged commit 20dc146 into dotnet:main Dec 16, 2024
88 checks passed
@MichalStrehovsky MichalStrehovsky deleted the fix110607 branch December 16, 2024 08:37
@MichalStrehovsky
Copy link
Member Author

This is fixed in main. I consider backport to 9.0 blocked on #110684 - I don't want to be laughed out of the shiproom when I say "no we don't have test coverage, but we do have a concept of a plan for it".

@Sergio0694
Copy link
Contributor

Would it be sufficient to validate this on both x86 and x64 with the Store? We won't be able to ship to x86 with Native AOT otherwise. In theory we could keep that target on .NET Native, but it'd be simpler for us if we could migrate all 3 targets. And would the fact x86 right now is flat out broken on NAOT in XAML scenario help make the case for servicing?

@jkotas
Copy link
Member

jkotas commented Dec 17, 2024

If you can validate the fix with Store, it should be good enough test coverage for servicing backport.

@Sergio0694
Copy link
Contributor

I tested this on x64 and it works fine (except for #110823, but that's unrelated). Currently blocked by other dependencies on testing with x86, but I hope to get that resolved by end of this week, hopefully. Will report back as soon as I can also validate on x86 and confirm that things work correctly, and then hopefully we can backport this too 🙂

@MichalStrehovsky
Copy link
Member Author

I tested this on x64 and it works fine (except for #110823, but that's unrelated).

The bug can only be observed on x86. There's no fastcall or stdcall on x64, there's only one calling convention. This PR is essentially no-op on x64 or arm64.

@Sergio0694
Copy link
Contributor

Oh I know, but I figured it wouldn't hurt to validate everything end to end too just in case, to make things a bit simpler for tactics to backport this. Also until we unblock x86 in the Store it was the only thing I could do 😅

@Sergio0694
Copy link
Contributor

Didn't want to delay this any further, so I ended up just yeeting all problematic code from the Store for testing 😆

Followed similar steps as #110762 (review):
Tested with the Microsoft Store and could not repro any hangs:

  • Checked out release/9.0-staging (f580364)
  • Cherry-pick 1033fb3 (Move ComWrappers AddRef to C/C++ #110762)
  • Cherry-pick c69ac7c (this merged PR)
  • Build with .\build.cmd clr.aot -c Release -arch x86
  • Set IlcSdkPath in the Store .csproj to runtime\artifacts\bin\coreclr\windows.x86.Release\aotsdk\
  • Set GenerateAppxPackageOnBuild
  • Switch to Release x86
  • Deploy

I could run the Store just fine. Didn't notice hangs, also I guess the fact the app didn't just explode as soon as the GC triggered should be enough to show the fix for the specific issue for this PR should be working fine. Can we backport this? 😅

Thank you!

@AaronRobinsonMSFT
Copy link
Member

/backport to release/9.0

Copy link
Contributor

github-actions bot commented Jan 6, 2025

Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/12625972448

@AaronRobinsonMSFT
Copy link
Member

/backport to release/9.0-staging

Copy link
Contributor

github-actions bot commented Jan 6, 2025

Started backporting to release/9.0-staging: https://github.com/dotnet/runtime/actions/runs/12626169723

Sergio0694 pushed a commit to Sergio0694/runtime that referenced this pull request Jan 7, 2025
Fixes dotnet#110607.

The native side expected fastcall, managed side didn't.

Filed dotnet#110684 on the test hole. We would have caught it during x86 bringup if this had _any_ tests since this is a guaranteed stack corruption and crash.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GC crash in .NET 9 UWP and WinUI 3 apps published with Native AOT on win-x86
5 participants