Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC-report all associated LoaderAllocators #110856

Merged
merged 7 commits into from
Jan 21, 2025

Conversation

tomeksowi
Copy link
Contributor

@tomeksowi tomeksowi commented Dec 20, 2024

Fixes intermittent (about 1 fail in 100 runs) assert failure (refCollectionObject != NULL) in Interop/ICustomMarshaler/ConflictingNames/MultipleALCs with GCStress=0x3 on RISC-V. Symptom same as #47696 (abandoned because the issue relented spontaneously, the root cause was not found).

I did some more digging and it always failed when walking the following stack trace:

System.SpanHelpers.SequenceEqual    <----- assertion fails here
Filter.Match
MemberInfoCache`1[__Canon].PopulateMethods
MemberInfoCache`1[__Canon].GetListByName
MemberInfoCache`1[__Canon].Populate
System.RuntimeType.GetMethodCandidates
System.RuntimeType.GetMethodImplCommon
RunInALC.Run
RunInALC.ConflictingCustomMarshalerNamesInNoncollectibleLoadContexts_Succeeds
RunInALC.TestEntryPoint

The problem was after ConflictingCustomMarshalerNamesInCollectibleLoadContexts_Succeeds() the LoaderAllocator's weak ref to its object was severed but LoaderAllocator was still associated with the loaded image memory range. Then in ConflictingCustomMarshalerNamesInNoncollectibleLoadContexts_Succeeds() the same file is loaded and another LoaderAllocator got associated with the reused image. The refs passed to SequenceEqual pointed to a method name whose bytes were in the loaded image. During GC in PromoteCarefully LoaderAllocator::GetAssociatedLoaderAllocator_Unsafe returned the first associated LoaderAllocator with the severed weak ref, which tripped the assertion.

I didn't find anything to prevent many LoaderAllocators being associated with the same memory range so I'm assuming it's valid state. The fix is to report all associated LoaderAllocators.

Part of #84834, cc @dotnet/samsung @mangod9 @janvorli

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Dec 20, 2024
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@clamp03 clamp03 added arch-riscv Related to the RISC-V architecture and removed arch-riscv Related to the RISC-V architecture labels Dec 20, 2024
@risc-vv
Copy link

risc-vv commented Dec 20, 2024

27573ee is being scheduled for building and testing

GIT: 27573ee4a2110113adb028a8347845071f30eb38
REPO: dotnet/runtime
BRANCH: main

Release-CLR-build FAILED

buildinfo.json

Compilation failed during coreclr-tests build```

</details>

@risc-vv
Copy link

risc-vv commented Dec 20, 2024

27573ee is being scheduled for building and testing

GIT: 27573ee4a2110113adb028a8347845071f30eb38
REPO: tomeksowi/runtime
BRANCH: gc-report-all-LoaderAllocators

1 similar comment
@risc-vv
Copy link

risc-vv commented Dec 20, 2024

27573ee is being scheduled for building and testing

GIT: 27573ee4a2110113adb028a8347845071f30eb38
REPO: tomeksowi/runtime
BRANCH: gc-report-all-LoaderAllocators

@risc-vv
Copy link

risc-vv commented Dec 20, 2024

27573ee is being scheduled for building and testing

GIT: 27573ee4a2110113adb028a8347845071f30eb38
REPO: tomeksowi/runtime
BRANCH: gc-report-all-LoaderAllocators

Release-CLR-build FAILED

buildinfo.json

Compilation failed during coreclr-tests build```

</details>

@janvorli
Copy link
Member

I believe the bug is actually in reusing the image for a different loader allocator. We will need to fix that instead of the symptoms.

@tomeksowi
Copy link
Contributor Author

I believe the bug is actually in reusing the image for a different loader allocator. We will need to fix that instead of the symptoms.

I can have a look. To be exact, the fix would be suppressing image reuse for a different allocator or keeping the same allocator for reused images?

In general, is multiple loader allocators per memory range valid? It's worth asserting if not.

@jkotas
Copy link
Member

jkotas commented Dec 20, 2024

I believe the bug is actually in reusing the image for a different loader allocator.

We do maintain one mapping per file path. It is how we end up reusing same image for different loader allocators.

If we change to multiple mappings per file path, we will end up with different set of issues - for example, we won't be able to use LoadLibrary to map images on Windows anymore.

@janvorli
Copy link
Member

I got mislead, I have forgotten that we don't allow loading R2R stuff into ALCs, so I have thought that the issue is about sharing the actual machine code, which we should not share. We share the IL stuff though, as @jkotas mentioned.

@janvorli
Copy link
Member

Looking into it more - I have not remembered that we have added handling of RVA fields in collectible assemblies couple of years ago. That's why we record the ranges of collectible assemblies images to associate them with loader allocator and we can have multiple loader allocators associated with the range of the same image for this purpose.

@janvorli
Copy link
Member

The fix looks correct then. I am sorry for the confusion.

@tomeksowi
Copy link
Contributor Author

Release-CLR-build FAILED

Infrastructure failure, I'll undraft when tests pass locally.

@tomeksowi tomeksowi marked this pull request as ready for review December 23, 2024 09:50
@tomeksowi
Copy link
Contributor Author

Infrastructure failure, I'll undraft when tests pass locally.

Looks ok, PR ready for review.

@tomeksowi
Copy link
Contributor Author

Check failures look similar to other PRs lately, don't think they are related.

@janvorli @jkotas can you review?

@clamp03 clamp03 closed this Jan 16, 2025
@clamp03 clamp03 reopened this Jan 16, 2025
@clamp03 clamp03 requested review from jkotas and janvorli January 16, 2025 05:08
@am11 am11 added the arch-riscv Related to the RISC-V architecture label Jan 19, 2025
src/coreclr/utilcode/loaderheap.cpp Outdated Show resolved Hide resolved
src/coreclr/inc/utilcode.h Outdated Show resolved Hide resolved
@jkotas
Copy link
Member

jkotas commented Jan 21, 2025

Thank you!

@jkotas jkotas merged commit 21e92cb into dotnet:main Jan 21, 2025
91 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-riscv Related to the RISC-V architecture area-VM-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants