Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT/ARM64] Generate frames compatible with Apple compact unwinding #107766

Merged
merged 3 commits into from
Jan 12, 2025

Conversation

filipnavara
Copy link
Member

@filipnavara filipnavara commented Sep 12, 2024

Contributes to #76371

The are two changes in the PR that work in tandem. The first is a JIT change for generating slightly different frame layout on NativeAOT/ARM64/Apple platforms (iOS, macOS and tvOS). The second change is ObjWriter code that recognizes the structurally compatible unwinding information and generates compact unwinding codes in the object files instead of verbose DWARF unwinding information.

For NativeAOT/ARM64/Apple ABI do the following:

  1. Save callee registers in opposite order and in pairs.
  2. Prefer saving FP/LR on the top of the frame. Heuristics are used to avoid worse code quality outside of prolog/epilog due to addressing range limits of the ARM64 instruction set.
  3. Added optimization to lvaFrameAddress to rewrite FP-x references to SP+y when possible. This allows efficient addressing using positive indexes when FP points to the top of the frame. It mimics similar optimization on ARM32.

Each of these changes comes with some caveats:

  1. Saving registers in pairs may lead to 16 bytes more used in the stack space. The ARM64 stack has to be 16 byte aligned. If the prolog previously saved odd number of integer callee saved registers and odd number of floating point callee saved registers, we'd now generate one more 16 byte stack slot. This seems to be rather rare occurrence. It does not affect the number of instructions used. Likewise, the changed order doesn't have any impact on code size.
  2. Saving FP/LR at the top of the frame may cause one additional instruction in the prolog if any local or temporary variables are used. Additionally, addressing using negative offsets from FP is less efficient than addressing with positive offsets since the ARM64 instruction encoding can only address negative offsets with non-scaled encoding which significantly reduces the addressable range. This is mostly mitigated by the optimization in lvaFrameAddress. The additional increase in prolog size also causes a cascading effect where loop alignment and related code alignment (32-byte alignment for method start) significantly contribute to the code section size. In some cases the same alignment logic may also reduce the size.

As you can see, this is a bit of a trade-off and it's valid to ask if it's worth it.

Firstly, let me address the size question. The loop alignment seems to be the biggest contributor to the code size variation, and I filed issue #107284 to investigate whether we can come up with a better defaults for Apple platforms. The code size changes are predominantly contained to the small fixed change in the prolog size and the additional alignment. When testing the prototype on MAUI apps the biggest culprit to increased code size was the code generated from XAML that generates methods with ton of local variables. The change in lvaFrameAddress practically eliminated any negative effect of the frame layout on this type of code. Without the change the regression was nearly 50% due to stack references needing an indirect load with a register and extra instruction. Outside of MAUI the biggest visible effect is on code from Regex source generator but in that case it's quite evenly split between size improvements and regressions. That suggests the regex code may be a good candidate for measuring the performance characteristics of the loop alignment. The code size regressions on few examples I tried amount to around 2% (incl. the alignment). The saved space in the DWARF unwinding section is hovering around 90% +- 3%. To put that into absolute numbers we are looking at savings around 0.7 Mb for dotnet new maui app and around 3 Mb for System.Runtime.Tests in the linked executables. The savings in the size of the unwinding information far outweigh any increase in the code size.

Secondly, part of the motivation is that the Apple linker is notoriously buggy with processing the DWARF unwinding data. The compact unwinding tables are used as an index to the DWARF data for anything that cannot be expressed using compact unwinding code directly. Due to the structure of the tables this limits the effective offset of DWARF info to 24 bits and places a hard limit on the DWARF unwinding info size. At least with some versions of the Apple linker, breaking this limit results in silent corruption and runtime failures.

Comparison between `main` and PR for System.Runtime.Tests in Release build

Compare raw size of linked binaries:

ls -l ./artifacts/bin/System.Runtime.Tests/Release/net9.0-unix/osx-arm64/publish/System.Runtime.Tests ../runtime-main/artifacts/bin/System.Runtime.Tests/Release/net9.0-unix/osx-arm64/publish/System.Runtime.Tests
-rwxr-xr-x  1 filipnavara  staff  48913112 Sep 12 22:26 ../runtime-main/artifacts/bin/System.Runtime.Tests/Release/net9.0-unix/osx-arm64/publish/System.Runtime.Tests
-rwxr-xr-x  1 filipnavara  staff  45827128 Sep 12 22:34 ./artifacts/bin/System.Runtime.Tests/Release/net9.0-unix/osx-arm64/publish/System.Runtime.Tests

Bloaty check of linked binaries:

bloaty -d symbols ./artifacts/bin/System.Runtime.Tests/Release/net9.0-unix/osx-arm64/publish/System.Runtime.Tests -- ../runtime-main/artifacts/bin/System.Runtime.Tests/Release/net9.0-unix/osx-arm64/publish/System.Runtime.Tests
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +1.9%  +406Ki  +1.9%  +406Ki    [__TEXT,__managedcode]
   +55% +8.80Ki   +54% +8.80Ki    [__TEXT]
  +0.1% +2.51Ki  +0.1% +2.51Ki    [__DATA,.dotnet_eh_table]
  +0.1%    +272  +0.1%    +272    [__DATA,__data]
  -0.0%      -5  -0.0%      -5    [__TEXT,__cstring]
  -0.0%     -16  -0.0%     -16    [__TEXT,__const]
 -33.0% -2.78Ki -25.0% -2.78Ki    [__DATA]
  -3.6% -21.7Ki  -2.6% -16.0Ki    [__LINKEDIT]
 -20.5%  -437Ki -20.5%  -437Ki    [__TEXT,__unwind_info]
 -88.7% -2.90Mi -88.7% -2.90Mi    [__TEXT,__eh_frame]
  -6.3% -2.94Mi  -5.4% -2.94Mi    TOTAL

Bloaty check of object files:

bloaty ./artifacts/obj/System.Runtime.Tests/Release/net9.0-unix/osx-arm64/native/System.Runtime.Tests.o -- ../runtime-main/artifacts/obj/System.Runtime.Tests/Release/net9.0-unix/osx-arm64/native/System.Runtime.Tests.o 
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +1.9%  +406Ki  +1.9%  +406Ki    ,__managedcode
  +0.1% +9.64Ki  +0.1% +9.64Ki    ,__debug_loc
  +0.1% +2.51Ki  +0.1% +2.51Ki    ,.dotnet_eh_table
  +0.0%    +348  +0.0%    +348    ,__debug_line
  +0.1%    +272  +0.1%    +272    ,__data
  +0.0%     +20  +0.0%     +20    ,__debug_info
  -0.1%      -1  -2.3%      -1    []
  -0.0%      -9  -0.0%      -9    ,__const
  -6.8% -1.76Mi  [ = ]       0    [Unmapped]
 -88.7% -2.90Mi -88.7% -2.90Mi    ,__eh_frame
  -2.4% -4.25Mi  -2.4% -2.49Mi    TOTAL

Bloaty check of object files (detailed):

bloaty -d symbols ./artifacts/obj/System.Runtime.Tests/Release/net9.0-unix/osx-arm64/native/System.Runtime.Tests.o -- ../runtime-main/artifacts/obj/System.Runtime.Tests/Release/net9.0-unix/osx-arm64/native/System.Runtime.Tests.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +1.9%  +410Ki  +1.6%  +410Ki    [36252 Others]
  +0.1% +9.64Ki  +0.1% +9.64Ki    [,__debug_loc]
  [NEW] +7.19Ki  [NEW] +6.97Ki    _TestUtilities_Unicode_System_Text_RegularExpressions_Generated__RegexGenerator_g_F026929D4AD63EB6EDB749A9B02B133C084237B3AFB8777B3DE0107C755B91565__GetRegex_0_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [NEW] +2.42Ki  [NEW] +2.20Ki    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F417AD2970E27AC5D022778EC7081B94838CBCB072379596BB27CB8792C23ED76__EnsureArrayIndexRegex_5_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [NEW] +2.02Ki  [NEW] +1.81Ki    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F417AD2970E27AC5D022778EC7081B94838CBCB072379596BB27CB8792C23ED76__Regex1_3_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [NEW] +1.54Ki  [NEW] +1.51Ki    _lsda0_TestUtilities_Unicode_System_Text_RegularExpressions_Generated__RegexGenerator_g_F026929D4AD63EB6EDB749A9B02B133C084237B3AFB8777B3DE0107C755B91565__GetRegex_0_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [NEW] +1.44Ki  [NEW] +1.23Ki    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F417AD2970E27AC5D022778EC7081B94838CBCB072379596BB27CB8792C23ED76__P0Regex_6_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [NEW] +1.18Ki  [NEW]    +976    _System_Runtime_Tests_System_Text_RegularExpressions_Generated__RegexGenerator_g_FB4BB6619E624A9FE5F4E687DA0CF38E2970A0CF70BB059A398625E558ACC7DAC__IanaAbbreviationRegex_0_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [NEW] +1.14Ki  [NEW]    +960    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F417AD2970E27AC5D022778EC7081B94838CBCB072379596BB27CB8792C23ED76__Regex2_4_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [NEW] +1.11Ki  [NEW]    +912    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F417AD2970E27AC5D022778EC7081B94838CBCB072379596BB27CB8792C23ED76__UnknownNodeObjectEmptyRegex_8_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [DEL]   -1017  [DEL]    -800    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F24C2B164CF72F5E63C813A2B442D56F3F17BFFC74EAF7C2818EB7F6278C5183A__EncodeCharRegex_1_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [DEL] -1.11Ki  [DEL]    -912    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F24C2B164CF72F5E63C813A2B442D56F3F17BFFC74EAF7C2818EB7F6278C5183A__UnknownNodeObjectEmptyRegex_8_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [DEL] -1.14Ki  [DEL]    -960    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F24C2B164CF72F5E63C813A2B442D56F3F17BFFC74EAF7C2818EB7F6278C5183A__Regex2_4_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [DEL] -1.17Ki  [DEL]    -960    _System_Runtime_Tests_System_Text_RegularExpressions_Generated__RegexGenerator_g_F23FFE14CED6C53CC123B603EF102D84787AD8CF9A59E83434F2BDF516814C2CC__IanaAbbreviationRegex_0_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [DEL] -1.44Ki  [DEL] -1.23Ki    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F24C2B164CF72F5E63C813A2B442D56F3F17BFFC74EAF7C2818EB7F6278C5183A__P0Regex_6_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [DEL] -1.57Ki  [DEL] -1.54Ki    _lsda0_TestUtilities_Unicode_System_Text_RegularExpressions_Generated__RegexGenerator_g_F25E40F247ED975110B5E93851ECC9A2CB78C4DC17A73BF5F5EE6D76B142C41B6__GetRegex_0_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [DEL] -2.00Ki  [DEL] -1.80Ki    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F24C2B164CF72F5E63C813A2B442D56F3F17BFFC74EAF7C2818EB7F6278C5183A__Regex1_3_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [DEL] -2.42Ki  [DEL] -2.20Ki    _S_P_Xml_System_Text_RegularExpressions_Generated__RegexGenerator_g_F24C2B164CF72F5E63C813A2B442D56F3F17BFFC74EAF7C2818EB7F6278C5183A__EnsureArrayIndexRegex_5_RunnerFactory_Runner__TryMatchAtCurrentPosition
  [DEL] -7.19Ki  [DEL] -6.97Ki    _TestUtilities_Unicode_System_Text_RegularExpressions_Generated__RegexGenerator_g_F25E40F247ED975110B5E93851ECC9A2CB78C4DC17A73BF5F5EE6D76B142C41B6__GetRegex_0_RunnerFactory_Runner__TryMatchAtCurrentPosition
  -6.8% -1.76Mi  [ = ]       0    [Unmapped]
 -88.7% -2.90Mi -88.7% -2.90Mi    lsection3
  -2.4% -4.25Mi  -2.4% -2.49Mi    TOTAL

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 12, 2024
@filipnavara filipnavara requested review from BruceForstall and removed request for MichalStrehovsky September 12, 2024 21:18
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Sep 12, 2024
@filipnavara
Copy link
Member Author

filipnavara commented Sep 12, 2024

The dominant part are the JIT changes, hence area-CodeGen-coreclr is the right label.

cc @dotnet/ilc-contrib @VSadov @ivanpovazan

@filipnavara filipnavara added arch-arm64 os-mac-os-x macOS aka OSX os-ios Apple iOS os-tvos Apple tvOS labels Sep 12, 2024
Copy link
Contributor

Tagging subscribers to 'os-ios': @vitek-karas, @kotlarmilos, @ivanpovazan, @steveisok, @akoeplinger
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Tagging subscribers to 'os-tvos': @vitek-karas, @kotlarmilos, @ivanpovazan, @steveisok, @akoeplinger
See info in area-owners.md if you want to be subscribed.

@marek-safar
Copy link
Contributor

@jkotas would could help to move this PR forward?

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas would could help to move this PR forward?

The change looks good to me. Somebody from the codegen team needs to sign-off on the JIT changes.

cc @JulieLeeMSFT

@jkotas
Copy link
Member

jkotas commented Oct 21, 2024

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JulieLeeMSFT JulieLeeMSFT requested a review from EgorBo October 21, 2024 18:04
@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Oct 21, 2024
@JulieLeeMSFT
Copy link
Member

@EgorBo, please do code review for Filip's compact unwinding PR.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went over it and they sounds reasonable. @BruceForstall - you might want to double check the changes in lclvars.cpp. I don't understand those much.

@@ -2807,6 +2807,16 @@ inline
{
*pBaseReg = REG_SPBASE;
}
#elif defined(TARGET_ARM64)
if (FPbased && !codeGen->isFramePointerRequired() && varOffset < 0 &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be under compiler->IsTargetAbi(CORINFO_NATIVEAOT_ABI) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and other changes in genRestoreCalleeSavedRegisterGroup/ genSaveCalleeSavedRegisterGroup of codegenarm64.cpp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimization is valid even for non-NativeAOT scenarios. The reason I explicitly opted not to guard it is that this is practically only reachable in stress tests that enable SaveFpLrWithAllCalleeSavedRegisters on CoreCLR and the general guidance was that we rely on CoreCLR stress testing coverage for JIT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in gen[Restore/Save]CalleeSavedRegisterGroup are guarded by genReverseAndPairCalleeSavedRegisters which is only set for NativeAOT/Apple. Additional guards would only contribute to the throughput cost. The change where LR/FP is separated into its own condition is no-op for other ARM64 scenarios, it just moves one iteration of the loop into its own condition. We don't want to reverse the order for them so this was the easiest way to achieve the desired behavior without too many ifs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds good to me

@@ -845,12 +845,19 @@ void CodeGen::genSaveCalleeSavedRegisterGroup(regMaskTP regsMask, int spDelta, i

for (int i = 0; i < regStack.Height(); ++i)
{
RegPair regPair = regStack.Bottom(i);
RegPair regPair = genReverseAndPairCalleeSavedRegisters ? regStack.Top(i) : regStack.Bottom(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might want to consider to have #ifdef for NativeAOT to see if that erases any TP regression that we are seeing now:

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how an #ifdef could be used here. There's a single universal build of the ARM64 Unix JIT and it's the same for Crossgen2 and ILC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant compiler->IsTargetAbi(CORINFO_NATIVEAOT_ABI) check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect that to make things worse than checking a single [more local] boolean. If this is a concern we can try to manually duplicate the loop and move the check outside of the loop.

@xoofx
Copy link
Member

xoofx commented Nov 28, 2024

@filipnavara I'm investigating how much trouble it would be to do a fast stack walking on macOS arm64 for profiling purpose, and I have a question. The Writing ARM64 code for Apple platforms mandates that:

The frame pointer register (x29) must always address a valid frame record. Some functions — such as leaf functions or tail calls — may opt not to create an entry in this list. As a result, stack traces are always meaningful, even without debug information.

So, while trying to reconnect the dots with the various issues related to this PR, I'm unclear what kind of frames the JIT today generates for Apple ARM64? Is it following the fp recommendation above?

@filipnavara
Copy link
Member Author

So, while trying to reconnect the dots with the various issues related to this PR, I'm unclear what kind of frames the JIT today generates for Apple ARM64? Is it following the fp recommendation above?

TL;DR: Yes, it is following the recommendation AFAICT. It doesn't follow the relative stack layout of FP/LR in relation to locals and callee saved registers. Unless you need to do precise unwinding you should be fine.

The long description is https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/jit/arm64-jit-frame-layout.md (not all frame types are used, eg. frame type 9).

@JulieLeeMSFT
Copy link
Member

@EgorBo, and @kunalspathak, this PR is waiting for JIT sign off.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please merge with main? I want to run some jitstress tests because compJitSaveFpLrWithCalleeSavedRegisters was updated.

else
{
// Default configuration
codeGen->SetSaveFpLrWithAllCalleeSavedRegisters((getNeedsGSSecurityCookie() && compLocallocUsed) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be (getNeedsGSSecurityCookie() || compLocallocUsed)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just a moved code from here with unchanged condition. I think it was intentional.

    // Decide where to save FP and LR registers. We store FP/LR registers at the bottom of the frame if there is
    // a frame pointer used (so we get positive offsets from the frame pointer to access locals), but not if we
    // need a GS cookie AND localloc is used, since we need the GS cookie to protect the saved return value,
    // and also the saved frame pointer. See CodeGen::genPushCalleeSavedRegisters() for more details about the
    // frame types. Since saving FP/LR at high addresses is a relatively rare case, force using it during stress.
    // (It should be legal to use these frame types for every frame).

unwinding format.

For NativeAOT/ARM64/Apple API do the following:
- Save callee registers in opposite order and in pairs.
- Prefer saving FP/LR on the top of the frame. Heuristics are used to
  avoid worse code quality outside of prolog/epilog due to addressing
  range limits of the ARM64 instruction set.
- Added optimization to lvaFrameAddress to rewrite FP-x references to
  SP+y when possible. This allows efficient addressing using positive
  indexes when FP points to the top of the frame. It mimics similar
  optimization on ARM32.
@filipnavara
Copy link
Member Author

I re-run the tests locally and they no longer fail. Can you retrigger the test runs please?

@EgorBo
Copy link
Member

EgorBo commented Jan 10, 2025

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@kunalspathak
Copy link
Member

@filipnavara - the original failures no longer repros and some tests are failing which is most likely #83659, but can you please double check if jitstress is kicking in for it?

@filipnavara
Copy link
Member Author

...but can you please double check if jitstress is kicking in for it?

Yes, I put breakpoints on the code path in WinDBG and they do get hit during the stress test run:

image

@kunalspathak
Copy link
Member

So they tests are failing with the new jitstress switch?

@filipnavara
Copy link
Member Author

So they tests are failing with the new jitstress switch?

No, they are NOT failing, and they stress the new code path.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kunalspathak kunalspathak merged commit 4536b96 into dotnet:main Jan 12, 2025
128 of 134 checks passed
grendello added a commit to grendello/runtime that referenced this pull request Jan 13, 2025
* main:
  JIT: Model GT_RETURN kills with contained operand (dotnet#111230)
  Update dependencies from https://github.com/dotnet/runtime-assets build 20250110.2 (dotnet#111290)
  [NativeAOT/ARM64] Generate frames compatible with Apple compact unwinding (dotnet#107766)
  Cleanup unused JIT stubs in vm (dotnet#111237)
  Ensure that Shuffle is marked as HW_Flag_CanBenefitFromConstantProp (dotnet#111303)
  Fix CMP0173 policy warning with cmake 3.31 (dotnet#110522)
  [RISC-V] Fix HostActivation.Tests unknown-rid (dotnet#110687)
  Fix accidentally duplicated global-build-step.yml in runtime-official.yml (dotnet#111302)
  JIT: run extra SPMI queries for arrays (dotnet#111293)
  Split the Runtime Shared Framework project and combine legs in the official build (dotnet#111136)
  Do not ignore `MemoryMarshal.TryWrite` result (dotnet#108661)
  Update dependencies from https://github.com/dotnet/emsdk build 20250109.1 (dotnet#111263)
  Clean up in Number.Formatting.cs (dotnet#110955)
@EgorBo
Copy link
Member

EgorBo commented Jan 14, 2025

@filipnavara Looks like this PR introduced GC hole like asserts in gcstress pipeline (linux-arm64), I've just confirmed it.

The simplest way to reproduce is to take BufferMemmove test and run with

set DOTNET_TieredCompilation=0
set DOTNET_GCStress=0xC
set DOTNET_JitStress=2
set DOTNET_ReadyToRun=0

(many times)

cc @dotnet/jit-contrib

EgorBo added a commit to EgorBo/runtime-1 that referenced this pull request Jan 14, 2025
EgorBo added a commit that referenced this pull request Jan 14, 2025
filipnavara added a commit to filipnavara/runtime that referenced this pull request Jan 15, 2025
…ding (dotnet#107766)

* JIT/ARM64: Add ability to generate frames compatible with Apple compact
unwinding format.

For NativeAOT/ARM64/Apple API do the following:
- Save callee registers in opposite order and in pairs.
- Prefer saving FP/LR on the top of the frame. Heuristics are used to
  avoid worse code quality outside of prolog/epilog due to addressing
  range limits of the ARM64 instruction set.
- Added optimization to lvaFrameAddress to rewrite FP-x references to
  SP+y when possible. This allows efficient addressing using positive
  indexes when FP points to the top of the frame. It mimics similar
  optimization on ARM32.

* ObjWriter: For Mach-O ARM64 try to convert the DWARF CFI unwinding codes
into compact unwinding code

* Disable lvaFrameAddress FP->SP optimization for OSR methods
kunalspathak added a commit that referenced this pull request Jan 22, 2025
…ding (#111451)

* [NativeAOT/ARM64] Generate frames compatible with Apple compact unwinding (#107766)

* JIT/ARM64: Add ability to generate frames compatible with Apple compact
unwinding format.

For NativeAOT/ARM64/Apple API do the following:
- Save callee registers in opposite order and in pairs.
- Prefer saving FP/LR on the top of the frame. Heuristics are used to
  avoid worse code quality outside of prolog/epilog due to addressing
  range limits of the ARM64 instruction set.
- Added optimization to lvaFrameAddress to rewrite FP-x references to
  SP+y when possible. This allows efficient addressing using positive
  indexes when FP points to the top of the frame. It mimics similar
  optimization on ARM32.

* ObjWriter: For Mach-O ARM64 try to convert the DWARF CFI unwinding codes
into compact unwinding code

* Disable lvaFrameAddress FP->SP optimization for OSR methods

* Fix variable offsets used in emitGCvarLiveUpd by suppressing the
FP-n => SP+m (for m = frameSize - n) optimization.

---------

Co-authored-by: Kunal Pathak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member os-ios Apple iOS os-mac-os-x macOS aka OSX os-tvos Apple tvOS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants