
Allow pregenerating most HW intrinsics in CoreLib #24917

Merged — 8 commits merged into dotnet:master on Jun 13, 2019

Conversation

@MichalStrehovsky (Member) commented Jun 3, 2019

This is a follow up to #24689 that lets us pregenerate most hardware intrinsics in CoreLib.

We ensure the potentially unsupported code will never be reachable at runtime on CPUs that don't support it by not reporting the IsSupported property as intrinsic in crossgen. This guarantees the support checks are always JITted; JITting the support checks is very cheap.

There is a cost in the form of an extra call and the loss of constant propagation of the return value, but the cost is negligible in practice and gets eliminated once the tiered JIT tiers the method up.

We only do this in CoreLib because user code could technically fail to guard intrinsic use with IsSupported checks, and pregenerating that code could lead to illegal-instruction traps at runtime (instead of PlatformNotSupportedException throws) - a bad user experience.
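
For context, this is the guard pattern the support checks protect - a minimal sketch in C# (Lzcnt is the real intrinsic class; the fallback loop is illustrative, not CoreLib's implementation):

using System.Runtime.Intrinsics.X86;

static class Demo
{
    static int LeadingZeros(uint value)
    {
        // With this change, crossgen'd CoreLib reaches this check through a
        // real (cheap, JITted) call instead of a baked-in constant, so the
        // Lzcnt path can never execute on CPUs that lack the instruction.
        if (Lzcnt.IsSupported)
            return (int)Lzcnt.LeadingZeroCount(value);

        // Software fallback (illustrative).
        int count = 0;
        for (uint mask = 0x80000000; mask != 0 && (value & mask) == 0; mask >>= 1)
            count++;
        return count;
    }
}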

@MichalStrehovsky (Member Author)

There's a slight regression when tiering is disabled, since before the change the methods that use the intrinsics would always be JITted.

The difference with tiered compilation enabled seems to be just noise.

        Default   TieredCompilation=0
After   311 ms    320 ms
Before  309 ms    315 ms

Test program:

using System;
using System.Diagnostics;
using System.Text.Unicode;

class Program
{
    static void Main()
    {
        string source = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis a auctor sem. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Integer non sagittis sem, ut feugiat eros. Praesent commodo ligula et magna efficitur, nec tempus ex elementum. Sed condimentum hendrerit massa ut rhoncus.";
        byte[] destination = new byte[1024];

        Utf8.FromUtf16(source, destination, out int _, out int _);

        for (int x = 0; x < 10; x++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < 10000000; i++)
            {
                Utf8.FromUtf16(source, destination, out int _, out int _);
            }
            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}

@MichalStrehovsky (Member Author)

Cc @sergiy-k

if (m_pEECompileInfo->GetAssemblyModule(m_hAssembly) == m_pEECompileInfo->GetLoaderModuleForMscorlib())
{
    m_pOpt->m_compilerFlags.Set(CORJIT_FLAGS::CORJIT_FLAG_FEATURE_SIMD);

#if defined(_TARGET_X86_) || defined(_TARGET_AMD64_)
Member

This could potentially cause problems. The JIT currently decides to emit the VEX or non-VEX encoding based on whether or not AVX is supported.

This isn't going to correctly handle that difference for an SSE-exclusive vs. an AVX-checked code path, which will prevent the generated code from running on some downstream processors.

Member Author

If we keep the "do not crossgen" behavior for AVX/FMA/AVX2, would that fix the issue?

Member

I think the safest thing to do for now would be to effectively treat the AOT codegen as if only SSE and SSE2 had been reported as supported by the CPUID checks the VM does, and to treat everything else as "not supported" (we should likely also consider whether or not crossgen ignores the COMPlus_EnableISA=0 flags).

That should mitigate the startup time regressions, and any throughput loss should be resolved fairly quickly by rejitting the hot methods.

Anything else is going to have a larger potential for a bug tail, or may impact perf in interesting ways (hurting benchmark numbers, for example), and likely needs more investigation before we settle on a solution.


I confess that this hadn't occurred to me. I'd like to see us have the ability to include runtime-checked dual-version code, but I'm pretty sure the JIT can't generate both SSE and VEX encodings in the same method.
I'm not sure why we couldn't also handle the other Sse* intrinsics, with the assumption that such code will (as expected) be guarded with the appropriate IsSupported runtime call.

Member

I'm not sure why we couldn't also handle the other Sse* intrinsics, with the assumption that such code will (as expected) be guarded with the appropriate IsSupported runtime call.

I agree.

Long term, we should be able to support a static AOT target or one of a couple forms of dynamic dispatch. The two most common that I am aware of are cached CPUID checks (which result in a branch per ISA target) and changing the method target (which involves fixing up which target the method points to).

But, those are post 3.0 goals and likely require some fixups to support generating both VEX and non-VEX code paths, for example.

#if defined(_TARGET_X86_) || defined(_TARGET_AMD64_)
    if (attribs & CORINFO_FLG_JIT_INTRINSIC)
    {
        // Do not report the get_IsSupported method as an intrinsic. This will turn the call into a regular
Member

This is likely going to improve startup but hurt throughput for some of the core functions. That will likely be addressed by rejitting fairly quickly, but it's still something to keep in mind.

Long term, if this type of dynamic dispatch is supported, it should likely be changed so that the startup logic caches the CPUID checks and these resolve to be inlined checks of those cached checks (which is how the C Runtime library handles this, for example).
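
A minimal C# sketch of that cached-check shape (the CpuCaps class is illustrative, not a runtime API):

using System.Runtime.Intrinsics.X86;

static class CpuCaps
{
    // Evaluated once at startup; afterwards each call site pays only a
    // static field load and a well-predicted branch.
    public static readonly bool HasPopcnt = Popcnt.IsSupported;
}

static class Bits
{
    public static int PopCount(uint value)
    {
        if (CpuCaps.HasPopcnt)
            return (int)Popcnt.PopCount(value); // hardware path

        // Software fallback for CPUs without POPCNT.
        int count = 0;
        while (value != 0) { count++; value &= value - 1; }
        return count;
    }
}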

Member

Another alternative might be to just have these be treated as intrinsic for the purposes of AOT, except for when it is a recursive call.

That is, when called from another method, the intrinsic would be jitted normally. However, the intrinsic calls themselves would be marked with "requires jitting" so that reflection and other scenarios continue working as expected.

Member Author

Do you have a specific scenario in mind? I can add a benchmark for it (see the benchmark above where I picked one of the affected places). The extra call to a method that returns a constant 1 or 0 didn't hurt perf much.

Member

It's likely dependent on how the code was set up and where the ISA check exists. If you have a helper method that provides a software fallback path (such as BitOperations.LeadingZeroCount) and you are calling it in a loop, you now have an additional call and branch per loop iteration, which can significantly hurt throughput.

If you just have a centralized check that handles a large block of logic, then the cost of the call is going to be amortized (as was the case for the benchmark above).
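
Roughly, the two shapes being contrasted - a sketch (the ProcessBlock* methods are hypothetical stand-ins for real work):

using System.Numerics;
using System.Runtime.Intrinsics.X86;

static class Shapes
{
    // Shape 1: the support check lives inside a tiny helper, so the extra
    // call and branch are paid on every loop iteration.
    public static long SumLzcnt(uint[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += BitOperations.LeadingZeroCount(data[i]);
        return sum;
    }

    // Shape 2: one check guards a large block of work, so its cost is amortized.
    public static void Process(uint[] data)
    {
        if (Sse2.IsSupported)
            ProcessBlockVectorized(data); // hypothetical vectorized path
        else
            ProcessBlockScalar(data);     // hypothetical scalar fallback
    }

    static void ProcessBlockVectorized(uint[] data) { /* ... */ }
    static void ProcessBlockScalar(uint[] data) { /* ... */ }
}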

Member Author

True, BitOperations.LeadingZeroCount in a loop is probably the most pathological case.

I used this to benchmark it:

using System;
using System.Diagnostics;
using System.Numerics;
using System.Runtime.CompilerServices;

class Program
{
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    static void Main()
    {
        int sum = 0;

        for (int x = 0; x < 10; x++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < 1000000000; i++)
            {
                sum += BitOperations.LeadingZeroCount((uint)i);
            }
            Console.WriteLine(sw.ElapsedMilliseconds);
        }

        sum.ToString(); // consume sum so the loop isn't optimized away
    }
}

I'm not able to measure a slowdown.

        Default   TieredCompilation=0
After   511 ms    381 ms
Before  510 ms    381 ms

Maybe my CPU is just really good at optimizing that extra call and branch away.

I'm a bit surprised disabling tiering makes this so much faster, given I decorated the method with AggressiveOptimization. Cc @kouvel on that.

Member

I wasn't able to repro this initially on a recent SDK, but eventually I was able to repro by removing the AggressiveOptimization flag and running with COMPlus_ReadyToRun=0, even though that doesn't make sense. On my machine at the moment, anyway, the numbers are similar to yours but opposite: it's faster with tiered compilation enabled. The generated code is identical, there don't seem to be any alignment issues in the small hot loop, and the method is aligned slightly differently. It looks like there's a decoding stall somewhere in the slow version; I haven't looked into the details.

AggressiveOptimization also prevents prejitting the method, so that it gets JITted at runtime with full opts as quickly as possible.


@MichalStrehovsky (Member Author)

In the second iteration, I restored the old behavior for the Avx, Avx2, and FMA intrinsics (we won't pass the flags to RyuJIT and we'll prevent these types from loading). Methods that use these don't show up in the PowerShell startup path anyway.

@MichalStrehovsky changed the title from "Allow pregenerating all HW intrinsics in CoreLib" to "Allow pregenerating most HW intrinsics in CoreLib" on Jun 4, 2019
@sergiy-k commented Jun 8, 2019

What else is required before this change can be merged?

@MichalStrehovsky (Member Author)

What else is required before this change can be merged?

I think I addressed all code review feedback at this point.

@davidwrighton (Member) left a comment

Looks good to me except for one nit about a comment.

m_pOpt->m_compilerFlags.Set(CORJIT_FLAGS::CORJIT_FLAG_USE_POPCNT);
// Leaving out CORJIT_FLAGS::CORJIT_FLAG_USE_AVX, CORJIT_FLAGS::CORJIT_FLAG_USE_FMA
// CORJIT_FLAGS::CORJIT_FLAG_USE_AVX2 on purpose - these enable JIT to use VEX encoding
// and that's not safe.


I would change this comment to say "these require VEX encodings and the JIT doesn't support generating code for methods with mixed encodings."

@MichalStrehovsky (Member Author)

I ended up writing down the expected behavior after this pull request (still have to push the commits out that do this - what's in the branch now doesn't match). This also captures what I did in the previous pull request:

  • For everything but S.P.CoreLib:
    • CORJIT_FLAG_FEATURE_SIMD and CORJIT_FLAG_USE_* (for all intrinsic types) are not specified to RyuJIT.
    • Any attempt to access hardware intrinsics in the System.Runtime.Intrinsics.Arm.Arm64 (on ARM64) or System.Runtime.Intrinsics.X86 (on XArch) will throw at crossgen time, leading to no precompilation for the method that has the reference
    • Any attempt to access the generic Vector64/128/256 type will throw at crossgen time as well
  • For S.P.CoreLib:
    • CORJIT_FLAG_FEATURE_SIMD and CORJIT_FLAG_USE_* (for everything but AVX/FMA/AVX2/BMI1/BMI2) are specified to RyuJIT.
    • Calls to get_IsSupported on all intrinsic classes in the Intrinsics.Arm.Arm64, Intrinsics.X86, and Intrinsics namespaces, except for Sse/Sse.X64/Sse2/Sse2.X64/AVX/FMA/AVX2/BMI1/BMI2 are quirked so that RyuJIT doesn’t see them as intrinsics and generates a normal/non-inlined method call instead.
    • Calls to intrinsics (except for get_IsSupported) on AVX/FMA/AVX2/BMI1/BMI2 get quirked so that RyuJIT doesn’t see them as intrinsics and generates a normal/non-inlined method call instead
    • None of the method bodies for intrinsics in the Intrinsics.Arm.Arm64, Intrinsics.X86, and Intrinsics namespaces are pregenerated

@tannergooding (Member) commented Jun 12, 2019

I ended up writing down the expected behavior

That reads correct to me.

Any attempt to access the generic Vector64/128/256 type will throw at crossgen time as well

You might clarify the above to also include Vector<T>, which can't be used due to its variable size.

Calls to get_IsSupported on all intrinsic classes in the Intrinsics.Arm.Arm64, Intrinsics.X86, and Intrinsics namespaces, except for ...

As per #24917 (comment), the ARM IsSupported checks are intrinsics on x86 and should be treated as always returning false (this will never change even upon rejit/etc). Likewise for the x86 IsSupported checks on ARM.

Calls to intrinsics (except for get_IsSupported) on AVX/FMA/AVX2/BMI1/BMI2 get quirked so that RyuJIT doesn’t see them as intrinsics and generates a normal/non-inlined method call instead

Just to confirm. For the ARM intrinsics on x86, they are not intrinsic, so they are treated as regular calls. Correct? (same goes for x86 intrinsics on ARM).

// If it's IsSupported on Sse/Sse2, we can expand unconditionally since this is reliably
bool fIsX86intrinsic = strcmp(namespaceName, "System.Runtime.Intrinsics.X86") == 0;

// If it's anything related to Sse/Sse2, we can expand unconditionally since this is reliably
Member

nit: It would probably be better to say: "since this is a baseline requirement for CoreCLR".

return attribs;

// Treat as a regular method call (into a JITted method).
return (attribs & ~CORINFO_FLG_JIT_INTRINSIC) | CORINFO_FLG_DONT_INLINE;
Member

Is this going to be correct? The calls are recursive, and the mustExpand support is dependent on CORINFO_FLG_JIT_INTRINSIC being set.

So, if it is recursive and CORINFO_FLG_JIT_INTRINSIC isn't set, it will be AOT'd as if it were a regular recursive call...

Member

What we want is the caller to treat these as regular, non-inline method calls and for the JIT to not AOT the method bodies of the intrinsics themselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that what CORINFO_FLG_DONT_INLINE will fix up?

The recursive call within the actual intrinsic method body will never go through here because we don't AOT compile them.

Member

We have a compiler check for "is compiling for R2R/crossgen/AOT", right?

It might be simpler to just update the logic in the Compiler::impHWIntrinsic method (see https://github.com/dotnet/coreclr/blob/master/src/jit/hwintrinsicxarch.cpp#L61-L70) to do the following:

    bool isIsaSupported = comp->compSupports(isa) && comp->compSupportsHWIntrinsic(isa);

    if (strcmp(methodName, "get_IsSupported") == 0)
    {
        return isIsaSupported ? NI_IsSupported_True : NI_IsSupported_False;
    }
    else if (!isIsaSupported)
    {
+       if (isCompilingForAOT)
+       {
+           return nullptr;
+       }
        return NI_Throw_PlatformNotSupportedException;
    }

Member

I believe that would allow us to treat the AVX/AVX2/FMA/BMI1/BMI2 code paths as normal (you don't need to mark them as DONT_INLINE or strip the JIT_INTRINSIC flag).

The existing logic for not expanding method bodies would kick-in here: 6b5b161#diff-04a590b55693db432b3a24d0d9a59cceR451

And the added check would ensure that not supported ISAs (which these won't be due to us not setting the compiler flags in the zapper here: 6b5b161#diff-efefdb418a1b8be4f78c7c9aedf91cb4L1212) will return nullptr (forcing them to be emitted as regular method calls).


@tannergooding is correct here - the intrinsics are recognized during importing, prior to the call being considered for inlining.

I think it makes some sense to unconditionally fail to expand (i.e. return nullptr) for these when pre-jitting (aka AOT compiling). I don't think the JIT can distinguish when it's compiling SPC.dll vs. something else, but I don't think it makes a difference in practice, since any intrinsics that are called under the check for IsSupported will fail to pre-jit anyway.

Member Author

I don't believe so. Intrinsics are special and are handled before the inlining logic.

But this line strips the intrinsic bit from the method. It should be treated as a simple method call, no?

That's what we want for BMI and friends - keep IsSupported untouched and reported as an intrinsic so that it expands to false, but treat everything else as a regular non-inlined method call. The fact that the IL is recursive shouldn't matter because RyuJIT is not going to look at it due to the no-inline flag, no?

Member

The fact that the IL is recursive shouldn't matter because RyuJIT is not going to look at it due to the no-inline flag, no?

The problem ends up coming in when we go to AOT those methods. You stripped the JIT_INTRINSIC flag so the logic for "skip generating hardware intrinsic method bodies" won't kick in. Which means we will AOT the BMI and friends method bodies as regular recursive calls.

@MichalStrehovsky (Member Author) commented Jun 12, 2019

You stripped the JIT_INTRINSIC flag so the logic for "skip generating hardware intrinsic method bodies" won't kick in

Attributes in that spot are retrieved from the standard JitInterface - we don't call the getMethodAttribs that messes with the intrinsic bit there.

Precompiling method bodies on e.g. the Bmi1 class would be wrong.

Member

Ah, ok. That definitely wasn't clear to me. A comment calling it out might be helpful.

// call. We also make sure none of the hardware intrinsic method bodies get pregenerated in crossgen
// (see ZapInfo::CompileMethod) but get JITted instead. The JITted method will have the correct
// answer for the CPU the code is running on.
if (fIsGetIsSupportedMethod &&
Member

As per #24917 (comment), I think the ARM IsSupported checks should be constant on x86 as should the x86 IsSupported checks on ARM.

Member Author

Isn't that what this code is doing?

This logic is going to strip the intrinsic flag - so we're making sure we only strip it for x86 intrinsics on x86 and arm64 intrinsics on arm64.

Member

Ah, right. I misread the combination of the comment plus the if checks. 👍

strcmp(namespaceName, "System.Runtime.Intrinsics") == 0)
{
// Treat as a regular method call (into a JITted method).
return (attribs & ~CORINFO_FLG_JIT_INTRINSIC) | CORINFO_FLG_DONT_INLINE;
Member

Same concern as I called out above. Intrinsics are special and are handled in the importer before inlining checks and the recursive calls are dependent on mustExpand (which itself depends on JIT_INTRINSIC flag).

It might be better to just fixup the compiler::impIntrinsic method to have a if (isCompilingForAOT) path for the IsSupported checks, which returns nullptr rather than a gtNewIconNode (see https://github.com/dotnet/coreclr/blob/master/src/jit/importer.cpp#L3486-L3494, as we handle the IsSupported and PlatformNotSupportedException intrinsics a bit earlier).


That makes sense to me, though note that the JIT doesn't know whether it's SPC.dll or not.

Member

Is that actually a concern now that we have a mechanism to handle functions which consume AVX/AVX2/FMA/BMI1/BMI2 methods, so things work even if they are called without an IsSupported check?

I believe the "worst case" scenario is someone doesn't use the check and calls, for example Sse41.DotProduct on a processor that doesn't support it. We will have emitted the correct instruction encoding and that will trigger a hardware exception as the instruction isn't supported.
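
For concreteness, the unguarded shape in question might look like this (a sketch; Sse41.DotProduct is the real API):

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class Unguarded
{
    static Vector128<float> Dot(Vector128<float> left, Vector128<float> right)
    {
        // No Sse41.IsSupported guard: if this body were pregenerated, the dpps
        // instruction would fault with an illegal-instruction trap on CPUs
        // without SSE4.1, instead of a JITted body throwing
        // PlatformNotSupportedException.
        return Sse41.DotProduct(left, right, 0xFF);
    }
}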

Member

There is also the chance for subtle bugs if users don't provide software fallbacks, but I think it might be better to just have that be upfront in the initial release. If we ever want to make this more generally available, users will always have that scenario.


Right - I don't think it is a concern that we don't specially handle SPC.dll here.

I believe the "worst case" scenario is someone doesn't use the check and calls, for example Sse41.DotProduct on a processor that doesn't support it.

But the code won't work in any event. I guess if they were actually relying on getting PNSE it might. I suppose we could trap the hardware exception and throw PNSE, but I'm not sure that complication is justified.

Member

I agree, I think it would be better to encourage people to have the appropriate IsSupported checks 😄

@MichalStrehovsky (Member Author)

Updated comments per the conversations above. I think all the open issues have been addressed at this point.

@tannergooding (Member) left a comment

LGTM.

As I called out here, I think that all of the initial concerns around enabling this for non-S.P.CoreLib binaries have actually been addressed now.

So, I think it may be worth considering either:
A. Allowing this for everything by default (we can always flip it back off if we get feedback)
B. Provide a switch allowing users to explicitly opt-into AOT'ing their methods which contain HWIntrinsics (you could also provide an opt-out switch if option A was chosen)

@CarolEidt left a comment

LGTM - thanks for your patience @MichalStrehovsky !

@CarolEidt

So, I think it may be worth considering either:
A. Allowing this for everything by default (we can always flip it back off if we get feedback)
B. Provide a switch allowing users to explicitly opt-into AOT'ing their methods which contain HWIntrinsics (you could also provide an opt-out switch if option A was chosen)

I would suggest that initially we offer this as an opt-in switch. And we could possibly change that in future depending on usage & feedback.

@tannergooding (Member)

I would suggest that initially we offer this as an opt-in switch. And we could possibly change that in future depending on usage & feedback.

That sounds good to me, if other people agree (and could be done as a secondary PR). That would also allow CoreFX to use this as well, which would likely improve perf in a couple of other places.

@sergiy-k sergiy-k merged commit e73c8e6 into dotnet:master Jun 13, 2019
@MichalStrehovsky MichalStrehovsky deleted the pregenHw branch June 14, 2019 07:57
@MichalStrehovsky (Member Author)

I would suggest that initially we offer this as an opt-in switch. And we could possibly change that in future depending on usage & feedback.

That sounds good to me, if other people agree (and could be done as a secondary PR). That would also allow CoreFX to use this as well, which would likely improve perf in a couple of other places.

I think we would still not be able to lift the restriction around loading the Vector64/128/256 types inside the crossgen process for things outside S.P.CoreLib, which somewhat limits the usefulness of the switch (even for CoreFX) - the same problem also applies to the switches requested in #20034.

I think we block those types in crossgen because we expect an ABI breaking change around them in the future and we don't want to invalidate older R2R images.

@CarolEidt

we expect an ABI breaking change around them in the future and we don't want to invalidate older R2R images.

Yes, very good point, though I hadn't realized that we supported retaining R2R images across versions.

Perhaps it's something to keep in mind for the next release, after we've added support for the vector aspects of the ABIs.

@tannergooding (Member)

we don't want to invalidate older R2R images.

I don't think we are going to be able to keep this contract across every major version (especially when roll forward across major versions isn't supported).

I think it would be fine, given that it is opt-in and we would likely explicitly document around that switch that it may be invalidated by the next release.

Given that the release next year (the earliest we would ship the vector ABI changes) is .NET 5, where we are likely going to have some non-trivial changes anyway, I don't think there will be any negative fallout from this.

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Allow pregenerating all HW intrinsics in CoreLib


Commit migrated from dotnet/coreclr@e73c8e6