Current state of the SIMD/HWIntrinsic configuration knobs and proposed cleanup #11701

tannergooding · 2018-12-21T17:41:45Z

We should finish fleshing out the design around the various CLR Configuration Knobs that allow users to control the various System.Numerics.Vectors and System.Runtime.Intrinsics support.

Prior to HWIntrinsics

This section discusses the state of the world in netcoreapp1.0 through netcoreapp2.0. Desktop had similar, but slightly different behavior that I will attempt to call out where relevant.

`COMPlus_EnableSSE3_4` and `COMPlus_EnableAVX`

The former (COMPlus_EnableSSE3_4) appears to be .NETCore only. It defaults to 1 and is used in combination with a CPUID check that the VM does. It controls whether the S.N.Vectors intrinsics generate SSE3 through SSE4.2 instructions. The flag is completely ignored if COMPlus_EnableAVX and the corresponding VM check are both 1.

The latter (COMPlus_EnableAVX) is available for both Desktop and .NETCore. It defaults to 1 and is used in combination with a CPUID check that the VM does. It controls:

- Whether the S.N.Vectors intrinsics generate AVX/AVX2 instructions
- The size of Vector<T> (setting it to 0 forces Vector<T> to be sized to 16)
- Whether the compiler emits VEX encoded instructions

`COMPlus_FeatureSIMD`

This flag is available for both Desktop and .NETCore, it defaults to 1 and controls various bits of code related to Vector<T>, the S.N.Vectors compiler support, and the TYP_SIMD* support.

Setting this to 0 causes Vector<T> to be sized at 16, none of the S.N.Vectors code to be treated as intrinsic, and prevents the various types from being resolved as TYP_SIMD* (which also generally prevents these types from appearing).
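As a rough illustration of how these three knobs combined prior to HWIntrinsics, here is a minimal Python sketch. It is a model of the behavior described above, not the runtime's actual code: the function name, the parameter names, and the "SSE2" fallback label (taken from SSE/SSE2 being baseline for RyuJIT) are assumptions made for the example.

```python
def prior_simd_state(feature_simd=True, enable_avx=True, enable_sse3_4=True,
                     vm_avx_ok=True, vm_sse3_4_ok=True):
    """Model the pre-HWIntrinsics knobs (hypothetical helper, not a runtime API).

    vm_avx_ok / vm_sse3_4_ok stand in for the CPUID checks done by the VM.
    """
    if not feature_simd:
        # FeatureSIMD=0: Vector<T> is 16 bytes and nothing is treated as intrinsic.
        return {"vector_size": 16, "intrinsic": False, "isa": None}

    if enable_avx and vm_avx_ok:
        # EnableAVX wins; EnableSSE3_4 is ignored in this case.
        return {"vector_size": 32, "intrinsic": True, "isa": "AVX"}

    if enable_sse3_4 and vm_sse3_4_ok:
        return {"vector_size": 16, "intrinsic": True, "isa": "SSE3_4"}

    # Assumed baseline when both flags are off or unsupported.
    return {"vector_size": 16, "intrinsic": True, "isa": "SSE2"}
```

For example, `prior_simd_state(enable_avx=False)` models a run with `COMPlus_EnableAVX=0` on AVX-capable hardware: Vector<T> drops to 16 bytes and the SSE3 through SSE4.2 path is used.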

Current State

This section discusses the current state of the world for netcoreapp3.0.

`COMPlus_Enable<ISA>`

We have a number of Enable<ISA> flags, including the pre-existing SSE3_4 and AVX flags as well as the SSE, SSE2, SSE3, SSSE3, SSE41, SSE42, AVX2, AES, PCLMULQDQ, POPCNT, FMA, LZCNT, BMI1, and BMI2 flags. All of these are used in combination with a corresponding CPUID check that the VM does.

These flags impact the compiler support for a given ISA and any ISAs that are "descendants" of that ISA (e.g. SSE=0 would also disable SSE2, which would disable any ISAs dependent on SSE2, etc). The flags are currently used primarily for the HWIntrinsics feature, as that is the only thing that will cause the various instructions to be generated. In the future, these flags might be applicable more generally, depending on other optimizations the JIT could consider. The exceptions are SSE and SSE2, which are considered "baseline support" by RyuJIT. Their flags only impact the corresponding HWIntrinsic ISAs and do not actually impact compiler support for generating these instructions.
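The "descendants" rule can be sketched as a walk up a parent map. The following Python is only an illustration of the HWIntrinsic view of the flags. The issue confirms the SSE to SSE2 dependency, that AVX descends from SSE3, and that AVX2 descends from AVX; where the remaining ISAs (AES, PCLMULQDQ, POPCNT, FMA, LZCNT, BMI1, BMI2) sit in the map below is an assumption, and `PARENT` and `effective_isas` are hypothetical names.

```python
# Assumed parent map (None means no parent ISA in this model).
PARENT = {
    "SSE": None,
    "SSE2": "SSE",
    "SSE3": "SSE2",
    "SSSE3": "SSE3",
    "SSE41": "SSSE3",
    "SSE42": "SSE41",
    "AVX": "SSE42",
    "AVX2": "AVX",
    "FMA": "AVX",          # assumed placement
    "AES": "SSE2",         # assumed placement
    "PCLMULQDQ": "SSE2",   # assumed placement
    "POPCNT": "SSE42",     # assumed placement
    "LZCNT": None,         # assumed standalone
    "BMI1": None,          # assumed standalone
    "BMI2": None,          # assumed standalone
}


def effective_isas(cpu_supported, disabled):
    """Return the ISAs usable given the CPUID results and Enable<ISA>=0 flags."""
    def usable(isa):
        if isa in disabled or isa not in cpu_supported:
            return False
        parent = PARENT[isa]
        # An ISA is only usable when every ancestor is usable too.
        return parent is None or usable(parent)

    return {isa for isa in PARENT if usable(isa)}
```

With every ISA reported by the CPU, `effective_isas(set(PARENT), {"SSE3"})` leaves only SSE, SSE2, AES, PCLMULQDQ, LZCNT, BMI1, and BMI2 under this assumed map.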

The pre-existing SSE3_4 flag is now treated as equivalent to SSE3. It impacts the SSE3 ISA and any child ISAs (including AVX). It otherwise functions identically and continues impacting the codegen support for S.N.Vectors.

The pre-existing AVX flag continues impacting the size of Vector<T> and whether the compiler emits VEX encoded instructions. However, for the size of Vector<T> it now does so indirectly (in that disabling AVX also disables AVX2), as the check was shifted onto AVX2.

`COMPlus_FeatureSIMD` and `COMPlus_EnableHWIntrinsic`

The COMPlus_FeatureSIMD flag continues functioning as it did before.

The COMPlus_EnableHWIntrinsic flag controls whether the System.Runtime.Intrinsics methods are treated as intrinsic, and therefore whether they throw a PNSE or generate actual code when the given ISA is supported by the CPU/Compiler. There is currently a bug where setting EnableHWIntrinsic=0 also disables compiler support for all the various ISAs listed above. This means that it currently also impacts the size of Vector<T> and whether the compiler will emit VEX encoded instructions.

Proposal for Cleanup

In this section, I will attempt to describe where we want to be with the various flags.

New Flag: `EnableVEX`

Currently we control the VEX support for the compiler by checking the EnableAVX flag (and corresponding CPUID check done by the VM). However, there are two ISAs that require the VEX encoding but do not require AVX to also be enabled: BMI1 and BMI2. While we should never encounter a CPU that has BMI1/BMI2 but does not also support AVX, AVX requires an additional check that the OS supports saving/restoring the 256-bit YMM registers. This support is not guaranteed and, at least on Windows, can be toggled by the user. Due to this, we need the check to be updated so that the BMI1/BMI2 ISAs (and any future ISAs with similar requirements) can still use the VEX encoding. Additionally, the VEX encoding is generally more efficient (it removes the RMW requirement from most of the instructions and supports unaligned memory operands), and it may be desirable to still emit the VEX encoded instructions for SSE through SSE42 when the user has set EnableAVX=0.

The proposal is then to expose a new COMPlus_EnableVEX flag that is used to control the VEX encoding. Setting it to 0 would disable any ISA that requires the VEX encoding (AVX, AVX2, FMA, BMI1, and BMI2, as well as any future ISAs). Its default value (1) would allow the compiler to emit the VEX encoding for SSE through SSE42 when the CPU/OS support AVX but the user has set EnableAVX=0. It would also allow other ISAs not in the AVX hierarchy (BMI1 and BMI2) to be emitted even when the OS does not support saving/restoring the 256-bit YMM registers.
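One way to read the proposed EnableVEX semantics is as three independent decisions. The Python below is a sketch of that reading under stated assumptions, not an existing runtime API: `proposed_vex_model` and its parameters are hypothetical names, `cpu_avx` and `cpu_bmi` stand in for the CPUID checks, and `os_ymm` stands in for the OS check for saving/restoring the 256-bit YMM registers.

```python
def proposed_vex_model(enable_vex=True, enable_avx=True,
                       cpu_avx=True, os_ymm=True, cpu_bmi=True):
    """Model the proposed COMPlus_EnableVEX behavior (illustrative only)."""
    # AVX itself needs both the CPU bit and OS support for the YMM state.
    avx_available = cpu_avx and os_ymm
    return {
        # AVX, AVX2, and FMA need VEX, the AVX flag, and CPU/OS support.
        "avx_family": enable_vex and enable_avx and avx_available,
        # BMI1/BMI2 need VEX but not YMM state, so os_ymm is not consulted.
        "bmi": enable_vex and cpu_bmi,
        # SSE through SSE42 may use VEX encodings even when EnableAVX=0.
        "sse_uses_vex": enable_vex and avx_available,
    }
```

Under this model, `proposed_vex_model(os_ymm=False)` keeps BMI1/BMI2 usable while AVX and the VEX encoded SSE forms are off, which is the case the current EnableAVX-based check cannot express.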

An alternative would be to not expose a new flag and instead just update the emitter to know that it can use the VEX encoding for the BMI1/BMI2 ISAs. The only difference from the above would be that SSE through SSE42 would not use the VEX encoding when AVX=0 (and when the OS supports saving/restoring the 256-bit YMM registers). This might be a more accurate state since the VEX encoded forms of the SSE through SSE42 instructions are technically part of the AVX instruction set.

New Flag: `VectorTSize`

Currently we control the sizeof Vector<T> by defaulting it to 16 and changing it to 32 if AVX2 is supported. However, this is not very extensible (what do we do when/if AVX-512 becomes supported and the size can be 64) and it is very much tied to x86 (you wouldn't want this to impact ARM if we add SVE support). It also means that if you need a smaller Vector<T>, you must also disable the general compiler support for the AVX2 ISA (at a minimum). This also impacts the HWIntrinsics feature.

The proposal is then to expose a new COMPlus_VectorTSize flag that is used to control the sizeof Vector<T>. The value would default to 0 which would mean to follow the normal logic we have today (size to 16 by default and change to 32 if AVX2 is supported). We would then come up with an additional scheme such that other values allow the user to explicitly size Vector<T> (to a supported size).

My current thinking is that any unsupported value is treated as 0 (default). Otherwise, the supported values are the exact sizes (16 or 32, in the future 64 if AVX-512 becomes supported, etc). Another option would be to treat the value as the largest supported size that does not exceed the given size. As an example, if the user gives 31, it would be sized 16. If the user gave 64 and we only support 32 and 16, it would be 32. If the user gave 100 and we support 128, 64, 32, and 16, they would get 64.

The flag would continue being used in conjunction with the Enable<ISA> checks for a given platform, as you can't size Vector<T> to 32 if AVX is not supported (for example).
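The two candidate schemes for interpreting COMPlus_VectorTSize can be compared side by side. This Python sketch is illustrative only: the function names are hypothetical, `supported` is assumed to be the list of sizes that the Enable<ISA> checks allow on the current machine, and `default` is the size the existing logic would pick. Treating a request equal to a supported size as a match in the round-down scheme is my interpretation of "nearest size".

```python
def resolve_exact(requested, supported, default):
    """Option 1: only exact supported sizes are honored; anything else means default."""
    return requested if requested in supported else default


def resolve_round_down(requested, supported, default):
    """Option 2: pick the largest supported size that does not exceed the request."""
    candidates = [size for size in supported if size <= requested]
    # A request below the smallest supported size (including 0) means default.
    return max(candidates) if candidates else default
```

For instance, with `supported=[16, 32]` and `default=32`, a request of 31 yields 32 under the exact scheme (unsupported, so default) but 16 under the round-down scheme.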

`COMPlus_Enable<ISA>`

These flags are currently in a fairly good state. Some considerations might be:

- Should we be exposing SSE and SSE2, or should they be folded back into the EnableHWIntrinsic flag (given that they are considered "baseline" for CoreCLR)?
- Can we get rid of SSE3_4, since this is now covered by the individual SSE3, SSSE3, SSE41, and SSE42 flags and since it is treated as equivalent to SSE3 (which will also disable the child ISAs)?

`COMPlus_FeatureSIMD` and `COMPlus_EnableHWIntrinsic`

COMPlus_FeatureSIMD should have its scope reduced so that it only impacts the S.N.Vectors codegen. The TYP_SIMD* support should be split out into its own feature that FEATURE_SIMD and FEATURE_HW_INTRINSICS can sit on top of.

COMPlus_EnableHWIntrinsic should be fixed so that it only impacts the S.R.Intrinsics codegen. It should have no impact on the various ISAs the compiler lists as supported.

category:implementation
theme:vector-codegen
skill-level:intermediate
cost:medium
impact:small


tannergooding · 2018-12-21T17:43:14Z

CC. @CarolEidt, @fiigii

I tried to capture the previous state of the world, the current state of the world, and where our various discussions seem to point for the future state of the world, in relation to the various CLR configuration knobs we have around FEATURE_SIMD and FEATURE_HW_INTRINSICS.

Feel free to let me know if there is anything here that you feel was captured incorrectly or could be clarified.

tannergooding · 2018-12-21T17:47:53Z

I think we should also determine which of these items fall into the 3.0 basket and which fall into post 3.0. I believe the only thing that we should put a hard requirement on fixing is the state of the EnableHWIntrinsic flag (it currently impacts all ISAs the compiler lists as supported).

tannergooding · 2018-12-21T17:49:47Z

This is related to https://github.com/dotnet/coreclr/issues/19221, which was the initial proposal around EnableVEX, but this issue captures more and gives more context on the various states we've exposed.

BruceForstall · 2019-03-13T23:53:51Z

@tannergooding @CarolEidt @fiigii Do you expect changes to be made here for 3.0?

fiigii · 2019-03-14T00:02:18Z

IMO, the current knobs are enough for 3.0. And some new features mentioned in this issue (like VectorTSize) may depend on the work (1) decoupling hardware intrinsic from FEATURE_SIMD and (2) implementing S.N.Vector<T> operations in hardware intrinsic. So, I suggest putting the issue to "future".

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

tannergooding mentioned this issue Apr 29, 2020

COMPlus_EnableHWIntrinsic=0 no longer disables SSE+ #35605

Closed

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

danmoseley mentioned this issue Oct 6, 2021

Need infrastructure to run tests in varying hardware intrinsic modes #950

Open

kunalspathak mentioned this issue Mar 9, 2022

Arm64: Always use SIMD features #66411

Merged

TIHan removed the JitUntriaged CLR JIT issues needing additional triage label Oct 31, 2022

tannergooding mentioned this issue Nov 7, 2024

Move GenTreeVecCon and GenTreeMskCon under the respective FEATURE_* defines #104932

Merged
