-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Current state of the SIMD/HWIntrinsic configuration knobs and proposed cleanup #11701
Comments
CC. @CarolEidt, @fiigii I tried to capture the previous state of the world, the current state of the world, and what the various discussions we've had look to point to for the future state of the world in relation to the various CLR configuration knobs we have around Feel free to let me know if there is anything here that you feel was captured incorrectly or could be clarified. |
I think we should also determine what of this falls into the 3.0 basket and what falls into post 3.0. I believe the only thing that we should put a hard requirement on fixing is the state of the |
This is related to https://github.com/dotnet/coreclr/issues/19221, which was the initial proposal around |
@tannergooding @CarolEidt @fiigii Do you expect changes to be made here for 3.0? |
IMO, the current knobs are enough for 3.0. And some new features mentioned in this issue (like |
We should finish fleshing out the design around the various
CLR Configuration Knobs
that allow users to control the variousSystem.Numerics.Vectors
andSystem.Runtime.Intrinsics
support.Prior to HWIntrinsics
This section discusses the state of the world in netcoreapp1.0 through netcoreapp2.0. Desktop had similar, but slightly different behavior that I will attempt to call out where relevant.
COMPlus_EnableSSE3_4
andCOMPlus_EnableAVX
The former (
COMPlus_EnableSSE3_4
) appears to be .NETCore only but defaults to 1 and is used in combination with a CPUID check that the VM does. It controls whether theS.N.Vectors
intrinsics generate SSE3 through SSE4.2 instructions. The flag is completely ignored ifCOMPlus_EnableAVX
and the corresponding VM check are1
.The latter (
COMPlus_EnableAVX
) is available for both Desktop and .NETCore, it defaults to 1 ans is used in combination with a CPUID check that the VM does. It controls:S.N.Vectors
intrinsics generate AVX/AVX2 instructionsVector<T>
(setting it to zero, forcesVector<T>
to be sized to 16)VEX
encoded instructionsCOMPlus_FeatureSIMD
This flag is available for both Desktop and .NETCore, it defaults to 1 and controls various bits of code related to
Vector<T>
, theS.N.Vectors
compiler support, and theTYP_SIMD*
support.Setting this to 0 causes
Vector<T>
to be sized at 16, none of theS.N.Vectors
code to be treated as intrinsic, and prevents the various types from being resolved asTYP_SIMD*
(which also generally prevents these types from appearing).Current State
This section discusses the current state of the world for netcoreapp3.0.
COMPlus_Enable<ISA>
We have a number of
Enable<ISA>
flags, including: the pre-existingSSE3_4
andAVX
flags as well as theSSE
,SSE2
,SSE3
, SSSE3,
SSE41,SSE42
,AVX2
,AES
,PCLMULQDQ
,POPCNT
, FMA,
LZCNT,
BMI1, and
BMI2` flags. All of these are used in combination with a corresponding CPUID check that the VM does.These flags impact the compiler support for a given ISA and any ISAs that are "descendants" of that ISA (e.g.
SSE=0
would also disableSSE2
which would disable any ISAs dependent onSSE2
, etc). The flags are currently used primarily for theHWIntrinsics
feature as that is the only thing that will cause the various instructions to be generated. In the future, these flags might be applicable more generally depending on other optimizations the JIT could consider. An exception to this isSSE
andSSE2
which are considered "baseline support" by RyuJIT. These ISAs only impact the correspondingHWIntrinsic
ISAs and do not actually impact compiler support for generating these instructions.The pre-existing
SSE3_4
flag is now treated as equivalent toSSE3
. It impacts theSSE3
ISA and any child ISAs (includingAVX
). It otherwise functions identically and continues impacting the codgen support forS.N.Vectors
.The pre-existing
AVX
flag continues impacting the size ofVector<T>
and whether the compiler emitsVEX
encoded instructions. However, for the size ofVector<T>
it now does so indirectly (in that disablingAVX
also disablesAVX2
), as the check was shifted ontoAVX2
.COMPlus_FeatureSIMD
andCOMPlus_EnableHWIntrinsic
The
COMPlus_FeatureSIMD
flag continues functioning as it did before.The
COMPlus_EnableHWIntrinsic
flag controls whether theSystem.Runtime.Intrinsic
methods are treated as intrinsic, and therefore, whether they throw aPNSE
or generate actual code when the given ISA is supported by the CPU/Compiler. There is currently a bug that settingEnableHWIntrinsic=0
will also disable compiler support for all the various ISAs listed above. This also means that it currently impacts the size ofVector<T>
and whether the compiler will emit VEX encoded instructions.Proposal for Cleanup
In this section, I will attempt to describe where we want to be with the various flags.
New Flag:
EnableVEX
Currently we control the
VEX
support for the compiler by checking theEnableAVX
flag (and corresponding CPUID check done by the VM). However, there are two ISAs that require the VEX encoding but not for AVX to also be enabled, these areBMI1
andBMI2
. While we should never encounter a CPU that hasBMI1
/BMI2
but that does not also supportAVX
,AVX
requires an additional check that the OS supports saving/restoring the 256-bit YMM registers. This support is not guaranteed and, at least on Windows, can be toggled by the user. Due to this, we need the check to be updated so that theBMI1
/BMI2
ISAs (and any future ISAs with similar requirements) can still use the VEX encoding. Additionally, theVEX
encoding is generally more efficient (it removes the RMW requirement from most of the instructions and supports unaligned memory addresses) and it may be desirable to still emit the VEX encoded instructions forSSE
throughSSE42
when the user has setEnableAVX=0
.The proposal is then to expose a new
COMPlus_EnableVEX
flag that is used to control the VEX encoding. Setting it to0
would disable any ISA that requires the VEX encoding (AVX
,AVX2
,FMA
,BMI1
, andBMI2
, as well as any future ISAs). Its default value (1
) would allow the compiler to emit the VEX encoding forSSE
throughSSE42
when the CPU/OS support AVX but when the user has setEnableAVX=0
. It would also allow other ISAs not in the AVX hierarchy (BMI1
andBMI2
) to be emitted even when the OS does not support the saving/restoring the 256-bit YMM registers.An alternative would be to not expose a new flag and instead just update the emitter to know that it can use the
VEX
encoding for theBMI1
/BMI2
ISAs. The only difference from the above would be thatSSE
throughSSE42
would not use the VEX encoding whenAVX=0
(and when the OS supports saving/restoring the 256-bit YMM registers). This might be a more accurate state since theVEX
encoded forms of theSSE
throughSSE42
instructions are technically part of the AVX instruction set.New Flag:
VectorTSize
Currently we control the sizeof
Vector<T>
by defaulting it to16
and changing it to32
ifAVX2
is supported. However, this is not very extensible (what do we do when/ifAVX-512
becomes supported and the size can be64
) and it is very much tied to x86 (you wouldn't want this to impact ARM if we addSVE
support). It also means that if you need a smallerVector<T>
, you must also disable the general compiler support for theAVX2
ISA (at a minimum). This also impacts the HWIntrinsics feature.The proposal is then to expose a new
COMPlus_VectorTSize
flag that is used to control the sizeofVector<T>
. The value would default to0
which would mean to follow the normal logic we have today (size to16
by default and change to32
ifAVX2
is supported). We would then come up with an additional scheme such that other values allow the user to explicitly sizeVector<T>
(to a supported size).My current thinking is that any unsupported value is treated as
0
(default). Otherwise, the supported values are the exact sizes (16 or 32, in the future 64 ifAVX-512
becomes supported, etc). Another option would be that the value is treated as the nearest size that is less than the given size. As an example, if the user gives31
, it would be sized16
. If the user gave64
and we only support32
and16
, it would be32
. If the user gave100
and we support128
,64
,32
, and16
; they would get64
.The flag would continue being used in conjunction with the
Enable<ISA>
checks for a given platform, as you can't sizeVector<T>
to32
ifAVX
is not supported (for example).COMPlus_Enable<ISA>
These flags are currently in a fairly good state, some considerations might be:
SSE
andSSE2
or should they be folded back into theEnableHWIntrinsic
flag (given that they are considered "baseline" for CoreCLR).SSE3_4
, since this is now covered by the individualSSE3
,SSSE3
,SSE41
, andSSE42
flags and since it is treated as equivalent toSSE3
(which will also disable the child ISAs).COMPlus_FeatureSIMD
andCOMPlus_EnableHWIntrinsic
COMPlus_FeatureSIMD
should have its scope reduced so that it only impacts theS.N.Vectors
codegen. TheTYP_SIMD*
support should be split out into its own feature thatFEATURE_SIMD
andFEATURE_HW_INTRINSICS
can sit ontop of.COMPlus_EnableHWIntrinsic
should be fixed so that it only impacts theS.R.Intrinsics
codegen. It should have no impact on the various ISAs the compiler lists as supported.category:implementation
theme:vector-codegen
skill-level:intermediate
cost:medium
impact:small
The text was updated successfully, but these errors were encountered: