Consider a new flag COMPlus_EnableVEX #10808

tannergooding · 2018-07-31T21:02:45Z

Proposal

Introduce a new flag (COMPlus_EnableVEX) and make some minor changes to the existing flag (COMPlus_EnableAVX).

Rationale

Prior to the HWIntrinsics feature, the COMPlus_EnableAVX flag only controlled support for AVX/AVX2 (which it did by enabling/disabling the VEX-encoding path in the emitter).

However, now that HWIntrinsics exist, there are a number of additional ISAs that are impacted by this flag (BMI1, BMI2, FMA). Since this flag now controls more than its original scope, and that may cause additional confusion later on, we should instead split the functionality into separate logical flags before the HWIntrinsics feature ships.

Changes

Introduce COMPlus_EnableVEX, this would control whether or not the VEX-encoding is support by the emitter. It would impact all ISAs that use the VEX encoding.

Change COMPlus_EnableAVX to only control the AVX/AVX2 ISAs.

category:testing
theme:emitter
skill-level:beginner
cost:small
impact:small

The text was updated successfully, but these errors were encountered:

tannergooding · 2018-07-31T21:05:48Z

Additional Thoughts

Just COMPlus_EnableAVX may still be "insufficient".

I would imagine that its primary use-case is actually to control whether 128-bit or 256-bit instructions are used for Vector<T>. As such, It may be desirable to have the flag directly control 128-bit vs 256-bit support (rather than support for the AVX isa).

Some scenarios I am thinking of are:

User doesn't want to use 256-bit Vector, but they still want Sse.Add (or the 128-bit Vector<T>) to use the more efficient VEX-encoding.
If/when AVX-512 support comes into play, where users may want to force Vector<T> to be 256-bits, rather than be 512-bits or 128-bits.

tannergooding · 2018-07-31T21:05:57Z

CC. @CarolEidt, @eerhardt, @fiigii

CarolEidt · 2018-07-31T21:13:33Z

As you point out, COMPlus_EnableAVX currently conflates VEX encodings, and 256-bit Vector<T> along with the AVX instruction set. I think it might make sense to separate them out, though I think it would incur a lot of complication to change the size of Vector<T> while still allowing Vector256<T>. Something ti think about.

tannergooding · 2018-07-31T21:22:51Z

I think it would incur a lot of complication to change the size of Vector while still allowing Vector256

I agree, this could get complicated depending on how the Simd.cpp code is setup.

I think, a good place to be in (ignoring the cost to get there) would be:

A flag that controls VEX encoding support (works in release mode)
A flag that controls the size of Vector<T> (works in release mode)
A flag that controls support for the AVX ISA (works in debug mode only)

The first matters because of all the various ISAs that use VEX, and because it is a generally more efficient encoding.

The second matters because Vector<T> is dynamically sized, and we already provide this support today. We are only concerned about two-sizes right now, but we may have other sizes to worry about in the future (x86 has AVX-512 and ARM has up to 2048-bit simd register support). I also don't think we can have an option that only allows picking the biggest or smallest size.

The third, we need for debugging HWIntrinsics as we bring them online. It could be provided in release mode if we thought that was a good idea and we also provided one for each other HWIntrinsics ISA.

fiigii · 2018-07-31T21:50:24Z

I think it would incur a lot of complication to change the size of Vector while still allowing Vector256

Decoupling HW intrinsic from S.N.Vectors https://github.com/dotnet/coreclr/issues/15641 may be helpful to reduce this complication?

tannergooding · 2018-07-31T22:04:06Z

@fiigii, I believe so.

CarolEidt · 2018-07-31T22:20:33Z

Decoupling HW intrinsic from S.N.Vectors #9473 may be helpful to reduce this complication?

Hmmm, not sure about that. The complexity, if any, I think would arise from dealing with this in the VM, which is where the size of Vector<T> is determined.

Also, I don't think we can make COMPlus_EnableAVX debug only, since it has already been available in release mode for both desktop and coreclr. I would say that it would imply both the enablement of the ISA as well as the encodings.

tannergooding · 2018-07-31T22:50:27Z

The complexity, if any, I think would arise from dealing with this in the VM, which is where the size of Vector is determined.

I wouldn't expect that to be very complicated (the code controlling that appears trivial): See https://github.com/dotnet/coreclr/blob/master/src/vm/methodtablebuilder.cpp#L1201
and https://github.com/dotnet/coreclr/blob/master/src/jit/ee_il_dll.cpp#L374

I was more worried about the FEATURE_SIMD/EnableAVX coupling that may make parts of simd.cpp emit 256-bit instructions even if the size were set to 16 (since EnableAVX currently controls both).

Also, I don't think we can make COMPlus_EnableAVX debug only, since it has already been available in release mode for both desktop and coreclr.

Right, although I think there are other issues here with regards to its setup and controlling the Vector size, which may need to be addressed when more sizes become supported. When ARM gets larger Vector support, it would be really strange to control it via EnableAVX, for example. As would using EnableAVX to control whether Vector<T> is 128, 256, or 512 bit on x86 architectures.

My thought is that making a documented breaking change may be desirable, in the long run (which would hopefully be no worse than someone having to enable a "quirk").

fiigii · 2018-07-31T22:52:38Z

I don't think we can make COMPlus_EnableAVX debug only, since it has already been available in release mode for both desktop and coreclr.

It would be better if we also expose HW intrinsic knobs (COMPlus_EnableAVX2, COMPlus_EnableFMA, etc) to release mode (they are only available in debug mode now). That makes users easier to test their programs for different hardware.

CarolEidt · 2018-07-31T23:14:17Z

I think we need to make a strong case for either 1) making a breaking change, or 2) exposing more knobs in release mode.

tannergooding · 2018-07-31T23:34:41Z

For making a breaking change to COMPlus_EnableAVX, I think the basic summary for the case is:

The flag name is x86 specific, but controls cross-architecture functionality (x86, x64, Arm, Arm64, etc)
The flag name implies boolean values (128-bit and 256-bit Vector<T>), but may eventually need to support multiple values (such as, additionally, 512-bit, 1024-bit, and 2048-bit Vector<T>)
The flag is currently controlling multiple pieces of loosely related functionality (AVX/AVX2 ISAs, Vector<T> size, and VEX encoding support)
Other runtimes (such as Mono or CoreRT) are not required to implement/support the flag, so there are already cases where the flag "does nothing"

Splitting this into two release mode flags would solve the majority of these issues and reduce end-user confusion as support expands/grows:

COMPlus_EnableVEX
- x86/x64 specific
- Controls the VEX encoding support
COMPlus_VectorTSize
- Cross Platform
- Implies multiple value support
- Controls the size of Vector<T>

COMPlus_EnableAVX would likely continue to exist as a debug only flag that controls whether the AVX hardware intrinsics function (an alternative name could be used, if we wanted to basically "deprecate" the flag, but keep it around for a release or two with a warning/etc).

fiigii · 2018-07-31T23:43:40Z

COMPlus_VectorTSize
Cross Platform
Implies multiple value support
Controls the size of Vector

If we make these cross-platfom variables and decouple S.N.Vectors knobs from HW intrinsic ones. How do we do the managed implemented S.N.Vectors?

tannergooding · 2018-07-31T23:55:46Z

None of these are HWIntrinsic specific knobs, only the COMPlus_Enable<ISA> flags would be (and those are debug only).

COMPlus_EnableVEX would impact both, but would not be cross-platform. Having it be disabled would prevent dependent HWIntrinsic ISAs (AVX, AVX2, FMA, BMI1, BMI2) from being used. For Vector<T> it would additionally prevent 256-bit or 512-bit support on x86/x64.

I'm not convinced this needs to be a release mode flag, however. The only real benefit, that I can think of, is allowing the user to test the non-VEX perf, but this (realistically) seems more like a CoreCLR testing flag, than anything else. I mostly specified that it should be release flag because it's something the user can reasonably control today via COMPlus_EnableAVX

COMPlus_VectorTSize would only impact S.N.Vectors and would be cross-platform. It, outside any other flags overriding it, would just control the size of Vector<T>, dependent on the actual hardware support available.

CarolEidt · 2018-08-01T00:01:15Z

The flag name is x86 specific, but controls cross-architecture functionality (x86, x64, Arm, Arm64, etc)

I think it is fine to make COMPlus_EnableAVX affect the size of Vector<T> only for x86/x64, as Arm doesn't yet have support for it, and Arm64 is arguably new enough that it's unlikely to be a high impact breaking change (and we won't have customers using it because they were working around early Vector<T> bugs, which were only on x64).

The flag name implies boolean values (128-bit and 256-bit Vector), but may eventually need to support multiple values (such as, additionally, 512-bit, 1024-bit, and 2048-bit Vector)

I don't think that's necessarily true. We can still have additional options controlling the maximum size of Vector<T>, but I would assert that setting COMPlus_EnableAVX=0 would implicitly reduce the size of Vector<T> to 128, since AVX is required to support 256. That's compatible and logical.

I think your proposal is quite reasonable, but I don't see that it in any way depends upon changing or eliminating COMPlus_EnableAVX. It simply makes sense for it to imply VEX encoding and a default vector size of 256. You could still have COMPlus_VectorTSize, but I would assert that it would only ever reduce the vector size from what's supported by the target (or specified) ISA support (the program would have to query the actual vector length to ensure they're getting what they asked for).

tannergooding · 2018-08-01T00:18:46Z

@CarolEidt, could you clarify on a couple of points?

It sounds like, at the high level, you suggest we keep COMPlus_EnableAVX and:

It should only impact x86/x64
A value of 0 indicates Vector<T> is 128-bits
A value of 1 or unspecified indicates Vector<T> is 256-bits
- Or were you suggesting 1/unspecified indicates Vector<T> is "max-supported" size with VectorTSize (or equivalent) controlling the "in-between" sizes (with VectorTSize being cross-plat)?
How would the respective values impact the AVX HWIntrinsic ISA?
- That is, does it have no impact or disable the AVX ISA (and any ISA inheriting from AVX)?
- If no impact, were you thinking that EnableVEX would fill that spot, and would it be debug only or release as well
How would the respective values impact the other VEX-encodings?
- This is, does it have no impact or disable all VEX-encodings, including those that don't inherit from AVX (like BMI1/BMI2)

CarolEidt · 2018-08-01T00:28:03Z

It should only impact x86/x64
A value of 0 indicates Vector is 128-bits

Yes to the above

A value of 1 or unspecified indicates Vector is 256-bits

I would say that a value of 1 indicates Vector<T> is whatever the target supports (or the COMPlus_VectorTSize option specifies). I think it's entirely reasonable to assume that the use of this option is primarily for disabling 256-bit vectors, either for testing or working around the old OS bug.

Or were you suggesting 1/unspecified indicates Vector is "max-supported" size with VectorTSize (or equivalent) controlling the "in-between" sizes (with VectorTSize being cross-plat)?

Yes, that's right.

How would the respective values impact the AVX HWIntrinsic ISA?
That is, does it have no impact or disable the AVX ISA (and any ISA inheriting from AVX)?

The latter (disable anything depending on AVX).

How would the respective values impact the other VEX-encodings?
This is, does it have no impact or disable all VEX-encodings, including those that don't inherit from AVX (like BMI1/BMI2)

It should only impact the AVX instruction set, I would think. Again, while this might be considered a "breaking change", since the hw intrinsics are new there's really nothing to break.

voinokin · 2018-08-01T09:39:07Z

COMPlus_VectorTSize

Would that be possible to also expose this flag per application? Eg. some item in app.config or alike, rather than system-wide with env. variables. I believe having this flag tunable for specific application may provide better support experience.

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Oct 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider a new flag COMPlus_EnableVEX #10808

Consider a new flag COMPlus_EnableVEX #10808

tannergooding commented Jul 31, 2018 •

edited by BruceForstall

Loading

tannergooding commented Jul 31, 2018 •

edited

Loading

tannergooding commented Jul 31, 2018

CarolEidt commented Jul 31, 2018 •

edited by BruceForstall

Loading

tannergooding commented Jul 31, 2018 •

edited

Loading

fiigii commented Jul 31, 2018 •

edited

Loading

tannergooding commented Jul 31, 2018

CarolEidt commented Jul 31, 2018

tannergooding commented Jul 31, 2018

fiigii commented Jul 31, 2018

CarolEidt commented Jul 31, 2018

tannergooding commented Jul 31, 2018

fiigii commented Jul 31, 2018

tannergooding commented Jul 31, 2018 •

edited

Loading

CarolEidt commented Aug 1, 2018

tannergooding commented Aug 1, 2018

CarolEidt commented Aug 1, 2018

voinokin commented Aug 1, 2018 •

edited

Loading

Consider a new flag COMPlus_EnableVEX #10808

Consider a new flag COMPlus_EnableVEX #10808

Comments

tannergooding commented Jul 31, 2018 • edited by BruceForstall Loading

Proposal

Rationale

Changes

tannergooding commented Jul 31, 2018 • edited Loading

Additional Thoughts

tannergooding commented Jul 31, 2018

CarolEidt commented Jul 31, 2018 • edited by BruceForstall Loading

tannergooding commented Jul 31, 2018 • edited Loading

fiigii commented Jul 31, 2018 • edited Loading

tannergooding commented Jul 31, 2018

CarolEidt commented Jul 31, 2018

tannergooding commented Jul 31, 2018

fiigii commented Jul 31, 2018

CarolEidt commented Jul 31, 2018

tannergooding commented Jul 31, 2018

fiigii commented Jul 31, 2018

tannergooding commented Jul 31, 2018 • edited Loading

CarolEidt commented Aug 1, 2018

tannergooding commented Aug 1, 2018

CarolEidt commented Aug 1, 2018

voinokin commented Aug 1, 2018 • edited Loading

tannergooding commented Jul 31, 2018 •

edited by BruceForstall

Loading

tannergooding commented Jul 31, 2018 •

edited

Loading

CarolEidt commented Jul 31, 2018 •

edited by BruceForstall

Loading

tannergooding commented Jul 31, 2018 •

edited

Loading

fiigii commented Jul 31, 2018 •

edited

Loading

tannergooding commented Jul 31, 2018 •

edited

Loading

voinokin commented Aug 1, 2018 •

edited

Loading