Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider a new flag COMPlus_EnableVEX #10808

Open
tannergooding opened this issue Jul 31, 2018 · 17 comments
Open

Consider a new flag COMPlus_EnableVEX #10808

tannergooding opened this issue Jul 31, 2018 · 17 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI help wanted [up-for-grabs] Good issue for external contributors test-enhancement Improvements of test source code
Milestone

Comments

@tannergooding
Copy link
Member

tannergooding commented Jul 31, 2018

Proposal

Introduce a new flag (COMPlus_EnableVEX) and make some minor changes to the existing flag (COMPlus_EnableAVX).

Rationale

Prior to the HWIntrinsics feature, the COMPlus_EnableAVX flag only controlled support for AVX/AVX2 (which it did by enabling/disabling the VEX-encoding path in the emitter).

However, now that HWIntrinsics exist, there are a number of additional ISAs that are impacted by this flag (BMI1, BMI2, FMA). Since this flag now controls more than its original scope, and that may cause additional confusion later on, we should instead split the functionality into separate logical flags before the HWIntrinsics feature ships.

Changes

Introduce COMPlus_EnableVEX, this would control whether or not the VEX-encoding is support by the emitter. It would impact all ISAs that use the VEX encoding.

Change COMPlus_EnableAVX to only control the AVX/AVX2 ISAs.

category:testing
theme:emitter
skill-level:beginner
cost:small
impact:small

@tannergooding
Copy link
Member Author

tannergooding commented Jul 31, 2018

Additional Thoughts

Just COMPlus_EnableAVX may still be "insufficient".

I would imagine that its primary use-case is actually to control whether 128-bit or 256-bit instructions are used for Vector<T>. As such, It may be desirable to have the flag directly control 128-bit vs 256-bit support (rather than support for the AVX isa).

Some scenarios I am thinking of are:

  • User doesn't want to use 256-bit Vector, but they still want Sse.Add (or the 128-bit Vector<T>) to use the more efficient VEX-encoding.
  • If/when AVX-512 support comes into play, where users may want to force Vector<T> to be 256-bits, rather than be 512-bits or 128-bits.

@tannergooding
Copy link
Member Author

CC. @CarolEidt, @eerhardt, @fiigii

@CarolEidt
Copy link
Contributor

CarolEidt commented Jul 31, 2018

As you point out, COMPlus_EnableAVX currently conflates VEX encodings, and 256-bit Vector<T> along with the AVX instruction set. I think it might make sense to separate them out, though I think it would incur a lot of complication to change the size of Vector<T> while still allowing Vector256<T>. Something ti think about.

@tannergooding
Copy link
Member Author

tannergooding commented Jul 31, 2018

I think it would incur a lot of complication to change the size of Vector while still allowing Vector256

I agree, this could get complicated depending on how the Simd.cpp code is setup.

I think, a good place to be in (ignoring the cost to get there) would be:

  • A flag that controls VEX encoding support (works in release mode)
  • A flag that controls the size of Vector<T> (works in release mode)
  • A flag that controls support for the AVX ISA (works in debug mode only)

The first matters because of all the various ISAs that use VEX, and because it is a generally more efficient encoding.

The second matters because Vector<T> is dynamically sized, and we already provide this support today. We are only concerned about two-sizes right now, but we may have other sizes to worry about in the future (x86 has AVX-512 and ARM has up to 2048-bit simd register support). I also don't think we can have an option that only allows picking the biggest or smallest size.

The third, we need for debugging HWIntrinsics as we bring them online. It could be provided in release mode if we thought that was a good idea and we also provided one for each other HWIntrinsics ISA.

@fiigii
Copy link
Contributor

fiigii commented Jul 31, 2018

I think it would incur a lot of complication to change the size of Vector while still allowing Vector256

Decoupling HW intrinsic from S.N.Vectors https://github.com/dotnet/coreclr/issues/15641 may be helpful to reduce this complication?

@tannergooding
Copy link
Member Author

@fiigii, I believe so.

@CarolEidt
Copy link
Contributor

Decoupling HW intrinsic from S.N.Vectors #9473 may be helpful to reduce this complication?

Hmmm, not sure about that. The complexity, if any, I think would arise from dealing with this in the VM, which is where the size of Vector<T> is determined.

Also, I don't think we can make COMPlus_EnableAVX debug only, since it has already been available in release mode for both desktop and coreclr. I would say that it would imply both the enablement of the ISA as well as the encodings.

@tannergooding
Copy link
Member Author

The complexity, if any, I think would arise from dealing with this in the VM, which is where the size of Vector is determined.

I wouldn't expect that to be very complicated (the code controlling that appears trivial): See https://github.com/dotnet/coreclr/blob/master/src/vm/methodtablebuilder.cpp#L1201
and https://github.com/dotnet/coreclr/blob/master/src/jit/ee_il_dll.cpp#L374

I was more worried about the FEATURE_SIMD/EnableAVX coupling that may make parts of simd.cpp emit 256-bit instructions even if the size were set to 16 (since EnableAVX currently controls both).

Also, I don't think we can make COMPlus_EnableAVX debug only, since it has already been available in release mode for both desktop and coreclr.

Right, although I think there are other issues here with regards to its setup and controlling the Vector size, which may need to be addressed when more sizes become supported. When ARM gets larger Vector support, it would be really strange to control it via EnableAVX, for example. As would using EnableAVX to control whether Vector<T> is 128, 256, or 512 bit on x86 architectures.

  • My thought is that making a documented breaking change may be desirable, in the long run (which would hopefully be no worse than someone having to enable a "quirk").

@fiigii
Copy link
Contributor

fiigii commented Jul 31, 2018

I don't think we can make COMPlus_EnableAVX debug only, since it has already been available in release mode for both desktop and coreclr.

It would be better if we also expose HW intrinsic knobs (COMPlus_EnableAVX2, COMPlus_EnableFMA, etc) to release mode (they are only available in debug mode now). That makes users easier to test their programs for different hardware.

@CarolEidt
Copy link
Contributor

I think we need to make a strong case for either 1) making a breaking change, or 2) exposing more knobs in release mode.

@tannergooding
Copy link
Member Author

For making a breaking change to COMPlus_EnableAVX, I think the basic summary for the case is:

  1. The flag name is x86 specific, but controls cross-architecture functionality (x86, x64, Arm, Arm64, etc)
  2. The flag name implies boolean values (128-bit and 256-bit Vector<T>), but may eventually need to support multiple values (such as, additionally, 512-bit, 1024-bit, and 2048-bit Vector<T>)
  3. The flag is currently controlling multiple pieces of loosely related functionality (AVX/AVX2 ISAs, Vector<T> size, and VEX encoding support)
  4. Other runtimes (such as Mono or CoreRT) are not required to implement/support the flag, so there are already cases where the flag "does nothing"

Splitting this into two release mode flags would solve the majority of these issues and reduce end-user confusion as support expands/grows:

  • COMPlus_EnableVEX
    • x86/x64 specific
    • Controls the VEX encoding support
  • COMPlus_VectorTSize
    • Cross Platform
    • Implies multiple value support
    • Controls the size of Vector<T>

COMPlus_EnableAVX would likely continue to exist as a debug only flag that controls whether the AVX hardware intrinsics function (an alternative name could be used, if we wanted to basically "deprecate" the flag, but keep it around for a release or two with a warning/etc).

@fiigii
Copy link
Contributor

fiigii commented Jul 31, 2018

COMPlus_VectorTSize
Cross Platform
Implies multiple value support
Controls the size of Vector

If we make these cross-platfom variables and decouple S.N.Vectors knobs from HW intrinsic ones. How do we do the managed implemented S.N.Vectors?

@tannergooding
Copy link
Member Author

tannergooding commented Jul 31, 2018

None of these are HWIntrinsic specific knobs, only the COMPlus_Enable<ISA> flags would be (and those are debug only).

COMPlus_EnableVEX would impact both, but would not be cross-platform. Having it be disabled would prevent dependent HWIntrinsic ISAs (AVX, AVX2, FMA, BMI1, BMI2) from being used. For Vector<T> it would additionally prevent 256-bit or 512-bit support on x86/x64.

  • I'm not convinced this needs to be a release mode flag, however. The only real benefit, that I can think of, is allowing the user to test the non-VEX perf, but this (realistically) seems more like a CoreCLR testing flag, than anything else. I mostly specified that it should be release flag because it's something the user can reasonably control today via COMPlus_EnableAVX

COMPlus_VectorTSize would only impact S.N.Vectors and would be cross-platform. It, outside any other flags overriding it, would just control the size of Vector<T>, dependent on the actual hardware support available.

@CarolEidt
Copy link
Contributor

The flag name is x86 specific, but controls cross-architecture functionality (x86, x64, Arm, Arm64, etc)

I think it is fine to make COMPlus_EnableAVX affect the size of Vector<T> only for x86/x64, as Arm doesn't yet have support for it, and Arm64 is arguably new enough that it's unlikely to be a high impact breaking change (and we won't have customers using it because they were working around early Vector<T> bugs, which were only on x64).

The flag name implies boolean values (128-bit and 256-bit Vector), but may eventually need to support multiple values (such as, additionally, 512-bit, 1024-bit, and 2048-bit Vector)

I don't think that's necessarily true. We can still have additional options controlling the maximum size of Vector<T>, but I would assert that setting COMPlus_EnableAVX=0 would implicitly reduce the size of Vector<T> to 128, since AVX is required to support 256. That's compatible and logical.

I think your proposal is quite reasonable, but I don't see that it in any way depends upon changing or eliminating COMPlus_EnableAVX. It simply makes sense for it to imply VEX encoding and a default vector size of 256. You could still have COMPlus_VectorTSize, but I would assert that it would only ever reduce the vector size from what's supported by the target (or specified) ISA support (the program would have to query the actual vector length to ensure they're getting what they asked for).

@tannergooding
Copy link
Member Author

@CarolEidt, could you clarify on a couple of points?

It sounds like, at the high level, you suggest we keep COMPlus_EnableAVX and:

  • It should only impact x86/x64
  • A value of 0 indicates Vector<T> is 128-bits
  • A value of 1 or unspecified indicates Vector<T> is 256-bits
    • Or were you suggesting 1/unspecified indicates Vector<T> is "max-supported" size with VectorTSize (or equivalent) controlling the "in-between" sizes (with VectorTSize being cross-plat)?
  • How would the respective values impact the AVX HWIntrinsic ISA?
    • That is, does it have no impact or disable the AVX ISA (and any ISA inheriting from AVX)?
    • If no impact, were you thinking that EnableVEX would fill that spot, and would it be debug only or release as well
  • How would the respective values impact the other VEX-encodings?
    • This is, does it have no impact or disable all VEX-encodings, including those that don't inherit from AVX (like BMI1/BMI2)

@CarolEidt
Copy link
Contributor

It should only impact x86/x64
A value of 0 indicates Vector is 128-bits

Yes to the above

A value of 1 or unspecified indicates Vector is 256-bits

I would say that a value of 1 indicates Vector<T> is whatever the target supports (or the COMPlus_VectorTSize option specifies). I think it's entirely reasonable to assume that the use of this option is primarily for disabling 256-bit vectors, either for testing or working around the old OS bug.

Or were you suggesting 1/unspecified indicates Vector is "max-supported" size with VectorTSize (or equivalent) controlling the "in-between" sizes (with VectorTSize being cross-plat)?

Yes, that's right.

How would the respective values impact the AVX HWIntrinsic ISA?
That is, does it have no impact or disable the AVX ISA (and any ISA inheriting from AVX)?

The latter (disable anything depending on AVX).

How would the respective values impact the other VEX-encodings?
This is, does it have no impact or disable all VEX-encodings, including those that don't inherit from AVX (like BMI1/BMI2)

It should only impact the AVX instruction set, I would think. Again, while this might be considered a "breaking change", since the hw intrinsics are new there's really nothing to break.

@voinokin
Copy link

voinokin commented Aug 1, 2018

COMPlus_VectorTSize

Would that be possible to also expose this flag per application? Eg. some item in app.config or alike, rather than system-wide with env. variables. I believe having this flag tunable for specific application may provide better support experience.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Oct 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI help wanted [up-for-grabs] Good issue for external contributors test-enhancement Improvements of test source code
Projects
None yet
Development

No branches or pull requests

6 participants