Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intel HW Intrinsics classes should use inheritance #25953

Closed
CarolEidt opened this issue Apr 20, 2018 · 19 comments
Closed

Intel HW Intrinsics classes should use inheritance #25953

CarolEidt opened this issue Apr 20, 2018 · 19 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation arch-x86 area-System.Runtime.Intrinsics design-discussion Ongoing discussion about design without consensus
Milestone

Comments

@CarolEidt
Copy link
Contributor

A good portion of X86 ISAs all follow a hierarchy of support. If SSE2 is supported, then implicitly SSE is supported. Similarly, if AVX2 is supported, then AVX is supported as well.

We should design our HW Intrinsics classes in a similar fashion. This allows for simplicity in coding calling functions from 2 (or more) instruction sets:

if (SSE3.IsSupported)
{
    var v1 = SSE3.SetAllVector128(0.5f);   // comes from SSE
    var v2 = SSE3.HorizontalAdd(v1, v1);   // comes from SSE3
}

This means we should have our classes inherit from each other to model this inheritance in the ISAs:

public class Sse
{ ...
}
public class Sse2 : Sse
{ ...
}
public class Sse3 : Sse2
{ ...
}

The IsSupported hierarchy checks are doc'd in https://software.intel.com/en-us/articles/intel-sdm, for example:

12.4.4 Programming SSE3 with SSE/SSE2 Extensions
SIMD instructions in SSE3 extensions are intended to complement the use of SSE/SSE2 in programming SIMD applications. Application software that intends to use SSE3 instructions should also check for the availability of SSE/SSE2 instructions.

The hierarchy is also checked in the .NET runtime: https://github.com/dotnet/coreclr/blob/6bf04a47badd74646e21e70f4e9267c71b7bfd08/src/vm/codeman.cpp#L1307.

There is a bit of an open question: What should AVX inherit from? Should it inherit from SSE/SSE2/SSE4.1/SSE4.2? The Intel docs don't seem to indicate any SSE is necessary to support AVX, but the .NET runtime will only support AVX if SSE2 is present.

NOTE Taking this proposal would mean we will need to drop static class from all our classes, since static class cannot form an inheritance hierarchy.

For the original problem of SetZeroVector128, we have the following options:

  1. Explode the method out, but since there is no argument, we would have to append the type names to each method: SetZeroVector128Float SetZeroVector128Double, SetZeroVector128Int32, etc.
  2. Leave the API as it is, but add support Sse2.SetZeroVector128<float>(), which would match the behavior of Sse.SetZeroVector128().
  3. Since it is a helper, and not a real intrinsic, move the API to another class. Other helper methods could follow suit: like SetAllVector128, SetVector128, etc.
    1. Potentially Vector128<T>.Zero;, new Vector128<T>();, or default.
  4. Exposing a generic SetZeroVector128 in Sse class, and generate different code for underlying hardware support
    1. SSE: emitxorps for all base type that provides correct semantics but maybe with a bit data-bypass penalty on integer base types.
    2. SSE2: emit xorpd, xorps, or pxor for different base types.

Original proposal:

This was discussed during the API design (dotnet/corefx#23489 (comment)) but has come under discussion again in the context of dotnet/coreclr#17691.

A related issue is, if the APIs remain separate, how to deal with an API, e.g. SetZeroVector128 that provides support for float vectors in SSE, but supports the full range of generic types in SSE2.

@CarolEidt
Copy link
Contributor Author

@tannergooding
Copy link
Member

Copying my response from the mentioned thread...

The original reasons for splitting this, as I recall, were:

  • It makes the API consistent with the underlying ISA
  • It sets a precedent that we should mirror the underlying ISA regardless of what a given runtime supports
  • The API is part of CoreFX, not part of CoreCLR. CoreCLR only provides an implementation of the API.
    • Another runtime could, theoretically, decide to only support SSE instructions
    • Even for CoreCLR, the x86 JIT only started requiring SSE/SSE2 fairly recently

There were several discussions around how to make these APIs the "most usable"/"friendly". In the end, it was originally decided that some things (like software fallbacks) could be provided separately. I believe a shared wrapper API could be provided in the same way (or even as part of the same software fallback library).

I do think that merging the APIs may make it "easier to use", but so would merging "AVX" and "AVX2". However, we can't merge "AVX" and "AVX2" because there are still plenty of CPUs and operating systems that only support one of them. Additionally, users can trivially "merge" these APIs themselves with a using static Sse; using static Sse2;

@fiigii
Copy link
Contributor

fiigii commented Apr 20, 2018

I prefer to merge SSE and SSE2 into one class.

From the hardware product perspective, all the 64-bit x86 processors support SSE2 as the SIMD ISA bottom-line. I do not think any existing MSIL platform will consistently support the hardware released before 2001 (right?)

From the user experience perspective, merging SSE and SSE2 makes a lot of 128-bit APIs generic and more convenient to use. using static Sse; using static Sse2; just solve parts of the problem.

One of my concern of merging SSE and SSE2 is that finding a good name of the class may be hard, Sse, Sse2, or SseSse2?

@fiigii
Copy link
Contributor

fiigii commented Apr 20, 2018

SetZeroVector128 that provides support for float vectors in SSE, but supports the full range of generic types in SSE2.

Actually, I do not think this is a problem. The codegen of helper intrinsic does not need to be constrained to ISAs. Sse2.SetZeroVector128<float> can generate any code to provide a "zero vector". Similarly, Sse2.SetAllVector128 also can be generic to support the full range of generic types in SSE2.

Merging SSE and SSE2 can avoid this kind of "duplicated" APIs, such as Sse2.SetZeroVector128<T> and Sse.SetZeroVector128.

@tannergooding
Copy link
Member

It may be worth noting that other languages/runtimes that are also bringing up HWIntrinsic support are also keeping the ISAs separated.

For example:

@4creators
Copy link
Contributor

other languages/runtimes that are also bringing up HWIntrinsic support are also keeping the ISAs separated

It was main argument behind design of HW APIs. We have done split based on hardware support detection to give fine grained control over implementation.

all the 64-bit x86 processors support SSE2 as the SIMD ISA bottom-line

I do not think it is true as after some additional data checks I have verified that latest Celeron N, Pentium N and Atom C series processors like:

https://ark.intel.com/products/95592/Intel-Pentium-Processor-N4200-2M-Cache-up-to-2_5-GHz or
https://ark.intel.com/products/97935/Intel-Atom-Processor-C3308-4M-Cache-up-to-2_10-GHz

do not support any type of SIMD instructions except for AES NI.

@fiigii
Copy link
Contributor

fiigii commented Apr 20, 2018

I do not think it is true as after some additional data checks I have verified that latest Celeron N, Pentium N and Atom C series processors like:
do not support any type of SIMD instructions except for AES NI.

No, I am pretty sure that Goldmont architecture has SSE -SSE4.2 instructions.

@4creators
Copy link
Contributor

No, I am pretty sure that Goldmont architecture has SSE -SSE4.2 instructions.

Can you point us into any docs confirming SIMD support in Atom.

Another problem is with Celeron N and Pentium N which are also known for lack of SIMD support.

@fiigii
Copy link
Contributor

fiigii commented Apr 20, 2018

Please look for the docs of micro-architecture instead of processor product. For example, the newest ATOM/Celeron/Pentium should base on the architecture code-named Goldmont or Goldmont Plus.

@gfoidl
Copy link
Member

gfoidl commented Apr 21, 2018

One of my concern of merging SSE and SSE2 is that finding a good name of the class may be hard, Sse, Sse2, or SseSse2?

SseX?

For usage it is way more convenient to have these merged to one class.

@ghost
Copy link

ghost commented Apr 21, 2018

SseX?

No thanks 😀

To keep the both worlds happy, can we have class SseX in addition to those for separate ISAs?

@fiigii
Copy link
Contributor

fiigii commented May 1, 2018

Made a PR dotnet/corefx#29447 to show the effect of merging SSE and SSE2 classes, which may help the discussion here.

@eerhardt eerhardt changed the title Considering merging Intel HW Intrinsics for SSE and SSE2 Intel HW Intrinsics classes should use inheritance Jun 13, 2018
@eerhardt
Copy link
Member

@CarolEidt @tannergooding @terrajobst - I've updated the original comment to follow the API proposal process, and to capture the proposal to use inheritance in the X86 classes.

All - Please take a look at the proposal and provide feedback.

@tannergooding
Copy link
Member

I like the proposal and think it works well. It allows easy access to the various methods from a given class while still keeping the IsSupported checks logical/etc. It also becomes easy to discover the relationship via source control (i.e. Is using Sse okay if Sse3.IsSupported returns true).

Props to @terrajobst, who I believe was the original proposer.

@tannergooding
Copy link
Member

For the SetZeroVector128 issue. I am in favor of 1 (rename and explode the APIs) or 3 (move them to a shared class).

3 may be slightly better as it is also applicable to ARM (cc. @sdmaclea) and may simplify some things (such as StaticCast implementation, where the plan is to just fold it away in the importer).

@fiigii
Copy link
Contributor

fiigii commented Jun 13, 2018

For the SetZeroVector128 issue, I prefer 2, but there is 4th solution.

  • Exposing a generic SetZeroVector128<T> in Sse class, and generate different code for underlying hardware support
    1. SSE: emitxorps for all base type that provides correct semantics but maybe with a bit data-bypass penalty on integer base types.
    2. SSE2: emit xorpd, xorps, or pxor for different base types.

Note: we (and C++ compilers) already use the similar approach for optimizing some other helpers (e.g., Avx.SetAllVector256).

@mikedn
Copy link
Contributor

mikedn commented Jun 13, 2018

For SetZero there's also a variant of the 3rd option - just rely on new Vector128<T>() and default to do the right thing. Requiring a method call to produce a 0 value is counterintuitive in provides no advantages.

@eerhardt
Copy link
Member

Thanks @fiigii and @mikedn, I've added those options to the list.

@tannergooding
Copy link
Member

Closing this as resolved, since we are now using inheritance to expose the intrinsics.

Any other issues tracked here should be moved to their own new issue.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation arch-x86 area-System.Runtime.Intrinsics design-discussion Ongoing discussion about design without consensus
Projects
None yet
Development

No branches or pull requests

8 participants