Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fix SSE2 SetZeroVector128 on float type #17691

Closed
wants to merge 1 commit into from
Closed

Conversation

fiigii
Copy link

@fiigii fiigii commented Apr 20, 2018

@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

cc @4creators

@4creators
Copy link

Actually the proposed change is wrong as SSE2 ISA does not support float overloads. SetZeroVector128 float overload is defined in SSE ISA. See:

public static Vector128<float> SetZeroVector128() => SetZeroVector128();

@mikedn
Copy link

mikedn commented Apr 20, 2018

I wonder in what case the distinction between SSE and SSE2 is relevant these days...

@4creators
Copy link

4creators commented Apr 20, 2018

I wonder in what case the distinction between SSE and SSE2 is relevant these days...

Distinction exists simply due to Intel decision to split implementation of SIMD instructions for 16 byte vectors into float and non-float numeric types including support detection - they could have merged those ISAs a long time ago ....

@mikedn
Copy link

mikedn commented Apr 20, 2018

Distinction exists simply due to Intel decision to split implementation of SIMD instructions for 16 byte vectors into float and non-float numeric types including support detection - they could have merged those ISAs a long time ago ....

No, the distinction between .NET's implementation of SSE/SSE2 is solely due to the lack of common sense of the design.

@4creators
Copy link

4creators commented Apr 20, 2018

@mikedn

I belive that expressing your personal frustration the way you do in this and many other threads is in direct violation of .NET Foundation Code of Conduct

If you have constructive arguments why don't you raise them in corefx repo by logging an API issue and proposing some changes you think may help in creating better Intel Hardware Intrinsic API surface.

@mikedn
Copy link

mikedn commented Apr 20, 2018

I belive that expressing your personal frustration the way you do in this and many other threads is in direct violation of .NET Foundation Code of Conduct

I'm simply stating the facts.

@RussKeldorph RussKeldorph requested a review from CarolEidt April 20, 2018 16:25
@RussKeldorph RussKeldorph added this to the 2.2.0 milestone Apr 20, 2018
@RussKeldorph RussKeldorph added area-CodeGen bug Product bug (most likely) and removed area-CodeGen bug Product bug (most likely) labels Apr 20, 2018
@RussKeldorph RussKeldorph removed this from the 2.2.0 milestone Apr 20, 2018
@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

proposed change is wrong as SSE2 ISA does not support float overloads. SetZeroVector128 float overload is defined in SSE ISA. See:

No really, SSE2 SetZeroVector128 is generic and supports all the numeric types. Actually, if you want, Sse2.SetZeroVector128<float> can generate pxor or xorpd. We have discussed so many times that "helper intrinsics" only guarantee their behavior and functionality rather than codgen.

@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

No, the distinction between .NET's implementation of SSE/SSE2 is solely due to the lack of common sense of the design.

@mikedn could you elaborate? Any suggestion for the API design?

@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

@RussKeldorph Please set this PR for 2.1. This is a really bad bug that calling Sse2.SetZeroVector128<float> will make a crash.

@mikedn
Copy link

mikedn commented Apr 20, 2018

@fiigii It seems to me that the distinction between SSE and SSE2 instruction sets is purely historical and reflecting it in the API is not very useful:

  • Both SSE and SSE2 are basically part of x64, AFAIK there are no x64 CPUs that lack SSE1/2.
  • SSE2 was introduced with P4 in ~2001, more than 15 years ago. The chance that someone will run .NET on a P4 today and also expect good performance is pretty low already. The chance that someone runs .NET code on a PIII is practically 0.
  • Recent versions of Windows require SSE2. Not sure what's the story on Linux but I wouldn't be too surprised to hear that current versions also require SSE2.
  • .NET Core can't actually run properly on a CPU that doesn't support SSE2, the floating point generator assumes SSE2 and there's no x87 support in RyuJIT.

@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

@mikedn I totally agree with you. Actually, the first version of hardware intrinsic APIs merged SSE and SSE2 in one class, which makes more beautiful and convenient generic APIs. However, that design was rejected during review, please see the thread dotnet/corefx#23489 (comment)

@mikedn
Copy link

mikedn commented Apr 20, 2018

However, that design was rejected during review, please see the thread dotnet/corefx#23489 (comment)

Yeah, I see, no technical argumentation whatsoever. Oh well, I'm wasting my time.

@4creators
Copy link

4creators commented Apr 20, 2018

@fiigii We have discussed this issue before and settled on current solution - if you want to change API why dont you rise issue in corefx repo with API change proposal before pushing any changes to the APIs here.

Pretty much it is hard to understand why without any technical arguments the similar problems are coming back just in the form of PR changing APIs

@CarolEidt
Copy link

My apologies for the delay in responding to this. Here are my thoughts:

  • First, I think that this fix is both in keeping with existing design parameters, as well as providing a better developer experience. @fiigii is correct that, although the hardware intrinsics endeavor to provide close fidelity to the hardware instructions, it is not only difficult but undesirable to impose unnecessary constraints or clumsiness on the API.
  • Second, I take this discussion as feedback that we might consider merging SSE and SSE2 in the final design. I confess that I didn't weigh in on that early decision, because it wasn't really clear to me what the implications might be. This kind of feedback is exactly why we weren't eager to finalize this API for 2.1. @tannergooding - do you have any further thoughts on this? You and @mellinoe were in favor of the split, and perhaps you can articulate what considerations might argue for maintaining it.
  • Finally, regarding porting this fix to 2.1 - this is an experimental feature, and there is a clear workaround for this issue. Therefore, I'm not inclined to push this fix into 2.1 at this point.

Thanks, @fiigii @4creators and @mikedn for your thoughts on this.

@CarolEidt CarolEidt requested a review from tannergooding April 20, 2018 18:51
@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

@CarolEidt Thanks for the reply. IMO, merging SSE and SSE2 will considerably improve the user experience, especially when they work with the generic code.
For example, if anyone wants to write a high-level SIMD APIs (e.g., implementing Vector<T> in managed code), they have to write the below code with the current non-generic intrinsic APIs

    public static Vector128<T> Vector128One<T>() where T : struct
    {
        if (typeof(T) == typeof(float))
        {
            return Sse.StaticCast<float, T>(Sse.SetAllVector128(1.0f));
        }
        else if (typeof(T) == typeof(double))
        {
            return Sse.StaticCast<double, T>(Sse2.SetAllVector128((double)1));
        }
        else if (typeof(T) == typeof(byte))
        {
            return Sse.StaticCast<byte, T>(Sse2.SetAllVector128((byte)1));
        }
        else if (typeof(T) == typeof(sbyte))
        {
            return Sse.StaticCast<sbyte, T>(Sse2.SetAllVector128((sbyte)1));
        }
        else if (typeof(T) == typeof(short))
        {
            return Sse.StaticCast<short, T>(Sse2.SetAllVector128((short)1));
        }
        else if (typeof(T) == typeof(ushort))
        {
            return Sse.StaticCast<ushort, T>(Sse2.SetAllVector128((ushort)1));
        }
        else if (typeof(T) == typeof(int))
        {
            return Sse.StaticCast<int, T>(Sse2.SetAllVector128((int)1));
        }
        else if (typeof(T) == typeof(uint))
        {
            return Sse.StaticCast<uint, T>(Sse2.SetAllVector128((uint)1));
        }
        else if (typeof(T) == typeof(long))
        {
            return Sse.StaticCast<long, T>(Sse2.SetAllVector128((long)1));
        }
        else if (typeof(T) == typeof(ulong))
        {
            return Sse.StaticCast<ulong, T>(Sse2.SetAllVector128((ulong)1));
        }
        else
        {
            throw new NotSupportedException();
        }
    }

if we merge SSE and SSE2 to get generic SetAllVector128, that would be simplified to

    public static Vector128<T> Vector128One<T>() where T : struct
    {
        return Sse2.SetAllVector128((T)Convert.ChangeType(1, typeof(T)));
    }

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should take this PR for 2.1, as the workaround is trivial and the feature is only being shipped as experimental.

Before taking this PR, I think we should have a broader design discussion (with the relevant API Reviewers, such as @terrajobst and @eerhardt).

The original reasons for splitting this, as I recall, were:

  • It makes the API consistent with the underlying ISA
  • It sets a precedent that we should mirror the underlying ISA regardless of what a given runtime supports
  • The API is part of CoreFX, not part of CoreCLR. CoreCLR only provides an implementation of the API.
    • Another runtime could, theoretically, decide to only support SSE instructions
    • Even for CoreCLR, the x86 JIT only started requiring SSE/SSE2 fairly recently

There were several discussions around how to make these APIs the "most usable"/"friendly". In the end, it was originally decided that some things (like software fallbacks) could be provided separately. I believe a shared wrapper API could be provided in the same way (or even as part of the same software fallback library).

I do think that merging the APIs may make it "easier to use", but so would merging "AVX" and "AVX2". However, we can't merge "AVX" and "AVX2" because there are still plenty of CPUs and operating systems that only support one of them. Additionally, users can trivially "merge" these APIs themselves with a using static Sse; using static Sse2;

@CarolEidt
Copy link

Thanks @tannergooding - I take your points, and it makes sense to defer this until the design questions are discussed & resolved.

@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

it makes sense to defer this until the design questions are discussed & resolved.

Don't you mean defer this PR? But this is a true bug that can crash the runtime, and this PR does not impact API design.

@CarolEidt
Copy link

this is a true bug that can crash the runtime, and this PR does not impact API design.

Yes, there's a bug here. But allowing the float overload isn't necessarily the appropriate fix, and it can only cause a crash if the user doesn't use the appropriate API.
The right fix for this depends on what is determine regarding the API.

@4creators
Copy link

this is a true bug that can crash the runtime, and this PR does not impact API design.

The only fix which seems reasonable without changing API is to throw PNSE for Sse2.SetZeroVector128 float overload.

@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

The only fix which seems reasonable without changing API is to throw PNSE for Sse2.SetZeroVector128 float overload.

No, we have decided that generic APIs have to support all numeric types. BTW, we can implement Sse2.SetZeroVector128<float> with SSE2 instruction.

@4creators
Copy link

No, we have decided that generic APIs have to support all numeric types

I think it was quite the opposite - we decided to create generic APIs in the case all numeric types are supported in given ISAs. That means that current definition of Sse2.SetVector128 is wrong bcs we should have it in the form where we throw TypeNotSupportedException for unsupported type - this was AFAIR agreed as correct way to handle that kind of errors (I was wrong above by proposing PNSE instead).

@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

That means that current definition of Sse2.SetVector128 is wrong bcs we should have it in the form where we throw TypeNotSupportedException for unsupported type

So, don't you mean changing Sse2.SetVector128 to a non-generic APIs that has no float overload? If yes, that depends on the further API reviews like above suggestions. BTW, non-generic API does not throw TypeNotSupportedException.

Sse2.SetVector128 is a helper intrinsic that does not have to bind to an exact instruction. Why do we give users the constraints?

@4creators
Copy link

4creators commented Apr 20, 2018

Both SSE and SSE2 are basically part of x64, AFAIK there are no x64 CPUs that lack SSE1/2.

This is unfortunately common misconception among some developers who do not use SIMD in production. Intel up to this day ships new x64 CPUs which lack any type of SIMD support including missing SSE/SSE2 ISAs. These are for example some latest Celeron N, Pentium N and Atom C series processors like:

https://ark.intel.com/products/95592/Intel-Pentium-Processor-N4200-2M-Cache-up-to-2_5-GHz or
https://ark.intel.com/products/97935/Intel-Atom-Processor-C3308-4M-Cache-up-to-2_10-GHz

which do not support any type of SIMD instructions except for AES NI.

@4creators
Copy link

don't you mean changing Sse2.SetVector128 to a non-generic APIs that has no float overload

This cannot be done in C# due to the fact that return type does not participate in overload resolution and function is parameterless.

@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

This cannot be done in C# due to the fact that return type does not participate in overload resolution and function is parameterless.

Yes, so supporting float is much better than changing APIs and throwing exceptions.

@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

Intel up to this day ships new x64 CPUs which lack any type of SIMD support including missing SSE/SSE2 ISAs. These are for example some latest Celeron N, Pentium N and Atom C series processors like:

No, all the new ATOM processors support SSE - SSE4.1/4.2.

@4creators
Copy link

Why do we give users the constraints?

This is very good question which I think best expresses the problem and we should discuss it thoroughly. My preference would be to do it during general API review as @tannergooding suggested.

And I agree with @fiigii that we should fix the bug which is leading to runtime crush for 2.1 but my preference is to do it via throwing TypeNotSupportedException for float as this does not change the API.

@4creators
Copy link

No, all the new ATOM processors support SSE - SSE4.1/4.2.

@fiigii It's not in the ARK tables - do you think it's tables error?

@fiigii
Copy link
Author

fiigii commented Apr 20, 2018

It's not in the ARK tables - do you think it's tables error?

Not sure why does not the web page show ISA features.

my preference is to do it via throwing TypeNotSupportedException for float as this does not change the API.

Why not just emit an instruction? pxor, xorps, or xorpd, whatever...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RyuJIT] SSE2 SetZeroVector128 does not support float base type
8 participants