Fix SSE2 SetZeroVector128 on float type #17691

fiigii · 2018-04-20T00:42:24Z

fix https://github.com/dotnet/coreclr/issues/17689

fiigii · 2018-04-20T00:42:45Z

4creators · 2018-04-20T11:55:09Z

Actually the proposed change is wrong as SSE2 ISA does not support float overloads. SetZeroVector128 float overload is defined in SSE ISA. See:

coreclr/src/mscorlib/src/System/Runtime/Intrinsics/X86/Sse.cs

Line 511 in 83fbfec

public static Vector128<float> SetZeroVector128() => SetZeroVector128();

mikedn · 2018-04-20T11:56:39Z

I wonder in what case the distinction between SSE and SSE2 is relevant these days...

4creators · 2018-04-20T12:04:32Z

I wonder in what case the distinction between SSE and SSE2 is relevant these days...

Distinction exists simply due to Intel decision to split implementation of SIMD instructions for 16 byte vectors into float and non-float numeric types including support detection - they could have merged those ISAs a long time ago ....

mikedn · 2018-04-20T12:11:22Z

Distinction exists simply due to Intel decision to split implementation of SIMD instructions for 16 byte vectors into float and non-float numeric types including support detection - they could have merged those ISAs a long time ago ....

No, the distinction between .NET's implementation of SSE/SSE2 is solely due to the lack of common sense of the design.

4creators · 2018-04-20T12:41:48Z

@mikedn

I belive that expressing your personal frustration the way you do in this and many other threads is in direct violation of .NET Foundation Code of Conduct

If you have constructive arguments why don't you raise them in corefx repo by logging an API issue and proposing some changes you think may help in creating better Intel Hardware Intrinsic API surface.

mikedn · 2018-04-20T12:50:29Z

I belive that expressing your personal frustration the way you do in this and many other threads is in direct violation of .NET Foundation Code of Conduct

I'm simply stating the facts.

fiigii · 2018-04-20T16:58:07Z

proposed change is wrong as SSE2 ISA does not support float overloads. SetZeroVector128 float overload is defined in SSE ISA. See:

No really, SSE2 SetZeroVector128 is generic and supports all the numeric types. Actually, if you want, Sse2.SetZeroVector128<float> can generate pxor or xorpd. We have discussed so many times that "helper intrinsics" only guarantee their behavior and functionality rather than codgen.

fiigii · 2018-04-20T16:59:40Z

No, the distinction between .NET's implementation of SSE/SSE2 is solely due to the lack of common sense of the design.

@mikedn could you elaborate? Any suggestion for the API design?

fiigii · 2018-04-20T17:01:22Z

@RussKeldorph Please set this PR for 2.1. This is a really bad bug that calling Sse2.SetZeroVector128<float> will make a crash.

mikedn · 2018-04-20T17:14:24Z

@fiigii It seems to me that the distinction between SSE and SSE2 instruction sets is purely historical and reflecting it in the API is not very useful:

Both SSE and SSE2 are basically part of x64, AFAIK there are no x64 CPUs that lack SSE1/2.
SSE2 was introduced with P4 in ~2001, more than 15 years ago. The chance that someone will run .NET on a P4 today and also expect good performance is pretty low already. The chance that someone runs .NET code on a PIII is practically 0.
Recent versions of Windows require SSE2. Not sure what's the story on Linux but I wouldn't be too surprised to hear that current versions also require SSE2.
.NET Core can't actually run properly on a CPU that doesn't support SSE2, the floating point generator assumes SSE2 and there's no x87 support in RyuJIT.

fiigii · 2018-04-20T17:31:10Z

@mikedn I totally agree with you. Actually, the first version of hardware intrinsic APIs merged SSE and SSE2 in one class, which makes more beautiful and convenient generic APIs. However, that design was rejected during review, please see the thread dotnet/corefx#23489 (comment)

mikedn · 2018-04-20T17:39:07Z

However, that design was rejected during review, please see the thread dotnet/corefx#23489 (comment)

Yeah, I see, no technical argumentation whatsoever. Oh well, I'm wasting my time.

4creators · 2018-04-20T18:35:00Z

@fiigii We have discussed this issue before and settled on current solution - if you want to change API why dont you rise issue in corefx repo with API change proposal before pushing any changes to the APIs here.

Pretty much it is hard to understand why without any technical arguments the similar problems are coming back just in the form of PR changing APIs

CarolEidt · 2018-04-20T18:48:32Z

My apologies for the delay in responding to this. Here are my thoughts:

First, I think that this fix is both in keeping with existing design parameters, as well as providing a better developer experience. @fiigii is correct that, although the hardware intrinsics endeavor to provide close fidelity to the hardware instructions, it is not only difficult but undesirable to impose unnecessary constraints or clumsiness on the API.
Second, I take this discussion as feedback that we might consider merging SSE and SSE2 in the final design. I confess that I didn't weigh in on that early decision, because it wasn't really clear to me what the implications might be. This kind of feedback is exactly why we weren't eager to finalize this API for 2.1. @tannergooding - do you have any further thoughts on this? You and @mellinoe were in favor of the split, and perhaps you can articulate what considerations might argue for maintaining it.
Finally, regarding porting this fix to 2.1 - this is an experimental feature, and there is a clear workaround for this issue. Therefore, I'm not inclined to push this fix into 2.1 at this point.

Thanks, @fiigii @4creators and @mikedn for your thoughts on this.

fiigii · 2018-04-20T19:07:46Z

@CarolEidt Thanks for the reply. IMO, merging SSE and SSE2 will considerably improve the user experience, especially when they work with the generic code.
For example, if anyone wants to write a high-level SIMD APIs (e.g., implementing Vector<T> in managed code), they have to write the below code with the current non-generic intrinsic APIs

    public static Vector128<T> Vector128One<T>() where T : struct
    {
        if (typeof(T) == typeof(float))
        {
            return Sse.StaticCast<float, T>(Sse.SetAllVector128(1.0f));
        }
        else if (typeof(T) == typeof(double))
        {
            return Sse.StaticCast<double, T>(Sse2.SetAllVector128((double)1));
        }
        else if (typeof(T) == typeof(byte))
        {
            return Sse.StaticCast<byte, T>(Sse2.SetAllVector128((byte)1));
        }
        else if (typeof(T) == typeof(sbyte))
        {
            return Sse.StaticCast<sbyte, T>(Sse2.SetAllVector128((sbyte)1));
        }
        else if (typeof(T) == typeof(short))
        {
            return Sse.StaticCast<short, T>(Sse2.SetAllVector128((short)1));
        }
        else if (typeof(T) == typeof(ushort))
        {
            return Sse.StaticCast<ushort, T>(Sse2.SetAllVector128((ushort)1));
        }
        else if (typeof(T) == typeof(int))
        {
            return Sse.StaticCast<int, T>(Sse2.SetAllVector128((int)1));
        }
        else if (typeof(T) == typeof(uint))
        {
            return Sse.StaticCast<uint, T>(Sse2.SetAllVector128((uint)1));
        }
        else if (typeof(T) == typeof(long))
        {
            return Sse.StaticCast<long, T>(Sse2.SetAllVector128((long)1));
        }
        else if (typeof(T) == typeof(ulong))
        {
            return Sse.StaticCast<ulong, T>(Sse2.SetAllVector128((ulong)1));
        }
        else
        {
            throw new NotSupportedException();
        }
    }

if we merge SSE and SSE2 to get generic SetAllVector128, that would be simplified to

    public static Vector128<T> Vector128One<T>() where T : struct
    {
        return Sse2.SetAllVector128((T)Convert.ChangeType(1, typeof(T)));
    }

tannergooding

I don't think we should take this PR for 2.1, as the workaround is trivial and the feature is only being shipped as experimental.

Before taking this PR, I think we should have a broader design discussion (with the relevant API Reviewers, such as @terrajobst and @eerhardt).

The original reasons for splitting this, as I recall, were:

It makes the API consistent with the underlying ISA
It sets a precedent that we should mirror the underlying ISA regardless of what a given runtime supports
The API is part of CoreFX, not part of CoreCLR. CoreCLR only provides an implementation of the API.
- Another runtime could, theoretically, decide to only support SSE instructions
- Even for CoreCLR, the x86 JIT only started requiring SSE/SSE2 fairly recently

There were several discussions around how to make these APIs the "most usable"/"friendly". In the end, it was originally decided that some things (like software fallbacks) could be provided separately. I believe a shared wrapper API could be provided in the same way (or even as part of the same software fallback library).

I do think that merging the APIs may make it "easier to use", but so would merging "AVX" and "AVX2". However, we can't merge "AVX" and "AVX2" because there are still plenty of CPUs and operating systems that only support one of them. Additionally, users can trivially "merge" these APIs themselves with a using static Sse; using static Sse2;

CarolEidt · 2018-04-20T19:55:57Z

Thanks @tannergooding - I take your points, and it makes sense to defer this until the design questions are discussed & resolved.

fiigii · 2018-04-20T20:05:10Z

it makes sense to defer this until the design questions are discussed & resolved.

Don't you mean defer this PR? But this is a true bug that can crash the runtime, and this PR does not impact API design.

CarolEidt · 2018-04-20T20:22:25Z

this is a true bug that can crash the runtime, and this PR does not impact API design.

Yes, there's a bug here. But allowing the float overload isn't necessarily the appropriate fix, and it can only cause a crash if the user doesn't use the appropriate API.
The right fix for this depends on what is determine regarding the API.

4creators · 2018-04-20T21:00:37Z

this is a true bug that can crash the runtime, and this PR does not impact API design.

The only fix which seems reasonable without changing API is to throw PNSE for Sse2.SetZeroVector128 float overload.

fiigii · 2018-04-20T21:03:00Z

The only fix which seems reasonable without changing API is to throw PNSE for Sse2.SetZeroVector128 float overload.

No, we have decided that generic APIs have to support all numeric types. BTW, we can implement Sse2.SetZeroVector128<float> with SSE2 instruction.

4creators · 2018-04-20T21:10:52Z

No, we have decided that generic APIs have to support all numeric types

I think it was quite the opposite - we decided to create generic APIs in the case all numeric types are supported in given ISAs. That means that current definition of Sse2.SetVector128 is wrong bcs we should have it in the form where we throw TypeNotSupportedException for unsupported type - this was AFAIR agreed as correct way to handle that kind of errors (I was wrong above by proposing PNSE instead).

fiigii · 2018-04-20T21:20:05Z

That means that current definition of Sse2.SetVector128 is wrong bcs we should have it in the form where we throw TypeNotSupportedException for unsupported type

So, don't you mean changing Sse2.SetVector128 to a non-generic APIs that has no float overload? If yes, that depends on the further API reviews like above suggestions. BTW, non-generic API does not throw TypeNotSupportedException.

Sse2.SetVector128 is a helper intrinsic that does not have to bind to an exact instruction. Why do we give users the constraints?

4creators · 2018-04-20T21:21:03Z

Both SSE and SSE2 are basically part of x64, AFAIK there are no x64 CPUs that lack SSE1/2.

This is unfortunately common misconception among some developers who do not use SIMD in production. Intel up to this day ships new x64 CPUs which lack any type of SIMD support including missing SSE/SSE2 ISAs. These are for example some latest Celeron N, Pentium N and Atom C series processors like:

https://ark.intel.com/products/95592/Intel-Pentium-Processor-N4200-2M-Cache-up-to-2_5-GHz or
https://ark.intel.com/products/97935/Intel-Atom-Processor-C3308-4M-Cache-up-to-2_10-GHz

which do not support any type of SIMD instructions except for AES NI.

4creators · 2018-04-20T21:25:06Z

don't you mean changing Sse2.SetVector128 to a non-generic APIs that has no float overload

This cannot be done in C# due to the fact that return type does not participate in overload resolution and function is parameterless.

fiigii · 2018-04-20T21:27:33Z

This cannot be done in C# due to the fact that return type does not participate in overload resolution and function is parameterless.

Yes, so supporting float is much better than changing APIs and throwing exceptions.

fiigii · 2018-04-20T21:30:40Z

Intel up to this day ships new x64 CPUs which lack any type of SIMD support including missing SSE/SSE2 ISAs. These are for example some latest Celeron N, Pentium N and Atom C series processors like:

No, all the new ATOM processors support SSE - SSE4.1/4.2.

4creators · 2018-04-20T21:31:43Z

Why do we give users the constraints?

This is very good question which I think best expresses the problem and we should discuss it thoroughly. My preference would be to do it during general API review as @tannergooding suggested.

And I agree with @fiigii that we should fix the bug which is leading to runtime crush for 2.1 but my preference is to do it via throwing TypeNotSupportedException for float as this does not change the API.

4creators · 2018-04-20T21:33:39Z

No, all the new ATOM processors support SSE - SSE4.1/4.2.

@fiigii It's not in the ARK tables - do you think it's tables error?

fiigii · 2018-04-20T21:37:38Z

It's not in the ARK tables - do you think it's tables error?

Not sure why does not the web page show ISA features.

my preference is to do it via throwing TypeNotSupportedException for float as this does not change the API.

Why not just emit an instruction? pxor, xorps, or xorpd, whatever...

Fix SSE2 SetZeroVector128 on float type

1fd7f95

dotnet-bot added the 2 - In Progress label Apr 20, 2018

RussKeldorph requested a review from CarolEidt April 20, 2018 16:25

RussKeldorph added this to the 2.2.0 milestone Apr 20, 2018

RussKeldorph added area-CodeGen bug Product bug (most likely) and removed area-CodeGen bug Product bug (most likely) labels Apr 20, 2018

RussKeldorph removed this from the 2.2.0 milestone Apr 20, 2018

CarolEidt approved these changes Apr 20, 2018

View reviewed changes

CarolEidt requested a review from tannergooding April 20, 2018 18:51

tannergooding suggested changes Apr 20, 2018

View reviewed changes

fiigii mentioned this pull request Apr 25, 2018

Port more Vector tests to Vector128/256<T> #17737

Merged

fiigii closed this Aug 6, 2018

CarolEidt mentioned this pull request Jan 31, 2020

Intel HW Intrinsics classes should use inheritance dotnet/runtime#25953

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix SSE2 SetZeroVector128 on float type #17691

Fix SSE2 SetZeroVector128 on float type #17691

fiigii commented Apr 20, 2018

fiigii commented Apr 20, 2018

4creators commented Apr 20, 2018

mikedn commented Apr 20, 2018

4creators commented Apr 20, 2018 •

edited

Loading

mikedn commented Apr 20, 2018

4creators commented Apr 20, 2018 •

edited

Loading

mikedn commented Apr 20, 2018

fiigii commented Apr 20, 2018 •

edited

Loading

fiigii commented Apr 20, 2018

fiigii commented Apr 20, 2018 •

edited

Loading

mikedn commented Apr 20, 2018

fiigii commented Apr 20, 2018

mikedn commented Apr 20, 2018

4creators commented Apr 20, 2018 •

edited

Loading

CarolEidt commented Apr 20, 2018

fiigii commented Apr 20, 2018

tannergooding left a comment •

edited

Loading

CarolEidt commented Apr 20, 2018

fiigii commented Apr 20, 2018

CarolEidt commented Apr 20, 2018

4creators commented Apr 20, 2018

fiigii commented Apr 20, 2018 •

edited

Loading

4creators commented Apr 20, 2018

fiigii commented Apr 20, 2018

4creators commented Apr 20, 2018 •

edited

Loading

4creators commented Apr 20, 2018

fiigii commented Apr 20, 2018

fiigii commented Apr 20, 2018

4creators commented Apr 20, 2018

4creators commented Apr 20, 2018

fiigii commented Apr 20, 2018

Fix SSE2 SetZeroVector128 on float type #17691

Fix SSE2 SetZeroVector128 on float type #17691

Conversation

fiigii commented Apr 20, 2018

fiigii commented Apr 20, 2018

4creators commented Apr 20, 2018

mikedn commented Apr 20, 2018

4creators commented Apr 20, 2018 • edited Loading

mikedn commented Apr 20, 2018

4creators commented Apr 20, 2018 • edited Loading

mikedn commented Apr 20, 2018

fiigii commented Apr 20, 2018 • edited Loading

fiigii commented Apr 20, 2018

fiigii commented Apr 20, 2018 • edited Loading

mikedn commented Apr 20, 2018

fiigii commented Apr 20, 2018

mikedn commented Apr 20, 2018

4creators commented Apr 20, 2018 • edited Loading

CarolEidt commented Apr 20, 2018

fiigii commented Apr 20, 2018

tannergooding left a comment • edited Loading

Choose a reason for hiding this comment

CarolEidt commented Apr 20, 2018

fiigii commented Apr 20, 2018

CarolEidt commented Apr 20, 2018

4creators commented Apr 20, 2018

fiigii commented Apr 20, 2018 • edited Loading

4creators commented Apr 20, 2018

fiigii commented Apr 20, 2018

4creators commented Apr 20, 2018 • edited Loading

4creators commented Apr 20, 2018

fiigii commented Apr 20, 2018

fiigii commented Apr 20, 2018

4creators commented Apr 20, 2018

4creators commented Apr 20, 2018

fiigii commented Apr 20, 2018

4creators commented Apr 20, 2018 •

edited

Loading

4creators commented Apr 20, 2018 •

edited

Loading

fiigii commented Apr 20, 2018 •

edited

Loading

fiigii commented Apr 20, 2018 •

edited

Loading

4creators commented Apr 20, 2018 •

edited

Loading

tannergooding left a comment •

edited

Loading

fiigii commented Apr 20, 2018 •

edited

Loading

4creators commented Apr 20, 2018 •

edited

Loading