-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Fix SSE2 SetZeroVector128 on float type #17691
Conversation
cc @4creators |
Actually the proposed change is wrong as SSE2 ISA does not support
|
I wonder in what case the distinction between SSE and SSE2 is relevant these days... |
Distinction exists simply due to Intel decision to split implementation of SIMD instructions for 16 byte vectors into |
No, the distinction between .NET's implementation of SSE/SSE2 is solely due to the lack of common sense of the design. |
I belive that expressing your personal frustration the way you do in this and many other threads is in direct violation of .NET Foundation Code of Conduct If you have constructive arguments why don't you raise them in corefx repo by logging an API issue and proposing some changes you think may help in creating better Intel Hardware Intrinsic API surface. |
I'm simply stating the facts. |
No really, SSE2 |
@mikedn could you elaborate? Any suggestion for the API design? |
@RussKeldorph Please set this PR for 2.1. This is a really bad bug that calling |
@fiigii It seems to me that the distinction between SSE and SSE2 instruction sets is purely historical and reflecting it in the API is not very useful:
|
@mikedn I totally agree with you. Actually, the first version of hardware intrinsic APIs merged SSE and SSE2 in one class, which makes more beautiful and convenient generic APIs. However, that design was rejected during review, please see the thread dotnet/corefx#23489 (comment) |
Yeah, I see, no technical argumentation whatsoever. Oh well, I'm wasting my time. |
@fiigii We have discussed this issue before and settled on current solution - if you want to change API why dont you rise issue in corefx repo with API change proposal before pushing any changes to the APIs here. Pretty much it is hard to understand why without any technical arguments the similar problems are coming back just in the form of PR changing APIs |
My apologies for the delay in responding to this. Here are my thoughts:
Thanks, @fiigii @4creators and @mikedn for your thoughts on this. |
@CarolEidt Thanks for the reply. IMO, merging SSE and SSE2 will considerably improve the user experience, especially when they work with the generic code. public static Vector128<T> Vector128One<T>() where T : struct
{
if (typeof(T) == typeof(float))
{
return Sse.StaticCast<float, T>(Sse.SetAllVector128(1.0f));
}
else if (typeof(T) == typeof(double))
{
return Sse.StaticCast<double, T>(Sse2.SetAllVector128((double)1));
}
else if (typeof(T) == typeof(byte))
{
return Sse.StaticCast<byte, T>(Sse2.SetAllVector128((byte)1));
}
else if (typeof(T) == typeof(sbyte))
{
return Sse.StaticCast<sbyte, T>(Sse2.SetAllVector128((sbyte)1));
}
else if (typeof(T) == typeof(short))
{
return Sse.StaticCast<short, T>(Sse2.SetAllVector128((short)1));
}
else if (typeof(T) == typeof(ushort))
{
return Sse.StaticCast<ushort, T>(Sse2.SetAllVector128((ushort)1));
}
else if (typeof(T) == typeof(int))
{
return Sse.StaticCast<int, T>(Sse2.SetAllVector128((int)1));
}
else if (typeof(T) == typeof(uint))
{
return Sse.StaticCast<uint, T>(Sse2.SetAllVector128((uint)1));
}
else if (typeof(T) == typeof(long))
{
return Sse.StaticCast<long, T>(Sse2.SetAllVector128((long)1));
}
else if (typeof(T) == typeof(ulong))
{
return Sse.StaticCast<ulong, T>(Sse2.SetAllVector128((ulong)1));
}
else
{
throw new NotSupportedException();
}
} if we merge SSE and SSE2 to get generic public static Vector128<T> Vector128One<T>() where T : struct
{
return Sse2.SetAllVector128((T)Convert.ChangeType(1, typeof(T)));
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should take this PR for 2.1, as the workaround is trivial and the feature is only being shipped as experimental.
Before taking this PR, I think we should have a broader design discussion (with the relevant API Reviewers, such as @terrajobst and @eerhardt).
The original reasons for splitting this, as I recall, were:
- It makes the API consistent with the underlying ISA
- It sets a precedent that we should mirror the underlying ISA regardless of what a given runtime supports
- The API is part of CoreFX, not part of CoreCLR. CoreCLR only provides an implementation of the API.
- Another runtime could, theoretically, decide to only support SSE instructions
- Even for CoreCLR, the x86 JIT only started requiring SSE/SSE2 fairly recently
There were several discussions around how to make these APIs the "most usable"/"friendly". In the end, it was originally decided that some things (like software fallbacks) could be provided separately. I believe a shared wrapper API could be provided in the same way (or even as part of the same software fallback library).
I do think that merging the APIs may make it "easier to use", but so would merging "AVX" and "AVX2". However, we can't merge "AVX" and "AVX2" because there are still plenty of CPUs and operating systems that only support one of them. Additionally, users can trivially "merge" these APIs themselves with a using static Sse; using static Sse2;
Thanks @tannergooding - I take your points, and it makes sense to defer this until the design questions are discussed & resolved. |
Don't you mean defer this PR? But this is a true bug that can crash the runtime, and this PR does not impact API design. |
Yes, there's a bug here. But allowing the float overload isn't necessarily the appropriate fix, and it can only cause a crash if the user doesn't use the appropriate API. |
The only fix which seems reasonable without changing API is to throw PNSE for |
No, we have decided that generic APIs have to support all numeric types. BTW, we can implement |
I think it was quite the opposite - we decided to create generic APIs in the case all numeric types are supported in given ISAs. That means that current definition of |
So, don't you mean changing
|
This is unfortunately common misconception among some developers who do not use SIMD in production. Intel up to this day ships new x64 CPUs which lack any type of SIMD support including missing SSE/SSE2 ISAs. These are for example some latest Celeron N, Pentium N and Atom C series processors like: https://ark.intel.com/products/95592/Intel-Pentium-Processor-N4200-2M-Cache-up-to-2_5-GHz or which do not support any type of SIMD instructions except for AES NI. |
This cannot be done in C# due to the fact that return type does not participate in overload resolution and function is parameterless. |
Yes, so supporting |
No, all the new ATOM processors support SSE - SSE4.1/4.2. |
This is very good question which I think best expresses the problem and we should discuss it thoroughly. My preference would be to do it during general API review as @tannergooding suggested. And I agree with @fiigii that we should fix the bug which is leading to runtime crush for 2.1 but my preference is to do it via throwing |
@fiigii It's not in the ARK tables - do you think it's tables error? |
Not sure why does not the web page show ISA features.
Why not just emit an instruction? |
fix https://github.com/dotnet/coreclr/issues/17689
@CarolEidt PTAL