-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove usage of BMI2 due to regression on AMD #18949
Comments
Looking at dotnet/runtime#31904, I see quite some similarities between the codes. So maybe it makes sense to build |
I have a branch ready. |
My last comment for today 😉 For a simple / trivial benchmark (on Intel KabyLake) the SSE2 alternative shows a perf-win over BMI2 -- as already figured out by @GrabYourPitchforks So it seems there is no need to keep BMI2 around (as I proposed in the first comment). Benchmark codeusing System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using System.Text;
using BenchmarkDotNet.Attributes;
#if !DEBUG
using BenchmarkDotNet.Running;
#endif
namespace ConsoleApp3.Bench
{
public unsafe class Bench
{
private readonly long _input;
private readonly char[] _output = new char[8];
public Bench()
{
byte[] input = Encoding.UTF8.GetBytes("Hello world");
_input = Unsafe.ReadUnaligned<long>(ref input[0]);
}
public string Result => new string(_output);
static void Main()
{
#if DEBUG
var bench = new Bench();
bench.BMI2();
Console.WriteLine($">{bench.Result}<");
bench._output.AsSpan().Clear();
bench.SSE2();
Console.WriteLine($">{bench.Result}<");
#else
BenchmarkRunner.Run<Bench>();
#endif
}
#if Path_32bit
[Benchmark(Baseline = true)]
public void BMI2()
{
int value = (int)_input;
fixed (char* output = _output)
{
((uint*)output)[0] = Bmi2.ParallelBitDeposit((uint)value, 0x00FF00FFu);
((uint*)output)[1] = Bmi2.ParallelBitDeposit((uint)(value >> 16), 0x00FF00FFu);
}
}
[Benchmark]
public void SSE2()
{
int value = (int)_input;
fixed (char* output = _output)
{
Vector128<sbyte> vecNarrow = Sse2.ConvertScalarToVector128Int32(value).AsSByte();
Vector128<ulong> vecWide = Sse2.UnpackLow(vecNarrow, Vector128<sbyte>.Zero).AsUInt64();
Unsafe.WriteUnaligned(output, Sse2.X64.ConvertToUInt64(vecWide));
}
}
#else
[Benchmark(Baseline = true)]
public void BMI2()
{
long value = _input;
fixed (char* output = _output)
{
((ulong*)output)[0] = Bmi2.X64.ParallelBitDeposit((ulong)value, 0x00FF00FF_00FF00FFul);
((ulong*)output)[1] = Bmi2.X64.ParallelBitDeposit((ulong)(value >> 32), 0x00FF00FF_00FF00FFul);
}
}
[Benchmark]
public void SSE2()
{
long value = _input;
fixed (char* output = _output)
{
Vector128<sbyte> zero = Vector128<sbyte>.Zero;
Vector128<sbyte> vecNarrow = Sse2.X64.ConvertScalarToVector128Int64(value).AsSByte();
Vector128<ulong> vecWide = Sse2.UnpackLow(vecNarrow, zero).AsUInt64();
Sse2.Store((ulong*)output, vecWide);
}
}
#endif
}
} Results for widening 4 ASCII-bytes
Results for widening 8 ASCII-bytes
|
@gfoidl Sounds good! I take it from your comments you are able to submit a PR? That would be great! |
Yes, I'm able 😉 |
No worries, just getting a feel for if I need folks to write some code or review some code ;P. |
#19009 is merged, so this can be closed. |
For context see dotnet/runtime#2251 and #17556 (comment)
Usages of BMI2 in ASP.NET Core:
aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs
Line 15 in f7dc095
Maybe put the usage of BMI2 under a toggle
#if USE_BMI2
(or with a better name), so it can be switched on/off by configuration?Of course we should do some benchmarks to validate if
The text was updated successfully, but these errors were encountered: