Optimize `MemoryMarshal.Cast` for same type #100750
Conversation
Tagging subscribers to this area: @dotnet/interop-contrib

Didn't we already discuss that this fast path makes no sense, in #99835 (comment)?
Just echoing what @EgorBo said: due to the stall mentioned there, it's not clearly a win even if the codegen "looks" better. For example:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

BenchmarkSwitcher.FromAssembly(typeof(Benchmark).Assembly).Run(args);

public class Benchmark
{
    private int[] _values = Enumerable.Range(0, 10000).Select(_ => Random.Shared.Next(2)).ToArray();

    [Benchmark]
    public void BenchBase()
    {
        Span<int> s;
        for (int i = 0; i < 10000; i++)
        {
            TestBase(_values.AsSpan(0, i), out s);
        }
    }

    [Benchmark]
    public void BenchDiff()
    {
        Span<int> s;
        for (int i = 0; i < 10000; i++)
        {
            TestDiff(_values.AsSpan(0, i), out s);
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void TestBase<T>(Span<T> s, out Span<int> s2) where T : struct
    {
        s2 = CastBase<T, int>(s);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void TestDiff<T>(Span<T> s, out Span<int> s2) where T : struct
    {
        s2 = CastDiff<T, int>(s);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe Span<TTo> CastDiff<TFrom, TTo>(Span<TFrom> span)
        where TFrom : struct
        where TTo : struct
    {
        if (typeof(TFrom) == typeof(TTo))
            return *(Span<TTo>*)&span;

        // Use unsigned integers - unsigned division by constant (especially by power of 2)
        // and checked casts are faster and smaller.
        uint fromSize = (uint)Unsafe.SizeOf<TFrom>();
        uint toSize = (uint)Unsafe.SizeOf<TTo>();
        uint fromLength = (uint)span.Length;
        int toLength;
        if (fromSize == toSize)
        {
            // Special case for same size types - `(ulong)fromLength * (ulong)fromSize / (ulong)toSize`
            // should be optimized to just `length` but the JIT doesn't do that today.
            toLength = (int)fromLength;
        }
        else if (fromSize == 1)
        {
            // Special case for byte sized TFrom - `(ulong)fromLength * (ulong)fromSize / (ulong)toSize`
            // becomes `(ulong)fromLength / (ulong)toSize` but the JIT can't narrow it down to `int`
            // and can't eliminate the checked cast. This also avoids a 32 bit specific issue,
            // the JIT can't eliminate long multiply by 1.
            toLength = (int)(fromLength / toSize);
        }
        else
        {
            // Ensure that casts are done in such a way that the JIT is able to "see"
            // the uint->ulong casts and the multiply together so that on 32 bit targets
            // 32x32to64 multiplication is used.
            ulong toLengthUInt64 = (ulong)fromLength * (ulong)fromSize / (ulong)toSize;
            toLength = checked((int)toLengthUInt64);
        }
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TFrom, TTo>(ref MemoryMarshal.GetReference(span)), toLength);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe Span<TTo> CastBase<TFrom, TTo>(Span<TFrom> span)
        where TFrom : struct
        where TTo : struct
    {
        // Use unsigned integers - unsigned division by constant (especially by power of 2)
        // and checked casts are faster and smaller.
        uint fromSize = (uint)Unsafe.SizeOf<TFrom>();
        uint toSize = (uint)Unsafe.SizeOf<TTo>();
        uint fromLength = (uint)span.Length;
        int toLength;
        if (fromSize == toSize)
        {
            // Special case for same size types - `(ulong)fromLength * (ulong)fromSize / (ulong)toSize`
            // should be optimized to just `length` but the JIT doesn't do that today.
            toLength = (int)fromLength;
        }
        else if (fromSize == 1)
        {
            // Special case for byte sized TFrom - `(ulong)fromLength * (ulong)fromSize / (ulong)toSize`
            // becomes `(ulong)fromLength / (ulong)toSize` but the JIT can't narrow it down to `int`
            // and can't eliminate the checked cast. This also avoids a 32 bit specific issue,
            // the JIT can't eliminate long multiply by 1.
            toLength = (int)(fromLength / toSize);
        }
        else
        {
            // Ensure that casts are done in such a way that the JIT is able to "see"
            // the uint->ulong casts and the multiply together so that on 32 bit targets
            // 32x32to64 multiplication is used.
            ulong toLengthUInt64 = (ulong)fromLength * (ulong)fromSize / (ulong)toSize;
            toLength = checked((int)toLengthUInt64);
        }
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TFrom, TTo>(ref MemoryMarshal.GetReference(span)), toLength);
    }
}
```
@jakobbotsch Thanks for the BenchmarkDotNet example. Is this issue specific to XArch?
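For readers following along, here is a minimal standalone sketch of the `MemoryMarshal.Cast` behavior under discussion. This demo is illustrative only (it is not part of the PR): the same-type case is where the proposed fast path would simply return the input span, while today both cases go through the size/length computation shown above.

```csharp
using System;
using System.Runtime.InteropServices;

class CastDemo
{
    static void Main()
    {
        int[] values = { 1, 2, 3, 4 };

        // Same-type cast: TFrom == TTo, so the result covers the same
        // elements; the proposed fast path would short-circuit here.
        Span<int> same = MemoryMarshal.Cast<int, int>(values.AsSpan());
        Console.WriteLine(same.Length); // 4

        // Different sizes: the length is rescaled by sizeof(TFrom)/sizeof(TTo).
        Span<byte> bytes = MemoryMarshal.Cast<int, byte>(values.AsSpan());
        Console.WriteLine(bytes.Length); // 16
    }
}
```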
```diff
@@ -122,6 +122,9 @@ public static unsafe ReadOnlySpan<byte> AsBytes<T>(ReadOnlySpan<T> span)
         if (RuntimeHelpers.IsReferenceOrContainsReferences<TTo>())
             ThrowHelper.ThrowInvalidTypeWithPointersNotSupported(typeof(TTo));

+        if (typeof(TFrom) == typeof(TTo))
```
This change would be a regression for shared generic code: `TFrom` and `TTo` may require generic dictionary lookups. Identical types should already be covered by the `if (fromSize == toSize)` path below. Is there anything preventing the JIT from generating the optimal code for that path?
Follow-up to #99835.
Diffs
The diffs look mostly positive, although there are some large code size regressions due to the enablement of more inlining.
cc: @stephentoub