-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider adding more IndexOf variations #75175
Comments
Tagging subscribers to this area: @dotnet/area-system-memory Issue DetailsSearching is a key aspect of many workloads. Historically, we provided just the variations that everyone needed, e.g. IndexOf. Then we expanded to IndexOfAny with a few arguments. We just recently expanded to IndexOfAnyExcept. But beyond these, our general stance has been "if you need something any more customized, it's easy to just write it yourself". And historically, it has been easy. Let's say I wanted a static int IndexOfRange(ReadOnlySpan<char> span, char lowInclusive, char highInclusive)
{
uint range = (uint)(highInclusive - lowInclusive);
for (int i = 0; i < span.Length; i++)
if ((uint)(span[i] - lowInclusive) <= range)
return i;
return -1;
} Not a big deal. But now, the expectation is such operations will take advantage of vectorization, and that's no longer "just a few lines". That ends up being something more like this (untested): static int IndexOfRange(ReadOnlySpan<char> span, char lowInclusive, char highInclusive)
{
if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<ushort>.Count)
{
if (Vector256.IsHardwareAccelerated && span.Length >= Vector256<ushort>.Count)
{
ref ushort searchSpace = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(span));
ref ushort current = ref searchSpace;
Vector256<ushort> lowVector = Vector256.Create((ushort)lowInclusive);
Vector256<ushort> rangeVector = Vector256.Create((ushort)highInclusive) - lowVector;
Vector256<ushort> lessThanOrEqual;
ref ushort endMinusOneVector = ref Unsafe.Add(ref current, span.Length - Vector256<ushort>.Count);
do
{
lessThanOrEqual = Vector256.LessThanOrEqual(Vector256.LoadUnsafe(ref current) - lowVector, rangeVector);
if (lessThanOrEqual != Vector256<ushort>.Zero)
{
return
BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
(int)(Unsafe.ByteOffset(ref searchSpace, ref current) / Unsafe.SizeOf<ushort>());
}
current = ref Unsafe.Add(ref current, Vector256<ushort>.Count);
}
while (Unsafe.IsAddressLessThan(ref current, ref endMinusOneVector));
lessThanOrEqual = Vector256.LessThanOrEqual(Vector256.LoadUnsafe(ref endMinusOneVector) - lowVector, rangeVector);
if (lessThanOrEqual != Vector256<ushort>.Zero)
{
return
BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
(int)(Unsafe.ByteOffset(ref searchSpace, ref endMinusOneVector) / Unsafe.SizeOf<ushort>());
}
}
else
{
ref ushort searchSpace = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(span));
ref ushort current = ref searchSpace;
Vector128<ushort> lowVector = Vector128.Create((ushort)lowInclusive);
Vector128<ushort> rangeVector = Vector128.Create((ushort)highInclusive) - lowVector;
Vector128<ushort> lessThanOrEqual;
ref ushort endMinusOneVector = ref Unsafe.Add(ref current, span.Length - Vector128<ushort>.Count);
do
{
lessThanOrEqual = Vector128.LessThanOrEqual(Vector128.LoadUnsafe(ref current) - lowVector, rangeVector);
if (lessThanOrEqual != Vector128<ushort>.Zero)
{
return
BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
(int)(Unsafe.ByteOffset(ref searchSpace, ref current) / Unsafe.SizeOf<ushort>());
}
current = ref Unsafe.Add(ref current, Vector128<ushort>.Count);
}
while (Unsafe.IsAddressLessThan(ref current, ref endMinusOneVector));
lessThanOrEqual = Vector128.LessThanOrEqual(Vector128.LoadUnsafe(ref endMinusOneVector) - lowVector, rangeVector);
if (lessThanOrEqual != Vector128<ushort>.Zero)
{
return
BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
(int)(Unsafe.ByteOffset(ref searchSpace, ref endMinusOneVector) / Unsafe.SizeOf<ushort>());
}
}
}
else
{
uint range = (uint)(highInclusive - lowInclusive);
for (int i = 0; i < span.Length; i++)
{
if ((uint)(span[i] - lowInclusive) <= range)
{
return i;
}
}
}
return -1;
} which is closer to ~80 lines of non-trivial code, not something someone just whips up. In addition to finding ways to reduce the development cost of this (e.g. could we make it easier to share 128/256 code paths via having those types implement an interface and employing generic specialization?), it seems like we should consider lowering our bar a bit for what we consider worthy of inclusion, such that we can provide more such operations with good vectorized implementations. This could include:
We should:
This concept also extends beyond IndexOf, of course. There are multiple places where we've already vectorized implementations that could be generalized to be used by others, e.g.
|
An alternative to providing number of specialized APIs may be to create a source generator or something like that that produces the vectorized version from the simple form. |
It's an option; Tanner had a prototype of something similar. But if it's possible to do that with perfect fidelity, it should be possible to do the same in the compiler/JIT with an auto-vectorizing implementation and save the dev the trouble. And if it's not possible with perfect fidelity and the generated code would be more of a starting point the dev would need to copy and mutate, it's not really addressing the core problem this aims to. I think pursuing both approaches is reasonable, with built-in implementations for the things we ourselves use and others could as well plus the next layer of variation beyond what we already have, and then a source generator to help with the next layer still. As an example, I'd want to use some of these from the regex source generator and compiler, and it'd be nice to avoid injecting a lot of such unsafe code into the assembly that would also likely be duplicated across many assemblies. |
Just curious, would these just delegate to #68328, or did you have a different approach in mind? |
Also #76106 can be used (e.g. for IndexOfAsciiDigit). |
If there were to be an But, warranting new APIs for these in particular would necessitate both a) showing significant usage potential and b) significantly better performance potential than just using one of the other "existing" APIs. |
The scope of work covered by this issue is done. We can always add more IndexOf helpers on an as-needed basis. |
Searching is a key aspect of many workloads. Historically, we provided just the variations that everyone needed, e.g. IndexOf. Then we expanded to IndexOfAny with a few arguments. We just recently expanded to IndexOfAnyExcept. But beyond these, our general stance has been "if you need something any more customized, it's easy to just write it yourself". And historically, it has been easy. Let's say I wanted a
static int IndexOfRange(ReadOnlySpan<char> span, char lowInclusive, highInclusive)
; it's straightforward to implement that "in just a few lines":Not a big deal. But now, the expectation is such operations will take advantage of vectorization, and that's no longer "just a few lines". That ends up being something more like this (untested):
which is closer to ~80 lines of non-trivial code, not something someone just whips up.
In addition to finding ways to reduce the development cost of this (e.g. could we make it easier to share 128/256 code paths via having those types implement an interface and employing generic specialization?), it seems like we should consider lowering our bar a bit for what we consider worthy of inclusion, such that we can provide more such operations with good vectorized implementations.
This could include:
We should:
Also, would it be possible to expose a generalized helper for this, where someone just needs to supply the kernel of the thing being searched for and plug that into the overall algorithm? e.g. an interface with a method accepting the vector to analyze and a scaffolded IndexOf search that uses generic specialization to make it fast to plug in the caller's code.
This concept also extends beyond IndexOf, of course. There are multiple places where we've already vectorized implementations that could be generalized to be used by others, e.g.
The text was updated successfully, but these errors were encountered: