-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MemoryExtensions.SequenceCompare #20525
Comments
This may be a helpful addition, but I don't think that using |
Might be a good idea for Span? |
True, I've chosen Array in order to make it analog to Buffer.BlockCopy(Array,int,Array,int,int) but T[] would probably be even more elegant, void* & length also definitely. The issue with T[] is that the efficient comparison effectively only works for primitives and structs that don't redefine Equals. If you have a reference type or a value type that redefines Equals, it'd have to call a virtual method for each element, rendering the efficiency of memcmp useless (a for loop would have been fine here for clarity). Alternatively one could just redefine the contract to do "binary comparison" only, where for reference types it only compares its references, for struct types it only compares the entire structure regardless whether Equals is redefined. However I don't think it'd be intuitive or useful to users, so limiting it to primitives might still be useful.
Yes, Span would greatly benefit from it aswell, had it in my mind aswell. Just added it for a start. |
Only skimmed the proposal, but I think |
All raw memory APIs should go to Span going forward. There is |
Fair enough, however I firmly believe that it is important to have comparison functions even outside of Span's, as there are many variants of memcpy but not a single one of memcmp. Nearly all modern programming languages offer a way to compare primitive arrays in a fast, efficient way without too much overhead. While Span is amazing and it should be there aswell, I do believe there should be a way to access that function without the overhead of a Span. While Span is a great API for working with memory efficiently, I don't think something as specific as Span should be forced onto the user for something general like comparing memory. |
The overhead of Span is neglible, and it is often less than the overhead of alternatives (passing indices manually, etc.). You can watch a case study on this topic in dotnet/corefxlab#1278 . |
Alright, I didn't know that. Thanks for the quick reply! I've adjusted the API. |
@AronParker, does this thinking still hold? Also, can you explain what SequenceCompare should return and in what cases? Essentially, can you provide code sample showing usage of this API? |
@ahsonkhan No. I was actually mainly looking for a way to compare equality, not relation so SequenceEqual is actually sufficient. |
It is pity that this didn't fit Span/ReadOnlySpan. memcmp is critical function for embedded databases. |
@Bobris Span APIs are still work in progress. It was mentioned above that If you believe that SequenceCompare is an important API to have in additional to SequenceEqual, could you please provide some real examples what you would use it for, and how you would like to see it behave e.g. for spans of different lengths? |
@jkotas Thanks for looking at this. I need lexicographical order in my KeyValue DB. My current pour man implementation is https://github.com/Bobris/BTDB/blob/58c1bf9bbf555e98d836c6de094f431b5dfdd8f9/BTDB/Buffer/BitArrayManipulation.cs#L171. This operation is basic for any B-Tree implementation when key is just sequence of bytes. |
I have seen several people asking for this API. I think it would be good to discuss this at the API review. @karelz, I will change the tag to ready for API review. It seems like the API proposal is now for Span APIs, which is what you asked for when you changed the tag to "needs more work". |
We should change the name to class static SpanExtensions
{
static int SequenceCompareTo(this ReadOnlySpan<byte> first, ReadOnlySpan<byte> second);
} |
FYI: The API review discussion was recorded - see https://youtu.be/BI3iXFT8H7E?t=4684 (5 min duration) |
Top post updated per API name change in API review: https://github.com/dotnet/corefx/issues/16878#issuecomment-349414287 |
I looked at implementation mentioned in video in https://github.com/dotnet/corefxlab/blob/master/src/System.Buffers.Primitives/System/Buffers/BufferExtensions.cs and it is correct, but "slow", so hopefully it is planned to have it with optimizations from beginning, please pretty please. |
@Bobris what else would you propose? Do you want to submit a PR? |
Not whole PR, but even "hand inlining" CompareTo to simple subtract (lengths are only positive so it is safe from wrapping) is improvement (inputs are identical): BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.726)
|
Should this method be a generic method just like SequenceEqual? public static int SequenceCompareTo<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second) where T : IEquatable<T> { throw null; } |
It would need to be |
Latest Proposal
See https://github.com/dotnet/corefx/issues/16878#issuecomment-349414287
Problem:
As of right now there is no function to efficiently compare memory. The result is that such comparisons are usually done on a byte by byte basis or even worse Enumerable.SequenceEqual, which performs really slow.
Suggested APIs:
Design
It would fit along nicely with other memory functions defined in Buffer such as memcpy or memmove. The arrays would have to be primitives and can be efficiently compared.
Implementation
Different implementations can be used for different lengths. Here are some examples:
The individual thresholds would have to be profiled of course.
Original Proposal
[EDIT] Add C# syntax highlight for code by @karelz
The text was updated successfully, but these errors were encountered: