-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add HashSet TryGetValue #20073
Comments
I've wanted this. |
I have rolled my own hash sets solely for this use. (A while back I started on something that hit just this use and not other set purposes, with different versions for concurrent and single-threaded use, etc. as it can be very useful for memory-saving pools of objects, but then I got distracted by something shiny or something). |
This would be useful for some string interning schemes, as @JonHanna mentions. I've copied and modified the HasSet code myself for this purpose in a previous product. |
What's wrong with the Dictionary approach that was suggested in the top post? I think adding a |
@jamesqo it's wasteful in space - an extra pointer in every entry. How would it be confusing? (Not disagreeing just trying to understand) |
Maybe it would be better to have specialized string interning APIs then that mitigated this problem then?
People not using the API for object pooling would ask "what's the point of this method if we already know the key?" |
Any successful string interning API needs to have customizable policy (I guess that was why /cc @jkotas |
Agree. Check the Roslyn string interning facility as a case study: https://github.com/dotnet/roslyn/blob/acc053dbd9d3cb25f086f86f36f80896c8379466/src/Compilers/Core/Portable/MetadataReader/PEModule.cs#L2986 The interesting points:
(edit: @danmosemsft - also see https://github.com/dotnet/roslyn/blob/872202543f4be3033d39d0737426b9663a444c05/src/Compilers/Core/Portable/InternalUtilities/StringTable.cs ) |
Here's another one that I worked on It has some hard coded strings it does a fast check for, then uses several linked lists stored in priority order, each limited in the number of entries, and dedicated to a different size range. It also has a bunch of diagnostic logging to help tune it. Seems over complex now, but we got there by iterating with real work loads. Upshot is I think it would be difficult to devise an interning API that would be sufficiently configurable for folks who want to go beyond basically a layer over a hashset. |
Another potential use case for the API proposed is a hashtable where each entry knows its own key (some object that has a name property for example). If you provide a comparer that compares only against the name, you can retrieve the object without the overhead of storing a separate key. Oddly enough in that previous life I had to do that also |
An interning API can grow in some very non-set-related ways; concurrency, weak references, matching stored strings with character array ranges, hash-consing factories, layered stores, generalising the store of Just adding Worth doing, I think. |
I think adding About the index getter, why should this be valuable if adding |
@safern I think you're probably right. The index getter is probably not that useful. I guess I was trying to mirror |
I updated the rationale to capture the use cases discussed above. |
Thanks for updating that @danmosemsft. API proposal looks good for me, so I will go ahead and add the label api-ready-for-review so that the members in charge of approving APIs can review it. Thanks @TylerBrinkley for your proposal, the review process should take one or a couple of weeks. |
Looks good as proposed, but we should also add the same API on namespace System.Collections.Generic {
public partial class HashSet<T> {
+ public bool TryGetValue(T equalValue, out T actualValue);
}
public partial class SortedSet<T> {
+ public bool TryGetValue(T equalValue, out T actualValue);
}
} By the way, we already exposed these APIs on the immutable collections for the same reason. We should be consistent with the parameter names used there ( |
I have updated the top-post with the new API shape (@terrajobst's note above): Original Proposal namespace System.Collections.Generic {
public class HashSet<T> : ICollection<T>, IDeserializationCallback, IEnumerable, IEnumerable<T>, IReadOnlyCollection<T>, ISerializable, ISet<T> {
+ public bool TryGetValue(T key, out T actualValue);
}
} New Proposal namespace System.Collections.Generic {
public partial class HashSet<T> {
+ public bool TryGetValue(T equalValue, out T actualValue);
}
public partial class SortedSet<T> {
+ public bool TryGetValue(T equalValue, out T actualValue);
}
} |
I'll take my stab at implementing this if that's alright. |
I was just about to write it is straightforward and available :). |
That is great @TylerBrinkley thanks for contributing!! Thanks @karelz @terrajobst |
Sometimes I find myself needing to retrieve a value from a
HashSet
. You may wonder why that's needed when you already have the value when you might want to retrieve it. Well, that's not always true depending on theComparer
you're using as a value may evaluate to being contained within the set but is not strictly equal to the value stored in the set.Proposed API
Rationale and Usage
My use case is sometimes I want to create a case-insensitive
string
mapping to the actualstring
value. So for example instead of having to use aDictionary<string, string>
for the mapping as belowI could use a
HashSet<string>
like thisWith this implemented users would save on memory usage, would be able to use set operations, and be able to populate the collection more easily.
Other possible use cases (from discussion below)
API Details
TryGetValue
returns an indication if theHashSet
contains theequalValue
and the output parameteractualValue
contains theHashSet
's value that equalsequalValue
with the setComparer
if contained in theHashSet
otherwise the default value ofT
.Updates
The text was updated successfully, but these errors were encountered: