Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HashSet TryGetValue #20073

Closed
TylerBrinkley opened this issue Feb 1, 2017 · 21 comments · Fixed by dotnet/corefx#16034
Closed

Add HashSet TryGetValue #20073

TylerBrinkley opened this issue Feb 1, 2017 · 21 comments · Fixed by dotnet/corefx#16034
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@TylerBrinkley
Copy link
Contributor

Sometimes I find myself needing to retrieve a value from a HashSet. You may wonder why that's needed when you already have the value when you might want to retrieve it. Well, that's not always true depending on the Comparer you're using as a value may evaluate to being contained within the set but is not strictly equal to the value stored in the set.

Proposed API

 namespace System.Collections.Generic {
     public partial class HashSet<T> {
+        public bool TryGetValue(T equalValue, out T actualValue);
     }
     public partial class SortedSet<T> {
+        public bool TryGetValue(T equalValue, out T actualValue);
     }
 }

Rationale and Usage

My use case is sometimes I want to create a case-insensitive string mapping to the actual string value. So for example instead of having to use a Dictionary<string, string> for the mapping as below

var dictionary = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
dictionary["A"] = "A";
string value = "a";
string actualValue;
if (dictionary.TryGetValue(value, out actualValue))
{
    Assert.AreEqual("A", actualValue);
}

I could use a HashSet<string> like this

var hashSet = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "A" };
string value = "a";
string actualValue;
if (hashSet.TryGetValue(value, out actualValue))
{
    Assert.AreEqual("A", actualValue);
}

With this implemented users would save on memory usage, would be able to use set operations, and be able to populate the collection more easily.

Other possible use cases (from discussion below)

  • object pool, such as simple string interning
  • save overhead of the key in a dictionary, when the value knows its own key

API Details

  • TryGetValue returns an indication if the HashSet contains the equalValue and the output parameter actualValue contains the HashSet's value that equals equalValue with the set Comparer if contained in the HashSet otherwise the default value of T.

Updates

  • Formalized request
  • Removed getter indexer
@jnm2
Copy link
Contributor

jnm2 commented Feb 1, 2017

I've wanted this.

@JonHanna
Copy link
Contributor

JonHanna commented Feb 1, 2017

I have rolled my own hash sets solely for this use. (A while back I started on something that hit just this use and not other set purposes, with different versions for concurrent and single-threaded use, etc. as it can be very useful for memory-saving pools of objects, but then I got distracted by something shiny or something).

@danmoseley
Copy link
Member

This would be useful for some string interning schemes, as @JonHanna mentions. I've copied and modified the HasSet code myself for this purpose in a previous product.

@jamesqo
Copy link
Contributor

jamesqo commented Feb 1, 2017

What's wrong with the Dictionary approach that was suggested in the top post? I think adding a TryGetValue method to HashSet would confuse 99% of users of the API who aren't using it for object pooling. Perhaps efforts should be dedicated to better pooling APIs instead.

@danmoseley
Copy link
Member

@jamesqo it's wasteful in space - an extra pointer in every entry.

How would it be confusing? (Not disagreeing just trying to understand)

@jamesqo
Copy link
Contributor

jamesqo commented Feb 1, 2017

it's wasteful in space - an extra pointer in every entry.

Maybe it would be better to have specialized string interning APIs then that mitigated this problem then?

How would it be confusing? (Not disagreeing just trying to understand)

People not using the API for object pooling would ask "what's the point of this method if we already know the key?"

@danmoseley
Copy link
Member

Any successful string interning API needs to have customizable policy (I guess that was why String.Intern failed). We could devise some such API but it's quite feasible to build something reasonable on HashSet or some other existing collection. I suspect if you're doing interning you're a relatively sophisticated developer and would be comfortable doing that.

/cc @jkotas

@jkotas
Copy link
Member

jkotas commented Feb 2, 2017

Any successful string interning API needs to have customizable policy

Agree. Check the Roslyn string interning facility as a case study: https://github.com/dotnet/roslyn/blob/acc053dbd9d3cb25f086f86f36f80896c8379466/src/Compilers/Core/Portable/MetadataReader/PEModule.cs#L2986

The interesting points:

  • It has two layers to align with the lifetimes of the compiler object graphs
  • It has special entrypoints to intern based in UTF8 input
  • It "forgets" strings - it does not keep everything alive indefinitely

(edit: @danmosemsft - also see https://github.com/dotnet/roslyn/blob/872202543f4be3033d39d0737426b9663a444c05/src/Compilers/Core/Portable/InternalUtilities/StringTable.cs )

@danmoseley
Copy link
Member

Here's another one that I worked on
https://github.com/Microsoft/msbuild/blob/xplat/src/Shared/OpportunisticIntern.cs

It has some hard coded strings it does a fast check for, then uses several linked lists stored in priority order, each limited in the number of entries, and dedicated to a different size range. It also has a bunch of diagnostic logging to help tune it. Seems over complex now, but we got there by iterating with real work loads.

Upshot is I think it would be difficult to devise an interning API that would be sufficiently configurable for folks who want to go beyond basically a layer over a hashset.

@danmoseley
Copy link
Member

Another potential use case for the API proposed is a hashtable where each entry knows its own key (some object that has a name property for example). If you provide a comparer that compares only against the name, you can retrieve the object without the overhead of storing a separate key.

Oddly enough in that previous life I had to do that also
https://github.com/Microsoft/msbuild/blob/xplat/src/XMakeBuildEngine/Collections/RetrievableEntryHashSet/HashSet.cs

@JonHanna
Copy link
Contributor

JonHanna commented Feb 2, 2017

An interning API can grow in some very non-set-related ways; concurrency, weak references, matching stored strings with character array ranges, hash-consing factories, layered stores, generalising the store of string.Intern to use as a layer. Which is all possibly useful, and all quite possibly of no value to the original request, depending on just why they wanted hash sets to offer this.

Just adding TryGetValue could perhaps raise some philosophical objections on the grounds of what it means to be in a set (the difference between a.Union(b) and b.Union(a) etc. can become more noticeable), but fits in well with some current practical uses, is easy to add (the last time I needed something like this I copied the source of HashSet from here and added it, and it was straightforward to do). It's probably therefore worth doing. There's a lot less design work needed than considering the wider needs of an ideal interning API.

Worth doing, I think.

@TylerBrinkley
Copy link
Contributor Author

I've formalized this request.

@ianhays, @safern any thoughts?

@safern
Copy link
Member

safern commented Feb 7, 2017

I think adding TryGetValue would be useful and as you guys already mentioned there are some practical uses where this can be useful.

About the index getter, why should this be valuable if adding TryGetValue would already give users the tool to get the value in a safe way without throwing an exception if the key doesn't exist and knowing if the key did exist while trying to get the value without a need of a try - catch block. @TylerBrinkley

@TylerBrinkley
Copy link
Contributor Author

@safern I think you're probably right. The index getter is probably not that useful. I guess I was trying to mirror Dictionary and added it as an afterthought. I've removed it from the proposal.

@danmoseley
Copy link
Member

I updated the rationale to capture the use cases discussed above.

@safern
Copy link
Member

safern commented Feb 7, 2017

Thanks for updating that @danmosemsft.

API proposal looks good for me, so I will go ahead and add the label api-ready-for-review so that the members in charge of approving APIs can review it. Thanks @TylerBrinkley for your proposal, the review process should take one or a couple of weeks.

@terrajobst
Copy link
Contributor

terrajobst commented Feb 7, 2017

Looks good as proposed, but we should also add the same API on SortedSet<T> for consistency:

 namespace System.Collections.Generic {
     public partial class HashSet<T> {
+        public bool TryGetValue(T equalValue, out T actualValue);
     }
     public partial class SortedSet<T> {
+        public bool TryGetValue(T equalValue, out T actualValue);
     }
 }

By the way, we already exposed these APIs on the immutable collections for the same reason. We should be consistent with the parameter names used there (equalValue, and actualValue).

@karelz
Copy link
Member

karelz commented Feb 7, 2017

I have updated the top-post with the new API shape (@terrajobst's note above):

Original Proposal

 namespace System.Collections.Generic {
     public class HashSet<T> : ICollection<T>, IDeserializationCallback, IEnumerable, IEnumerable<T>, IReadOnlyCollection<T>, ISerializable, ISet<T> {
+        public bool TryGetValue(T key, out T actualValue);
     }
 }

New Proposal

 namespace System.Collections.Generic {
     public partial class HashSet<T> {
+        public bool TryGetValue(T equalValue, out T actualValue);
     }
     public partial class SortedSet<T> {
+        public bool TryGetValue(T equalValue, out T actualValue);
     }
 }

@TylerBrinkley
Copy link
Contributor Author

I'll take my stab at implementing this if that's alright.

@karelz
Copy link
Member

karelz commented Feb 7, 2017

I was just about to write it is straightforward and available :).
Assigned yo you @TylerBrinkley.

@safern
Copy link
Member

safern commented Feb 7, 2017

That is great @TylerBrinkley thanks for contributing!!

Thanks @karelz @terrajobst

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.