-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std: Add retain method for HashMap and HashSet #39560
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @bluss (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
src/libstd/collections/hash/map.rs
Outdated
/// map.retain(|&k, _| k % 2 == 0); | ||
/// assert_eq!(map.len(), 3); | ||
/// ``` | ||
#[stable(feature = "retain_hash_map", since = "1.17.0")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should enter the source tree as an unstable
feature. The feature name is good. I'll just give you an initial review libs team will decide on ultimately accepting this or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we learned from Vec::retain
's restrictions is that mutable access while retaining is useful. In this case it should then be that it passes &K, &mut V
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accept.
src/libstd/collections/hash/set.rs
Outdated
/// set.retain(|&k| k % 2 == 0); | ||
/// assert_eq!(set.len(), 3); | ||
/// ``` | ||
#[stable(feature = "retain_hash_set", since = "1.17.0")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same (unstable
) The two methods can use the same feature name, that makes it simpler; they will almost surely be stabilized later as a unit.
src/libstd/collections/hash/map.rs
Outdated
for (h, k, v) in old_table.into_iter() { | ||
if f(&k, &v) { | ||
self.insert_hashed_nocheck(h, k, v); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good. It could also use the ordered insertion loop that resize
is using, since we're iterating in hash order and preserving the order.
Either of those are not in place, and ideally this operation should be in place, I wonder if that can be done efficiently? I'd prefer to have this API even if it is not implemented in place. Since it's only ever removing keys, I don't see any traps (cases where it runs very slowly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating the map buckets in reverse order and deleting when needed should be efficient (lower number of backward shift) but I'm not sure if the current XXXBucket interface supports that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current bucket table interface doesn't support reverse iteration over buckets. Adding that support is as simple as writing a prev
method, though. For deletion, you should be able to use code similar to the existing backward shift.
Anyway, @F001's implementation is a good starting point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this function by "in place" algorithm. And used reverse iteration.
Though, the benchmark data has not collected. I don't know whether it is more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great addition.
Even if the non-inplace algorithm performs better in some cases, I think the inplace is the way to go in order to allow caller to intentionally avoid the memory allocator.
src/libstd/collections/hash/table.rs
Outdated
// during insertion. We must skip forward to a bucket that won't | ||
// get reinserted too early and won't unfairly steal others spot. | ||
// This eliminates the need for robin hood. | ||
pub fn buffer_head(table: M) -> Bucket<K, V, M> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about head_bucket
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
} | ||
Empty(b) => { | ||
// Encountered a hole between clusters. | ||
b.into_bucket() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we use it for retain we should early return on an empty bucket as well, avoiding walking the buckets more than needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. Empty bucket and occupied bucket are not in separate groups. I have to walk all the buckets without missing any one, just like the clone
method does.
Could you please explain more about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the way RobinHood hashing works as soon as you encounter an empty bucket the next non-empty bucket is guaranteed to have displacement = 0. Right now the code goes all the way to the first full bucket with displacement = 0, which is fine for both use cases but may mean extra work for retain. Changing it to return on empty bucket as well changes nothing for resize and possibly saves this bit of work on retain. Micro, but still...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind this, it's probably not worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still struggling to fully understand the algorithm...
Thank you all the same.
Is there any other issue there?
src/libstd/collections/hash/table.rs
Outdated
@@ -1061,6 +1129,45 @@ impl<K: Clone, V: Clone> Clone for RawTable<K, V> { | |||
} | |||
} | |||
|
|||
impl<K: Debug, V: Debug> Debug for RawTable<K, V> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep these out to avoid unnecessary code? The end user can't use these anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -416,22 +416,26 @@ fn search_hashed<K, V, M, F>(table: M, hash: SafeHash, mut is_match: F) -> Inter | |||
} | |||
} | |||
|
|||
fn pop_internal<K, V>(starting_bucket: FullBucketMut<K, V>) -> (K, V) { | |||
fn pop_internal<K, V>(starting_bucket: FullBucketMut<K, V>) | |||
-> (K, V, &mut RawTable<K, V>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to return Bucket<K, V, ...>
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just one small issue.
@@ -416,22 +416,26 @@ fn search_hashed<K, V, M, F>(table: M, hash: SafeHash, mut is_match: F) -> Inter | |||
} | |||
} | |||
|
|||
fn pop_internal<K, V>(starting_bucket: FullBucketMut<K, V>) -> (K, V) { | |||
fn pop_internal<K, V>(starting_bucket: FullBucketMut<K, V>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you revert this to return the &mut Table (reverting my previous suggestion)? Sorry about that.
Right now it isn't consistent regarding what bucket it returns. Fixing that would increase the diff even more for no good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Should I squash all the commits together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@bors: r+ |
📌 Commit d90a7b3 has been approved by |
⌛ Testing commit d90a7b3 with merge cccf756... |
@bors: retry |
std: Add retain method for HashMap and HashSet Fix rust-lang#36648 r? @bluss
std: Add retain method for HashMap and HashSet Fix rust-lang#36648 r? @bluss
std: Add retain method for HashMap and HashSet Fix rust-lang#36648 r? @bluss
⌛ Testing commit d90a7b3 with merge 9344cd3... |
I overlooked it before but we may want a have a &V instead of &mut V for consistency with Vec and other friends (#25477) even if it's strictly less useful. |
Ah no that's intentional, we'd like to change |
💔 Test failed - status-travis |
… On Tue, Feb 14, 2017 at 10:28 AM, bors ***@***.***> wrote:
💔 Test failed - status-travis
<https://travis-ci.org/rust-lang/rust/builds/201531576>
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#39560 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAD95Ol-4oQ-zhQdF3nWXn3pQbcCaP2jks5rcdYWgaJpZM4L3cmy>
.
|
std: Add retain method for HashMap and HashSet Fix rust-lang#36648 r? @bluss
☀️ Test successful - status-appveyor, status-travis |
Implement BTreeMap::retain and BTreeSet::retain Adds new methods `BTreeMap::retain` and `BTreeSet::retain`. These are implemented on top of `drain_filter` (rust-lang#70530). The API of these methods is identical to `HashMap::retain` and `HashSet::retain`, which were implemented in rust-lang#39560 and stabilized in rust-lang#36648. The docs and tests are also copied from HashMap/HashSet. The new methods are unstable, behind the `btree_retain` feature gate, with tracking issue rust-lang#79025. See also rust-lang/rfcs#1338.
Implement BTreeMap::retain and BTreeSet::retain Adds new methods `BTreeMap::retain` and `BTreeSet::retain`. These are implemented on top of `drain_filter` (rust-lang#70530). The API of these methods is identical to `HashMap::retain` and `HashSet::retain`, which were implemented in rust-lang#39560 and stabilized in rust-lang#36648. The docs and tests are also copied from HashMap/HashSet. The new methods are unstable, behind the `btree_retain` feature gate, with tracking issue rust-lang#79025. See also rust-lang/rfcs#1338.
Implement BTreeMap::retain and BTreeSet::retain Adds new methods `BTreeMap::retain` and `BTreeSet::retain`. These are implemented on top of `drain_filter` (rust-lang#70530). The API of these methods is identical to `HashMap::retain` and `HashSet::retain`, which were implemented in rust-lang#39560 and stabilized in rust-lang#36648. The docs and tests are also copied from HashMap/HashSet. The new methods are unstable, behind the `btree_retain` feature gate, with tracking issue rust-lang#79025. See also rust-lang/rfcs#1338.
Fix #36648
r? @bluss