-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dont segfault if btree range is not in order #39457
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @BurntSushi (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
src/libcollections/btree/map.rs
Outdated
|
||
min_edge = match (min_found, min) { | ||
(false, Included(key)) => { | ||
match search::search_linear(&min_node, key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change the search_tree
calls to search_linear
? (FYI, I'm not familiar with the btree impl.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
search_tree searches the entire tree and returns a handle anywhere in the tree. I need to search a single node at a time and be able to compare 2 searches. You might think that I want search_node, but that returns a handle. I need to be able to compare 2 edges of the same node, which I can't. search_linear just gives me an index value into the node, which is exactly what I need.
cc @rust-lang/libs Someone more familiar with the btree impl should review this. cc @gereeter With respect to the panic, I think that's consistent behavior with other collections. Either way, I think this behavior should be documented. |
also cc @arthurprs and @apasel422 |
This is very smart 👍 I wonder if it makes sense to have a |
@bors: delegate=arthurprs |
✌️ @arthurprs can now approve this pull request |
I've made some small cleanup changes, implemented range_mut as well, and changed the implementation so that it should never panic, but only return an empty iterator. @arthurprs That would only get rid of one variable, right? Personally, it doesn't seem worth it, and it seems to obfuscate the way I only care about comparing them until the point that they diverge, after which I don't care anymore. |
I'd really like to steer the discussion towards the strange cases of range(..) and what the behavior should be. I've read all of the rfcs and proposals and documentation that I can find, but no one seems to address these cases. If you know of where it's documented, let me know. If it's not currently documented, who makes these decisions? It might seem like an extremely small detail, but changing behavior later would be a breaking change. The way I see it, there are 8 interesting cases to think about, and 3 basic ways to handle them: Weird cases
where B > A: Option 1Return an empty iterator on cases 2-3. Immediately panic on cases 4-8. During iteration of the tree, if the 2 tree searches "cross directions" in the middle of searching, we know for sure that Ord is ill-defined, and the function will panic. Pros:
Cons:
Option 2Return an empty iterator on cases 2-4. Immediately panic on cases 5-8. Case 4 would be a strange case where in order to prevent a seg fault, one side of the Range might not end up where you would expect in order to force an empty iterator. Eg: the range.back handle might be on the right side of A, even though A was excluded. Pros:
Cons:
Option 3Return an empty iterator on cases 2-8, never panic. Declare that the range returned will always have range.back equal or to the right of range.front. This will naturally cause 2-8 to all be empty iterators. Pros:
Cons:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not getting around to looking at this earlier. This looks good to me, modulo unifying the mutable and immutable cases.
Personally, I think the most logical behavior is to panic on cases 4-8 and return an empty iterator on cases 2 and 3 (Option 1). This corresponds to panicking when the two ends cross, but not when they touch. I don't think documenting case 4 as panicking is that odd - intuitively, it is starting after A and ending before A. That just sounds wrong. In contrast, cases 2 and 3 are starting and ending at the same place.
src/libcollections/btree/map.rs
Outdated
|
||
if !diverged { | ||
// Don't allow max_edge to go to the left of min_edge | ||
if max_edge < min_edge { max_edge = min_edge; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do go this route of never panicking, this also could be an optimization - just search for max_edge
starting from min_edge
. This would never produced "crossed nodes" by construction, and would require fewer comparisons. I don't know whether this is worth it though. Additionally, it can be done later.
src/libcollections/btree/map.rs
Outdated
} | ||
GoDown(bottom) => bottom, | ||
} | ||
loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is identical to the range
case and fairly complex, so I think it would be useful to pull this logic out into a function that is generic over mutability (taking two aliasing root pointers and returning two leaf handles).
src/libcollections/btree/map.rs
Outdated
GoDown(bottom) => bottom, | ||
} | ||
loop { | ||
let min_edge = match (min_found, range.start()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that there is an at-all clean way to do this, but I'm curious whether it would be more performant to check Unbounded
vs. Included
vs. Excluded
before the loop. I expect it doesn't matter, though - it would just be removing some predictable branches, I think.
src/libcollections/btree/map.rs
Outdated
@@ -777,69 +773,65 @@ impl<K: Ord, V> BTreeMap<K, V> { | |||
reason = "matches collection reform specification, waiting for dust to settle", | |||
issue = "27787")] | |||
pub fn range_mut<T: ?Sized, R>(&mut self, range: R) -> RangeMut<K, V> | |||
where T: Ord, K: Borrow<T>, R: RangeArgument<T> | |||
where T: Ord, K: Borrow<T>, R: RangeArgument<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this indentation change should probably not be here.
cc @rust-lang/libs about the behavioral options laid out in #39457 (comment). Analogous APIs for slices, such as subslicing, tend to panic on nonsensical ranges, which I think corresponds most closely to Option 1. I think this is the right convention to follow, because such ranges are usually produced by a bug. |
I'm not fond of panicking in recoverable errors but Option 1 has precedents from the slice api, so that's probably the way to go. |
I agree with @aturon that erring on the side of more panics here seems good. That should be the conservative change here and I think we've got room to return empty iterators in the future if we really need to? |
01e4816
to
e8c63d0
Compare
@bvinc could you add some docstrings explaining the panic behavior? |
@bors: r+ |
📌 Commit fb91047 has been approved by |
Dont segfault if btree range is not in order This is a first attempt to fix issue rust-lang#33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?
Dont segfault if btree range is not in order This is a first attempt to fix issue rust-lang#33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?
Dont segfault if btree range is not in order This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?
💔 Test failed - status-appveyor |
… On Tue, Feb 14, 2017 at 8:01 AM, bors ***@***.***> wrote:
💔 Test failed - status-appveyor
<https://ci.appveyor.com/project/rust-lang/rust/build/1.0.1949>
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
<#39457 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAD95NxubWXK0gk-yoiytj5A6Bc_vzKRks5rcbPKgaJpZM4L0zQe>
.
|
Dont segfault if btree range is not in order This is a first attempt to fix issue rust-lang#33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?
⌛ Testing commit fb91047 with merge 7edcd67... |
💔 Test failed - status-travis |
@bors: retry
* spurious network failure w/ crates.io
…On Wed, Feb 15, 2017 at 1:30 AM, bors ***@***.***> wrote:
💔 Test failed - status-travis
<https://travis-ci.org/rust-lang/rust/builds/201763727>
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
<#39457 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAD95Ia5UeoiZyfzCrHO3J7ArUTzCldeks5rcql_gaJpZM4L0zQe>
.
|
Dont segfault if btree range is not in order This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?
💔 Test failed - status-travis |
… On Wed, Feb 15, 2017 at 1:06 PM, bors ***@***.***> wrote:
💔 Test failed - status-travis
<https://travis-ci.org/rust-lang/rust/builds/201950532>
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
<#39457 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAD95E8GboearsXQq-1nJWu3U2h8jxDjks5rc0yfgaJpZM4L0zQe>
.
|
⌛ Testing commit fb91047 with merge 0d54a22... |
💔 Test failed - status-travis |
… On Wed, Feb 15, 2017 at 2:06 PM, bors ***@***.***> wrote:
💔 Test failed - status-travis
<https://travis-ci.org/rust-lang/rust/builds/201984747>
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
<#39457 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAD95C5GCcWO2QxZqu7w1-gf6XStsUTHks5rc1qwgaJpZM4L0zQe>
.
|
Dont segfault if btree range is not in order This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?
☀️ Test successful - status-appveyor, status-travis |
BtreeMap range_search spruced up #39457 created a lower level entry point for `range_search` to operate on, but it's really not hard to move it up a level of abstraction, making it somewhat shorter and reusing existing unsafe code (`new_edge` is unsafe although it is currently not tagged as such). Benchmark added. Comparison says there's no real difference: ``` >cargo benchcmp old3.txt new3.txt --threshold 5 name old3.txt ns/iter new3.txt ns/iter diff ns/iter diff % speedup btree::map::find_seq_100 19 21 2 10.53% x 0.90 btree::map::range_excluded_unbounded 3,117 2,838 -279 -8.95% x 1.10 btree::map::range_included_unbounded 1,768 1,871 103 5.83% x 0.94 btree::set::intersection_10k_neg_vs_10k_pos 35 37 2 5.71% x 0.95 btree::set::intersection_staggered_100_vs_10k 2,488 2,314 -174 -6.99% x 1.08 btree::set::is_subset_10k_vs_100 3 2 -1 -33.33% x 1.50 ``` r? @Mark-Simulacrum
This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.
Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case.
Open questions:
The same question for other weird cases:
2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5.
4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?