-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BTreeSet symmetric_difference & union optimized #65226
Conversation
The results of the relevant benches in my separate repository (using
|
Oh sh... seems after many experiments I lost track of size_hint. size_hint is correct at the start, but can underestimate 1 after iteration has started. Fix coming up... |
let mut a_next = match self.peeked { | ||
Some(MergeIterPeeked::A(next)) => next, | ||
_ => self.a.next(), | ||
}; | ||
let mut b_next = match self.peeked { | ||
Some(MergeIterPeeked::B(next)) => next, | ||
_ => self.b.next(), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, if I define both (a_next, b_next) with a single match expression (which looks more elegant to me), the performance gain vanishes completely halves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's unfortunate, it feels like that kind of detail has been there for a loong time (since early Rust 1.x), that tuple match has inferior codegen.
Since I claim this fancy MergeIter is so much better, why doesn't intersection use it too?
MergeIter could be taught the latter trick too, but even the former point on its own still makes a difference. This is what changes if intersection stitch uses a MergeIter that bails out as soon as it can:
In addition, the current implementation of intersection is pretty simple, so there is no readability gain by complicating MergeIter instead. Difference has a tougher implementation. This is what changes if difference stitch uses a MergeIter that bails out as soon as it can:
And this is what change if difference avoids Peekable without relying on MergeIter:
|
It would be cool to see the improvements as bench diffs, like https://github.com/BurntSushi/cargo-benchcmp style. (Just if you think it's interesting! I think it has prefix/suffix filters?) |
With the latest benching code, and building benches in two flavours, running the same benches with both, and then comparing "old" (nightly) with "new" (this PR):
With the _alone measurement instead, the improvement is even better. But if we compare old with "newer" instead (i.e. #65375), as reported there, the _crowd measurement says the change is a total dud, and only the _alone measurement justifies doing it. By the way, across 5 samples, the measurements for each flavour are within 5% and just a few over 3% different. But the same measurement for "difference" instead of "symmdiff" sometimes exhibits a ~10% boost across all benches, in both flavours. |
@bluss it depends... this PR is always better than the status quo, but #65375 is somewhere between a status quo and noticeably better than the best measurement of this PR. E.g. comparing new (this PR) and newer (#65375) with the benches in isolation, threshold 5%:
|
Some more experiments confirm the latest findings, and that some of the ideas in #65375 only hold if you exclude other benches from the build. In particular, my claim that the enum variants So whether #65375 is better remains a question, but my opinion is now the same as just before opening this PR: there's no real evidence that Peeking is better. The only thing that's still confirmed is that replacing Peekable with Peeking boosts |
f.debug_tuple("Intersection").field(&answer).finish() | ||
} | ||
} | ||
f.debug_tuple("Intersection").field(&self.inner).finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm not proposing any change.) I can't see or remember why these don't use derive(Debug), anyone know? It looks like implementation including trait bounds matches what the derive would produce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matches, the old code takes out the enum variant names. But I think those are pretty handy. And I don't think brevity is a good counterargument, given that fields like iterators list the whole set awaiting (at least for small sets, I haven't tried big ones).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh but there's one thing the old code also left out: the set reference itself, for the search variants. That's just something I figured would be better, because it never changes throughout iteration. I don't think it was ever discussed.
If there's Since there is no code around to limit the number of elements shown for an iterator, and if since there's no such code for showing a set, than debugging with large sets would does become more painful. Or more accurately, the Search alternative inadvertently helped debugging by hiding the large set and that brief period of relief is now over.
But for debugging small sets, it becomes easier, since you can see everything involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, now I see another angle to the question. My guess is that it's the only way to get that 1.17 version showing up in the documentation
.field(&self.a) | ||
.field(&self.b) | ||
.finish() | ||
f.debug_tuple("SymmetricDifference").field(&self.0).finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked on the internals forum whether we could prefer the style f.debug_tuple(stringify!(SymmetricDifference))
when we hard-code the struct/enum name. I still prefer that style, but on the forum the suggestion got no interest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer that style too, but not passionately. I haven't even seen that stringify! in the wild since reading about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be more convinced if it could be f.debug_tuple(stringify!(Self))
, to prevent copy-and-paste errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just print Self.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stringify! has no smarts, but at least it look like an ident and a refactoring tool could potentially handle renames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and even though renaming seems far-fetched for a stable struct like SymmetricDifference, it's still useful to guide others when they base their work on the source code.
But consistency is also a virtue, and Iter
does a literal string too, so should we stringify that otherwise unrelated code? Come to think of it, why doesn't debug_set have a name? And is there a way to find posts containing debug_set on internals.rust-lang.org (hint: it's not through the search facility and not by replacing the underscore with %5F in the URL).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been decided that it is not necessary to have type information in the Debug formats, but structs usually have their name (for clarity I guess). The nameless set probably begins with the nameless Vec formatting, which comes naturally from delegating to the slice's formatting, which is also nameless.
Keep it consistent if you want. We shouldn't go out to change unrelated code in this PR, and changing this to use stringify
in the whole crate is probably needlessly contentious.
struct MergeIterInner<I> | ||
where | ||
I: Iterator, | ||
I::Item: Copy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The where
style in the new code does not match the existing code in the file.
I'd prefer to keep the existing style, we keep the same file consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I normally rustfmt everything, and there are only 2 places where set.rs doesn't follow suit (but that's because a lot of code is commented out in my working copy).
Note sure what the existing style is though. They're all functions instead of types, and while most have one line per clause, BTreeSet::range has all clauses on one line, split_off squashes further. map.rs shows them on structs, but is in doubt about the number of clauses per line.
So it seems we indent where always once (or twice, but not trice for nested functions) and put the first clause behind.
Ok, so now I abandon the other PR and we focus on this one, that sounds good? |
match self.0.nexts() { | ||
(None, None) => return None, | ||
(Some(item), None) => return Some(item), | ||
(None, Some(item)) => return Some(item), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be combined into one branch (Some(item), None) | (None, Some(item)) =>
And I guess it could be in the style of (elt @ Some(_), None) | (None, elt @ Some(_)) => return elt
. Same thing, I guess it can't possibly change codegen(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the @ thing. I don't think combining here helps readability because the repeated code is tiny and aligned, but I can live with it.
You raise an interesting name debate: I recently got to https://internals.rust-lang.org/t/the-is-not-empty-method-as-more-clearly-alternative-for-is-empty/10612/91 and took for granted the statement that we should be talking about items and not elements. Now I see elements in the comments too. But I wouldn't call an optional element an element, I'd call it never mind, the match says it's really an element/itemnext
.
I'm going to benchmark too if a_next.and(b_next).is_none() { return a_next.or(b_next); }
. Not a beauty but it is short.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option::xor is new. I'm mildly excited because it looks like it fits here. Oh. it doesn't fit exactly.. let down :)
I didn't intentionally change item
to elt
. elt
is just what my fingers produce without thinking. Using item
is of course fine. (I learned elt
from the rust community I believe.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how xor fits here? I used it instead of or
here, but it doesn't serve any purpose there.
let (a_next, b_next) = self.0.nexts();
if a_next.and(b_next).is_none() {
return a_next.xor(b_next)
}
Now hold on to your seat (or be weary of benchmarks)... this xor thrashes performance by up to 20%. Just turn that xor
into or
and performance is back to what it is with the 4 line match above.
Another disappointment:
match self.0.nexts() {
(None, None) => return None,
(next @ Some(_), None) | (None, next @ Some(_)) => return next,
(Some(_), Some(_)) => (),
}
is about 6% worse, and more so with the "parted_" benches, where there are no (Some(), Some()) matches. I simplified it further to (a surprise in itself):
match self.0.nexts() {
(Some(_), Some(_)) => (),
(next, None) | (None, next) => return next,
}
and that improves it to just about the best level, except for the "parted_" benches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and earlier I had tried some alternatives to MergeIter::next returning a pair of items:
- (Option<&Item>, usize) where second is the number of sets the item (if any) was in: seemed okay
- (Option<&Item>, bool) where second is basically "the if clause of SymmetricDifference": ugly, even if I'd turn the bool into an enum
- move that bool into MergeIter, or just the next function itself: ugly
and none seemed to perform better or worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw another angle to that xor: pass a lambda to the next function that says what to do with a result on both sides. Option::or for Union, Option::xor for SymmetridDifference. And as a consequence move the possible loop inside MergeIterInner. Looks nice, actually, but performance suffers (according to what we now trust as benchmarking), for both SymmetricDifference and Union.
Looks good to me, I only had style questions |
Yeah, basically what I've done, but I'll still be trying to find hints or answers whether and why benches fooled us. |
Cool, you've done a lot of exploration. This is r=me when you think it is ready |
ac45719
to
b043380
Compare
Whitespace around Current benchmark comparisons, with 5% threshold: no change for difference & intersection, symmetric_difference & union say:
r=bluss |
Thanks! @bors r+ rollup |
📌 Commit 5697432 has been approved by |
BTreeSet symmetric_difference & union optimized No scalability changes, but: - Grew the cmp_opt function (shared by symmetric_difference & union) into a MergeIter, with less memory overhead than the pairs of Peekable iterators now, speeding up ~20% on my machine (not so clear on Travis though, I actually switched it off there because it wasn't consistent about identical code). Mainly meant to improve readability by sharing code, though it does end up using more lines of code. Extending and reusing the MergeIter in btree_map might be better, but I'm not sure that's possible or desirable. This MergeIter probably pretends to be more generic than it is, yet doesn't declare to be an iterator because there's no need to, it's only there to help construct genuine iterators SymmetricDifference & Union. - Compact the code of rust-lang#64820 by moving if/else into match guards. r? @bluss
BTreeSet symmetric_difference & union optimized No scalability changes, but: - Grew the cmp_opt function (shared by symmetric_difference & union) into a MergeIter, with less memory overhead than the pairs of Peekable iterators now, speeding up ~20% on my machine (not so clear on Travis though, I actually switched it off there because it wasn't consistent about identical code). Mainly meant to improve readability by sharing code, though it does end up using more lines of code. Extending and reusing the MergeIter in btree_map might be better, but I'm not sure that's possible or desirable. This MergeIter probably pretends to be more generic than it is, yet doesn't declare to be an iterator because there's no need to, it's only there to help construct genuine iterators SymmetricDifference & Union. - Compact the code of rust-lang#64820 by moving if/else into match guards. r? @bluss
Rollup of 5 pull requests Successful merges: - #64007 (Add check for overlapping ranges to unreachable patterns lint) - #65192 (Use structured suggestion for restricting bounds) - #65226 (BTreeSet symmetric_difference & union optimized) - #65448 (rustc_codegen_ssa: remove some unnecessary Box special-casing.) - #65505 (Rc: value -> allocation) Failed merges: r? @ghost
No scalability changes, but:
r? @bluss