-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Iterate over smaller set for setdiff[!](a,b) #29048
Conversation
base/set.jl
Outdated
# get rid of pathological cases where length(s) <<< length(t) | ||
# the 0.5*length is a very conservative threshold | ||
function setdiff!(s::Set, t::Set) | ||
if length(s) < length(t) * 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about writing this as 2*length(s) < length(t)
? That will be simpler and faster than converting to float (although it doesn't really matter when doing much more expensive operations on collections, but waste not, want not and all that.)
Is the 2x length ration threshold empirically determined or is there some intuition behind it? Either way it's probably worth noting in a comment. |
I made some statistics with Ints as elements, starting with 1:1 ratio and it appeared that a good threshold for those is around 0.8, but this depends on a couple of things:
With those variables its hard to get to a generic threshold and therefore the conservative estimation of 0.5 (~things should not regress performance wise). Maybe it would help to implement a faster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
I believe nanosoldier is on holidays, but the benchmarks in the OP are sufficiently convincing for me :)
@nanosoldier |
Derp |
I want to find a better solution which also includes the other set functions (symdiff, union, intersect) plagued by this asymmetric behaviour - should I in the mean time close this PR? |
It seems better to me to have a partial solution until we can have a more complete one. You can always revert this later if it's easier. |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Benchmarks look to be their usual noisy selves. We do have a few |
* Iterate over smaller set for setdiff[!] * Add comment and change to integer multiplication (cherry picked from commit 773540d)
* Iterate over smaller set for setdiff[!] * Add comment and change to integer multiplication (cherry picked from commit 773540d)
Attempt to fix #25317 by specialising on
setdiff!(a::Set, b::Set)
and iterating overa
instead ofb
iflength(a) < length(b) * 0.5
. This0.5
length threshold is a conservative guess after benchmarking has shown, that the iteration overa
in caselength(a) == length(b)
slows down execution by ~20%.setdiff()
usessetdiff!
internally, so both profit from the performance improvement.Benchmarks
Code:
Master:
This PR: