-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
union-types: use insertion (stable) sort instead of qsort #45896
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally forgot about this code. Silly qsort
on windows.
Different platforms implement qsort differently, leading to platform-specific errors. This is a quick port of the ml_matches algorithm for use instead. For small unions (almost always), this should also be slightly faster, though insignificant. Refs #45874
while j > lo && lt(o, x, v[j-1]) | ||
v[j] = v[j-1] | ||
while j > lo | ||
y = v[j-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any case where the compiler doesn't already do this optimization?
The only case I can think of would be stateful getindex
function getindex(x::LogAtReadArray, i::Int)
println(i)
getindex(x.data, i)
end
which isn't particularly compelling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any moderately complex or non-inlined getindex might not be eliminated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess those likely exist, I've just never seen a subtype of AbstractVector
with a complex getindex
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any non-inlined lt
user-method will also disallow this from occurring as a direct optimization (it would be semantically invalid)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how that would be semantically invalid, but I'm not familiar with the relevant compiler semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could throw for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not seeing it. Even if lt
throws, y = v[j-1]; lt(o, x, y)
is indistinguishable from lt(o, x, v[j-1])
.
As a sanity check,
using Primes
@noinline function lt(x, y)
isprime(x) && error()
x < y
end
@code_llvm sort!([1,2,3], 1, 3, Lt(lt))
Looks the same before and after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks different before/after to me. There is an excess load in the before case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now! lt
could mutate v
. That would make this optimization change behavior. That behavior change is okay with me, but not okay with the compiler. The compiler cannot store the result of v[j - 1]
unless v[j - 1]
is pure with respect to the interleaving calls (including lt
) which it can only know if both getindex
and lt
are simple and inlined.
I must have made a mistake in diffing. You're right, there is an extra load in the before case.
I'd be hard-pressed to find a case where this makes a measurable runtime difference, but I now see why it is an improvement in theory.
Thanks, @KristofferC and @vtjnash for explaining this to me!
…45896) Different platforms implement qsort differently, leading to platform-specific errors. This is a quick port of the ml_matches algorithm for use instead. For small unions (almost always), this should also be slightly faster, though insignificant. Refs JuliaLang#45874
…45896) Different platforms implement qsort differently, leading to platform-specific errors. This is a quick port of the ml_matches algorithm for use instead. For small unions (almost always), this should also be slightly faster, though insignificant. Refs JuliaLang#45874
Different platforms implement qsort differently, leading to
platform-specific errors. This is a quick port of the ml_matches
algorithm for use instead. For small unions (almost always), this should
also be slightly faster, though insignificant.
Refs #45874