-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make LinSpace generic #18777
Make LinSpace generic #18777
Conversation
Oops, put my quotes in the wrong place. @nanosoldier |
I think nanosoldier runs on Haswell hardware. Does it compile julia with a generic arch that disables FMA or does it compile with native arch? @jrevels |
Argh, a last-minute untested tweak introduced a build bug. I'll re-launch nanosoldier once CI passes. |
No, it doesn't disable FMA AFAIK. I'm assuming that would be an option/flag I'd have to enable before running Also, for when you relaunch: |
@_inline_meta | ||
t = j/d | ||
# computes (1-t)*a + t*b | ||
T(fma(t, b, fma(-t, a, a))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd put T(j/d)
and remove the T
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will fail if a, b = rand(5), rand(5)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To address the "confused": the point being this is the fallback implementation, so should support any type that implements fma
. T
might not be a number. Or, T
might be a number type like Fixed{Int8,7}
(from FixedPointNumbers) which can't represent 1
. Linear interpolation between two valid numbers is still perfectly fine in such circumstances, though, so there's no need to fail. That's why I'm not going to take your suggestion 😄.
Unfortunately non hardware fma is about 40 times slower on my machine. Also, anyone who downloads the windows binaries will also not have access to hardware fma, unless they manually build a system image, even if their cpu supports it. |
Yeah, it's really bad without the hardware acceleration. I suspect the approach I outlined would make things hugely better, but it's hard to know what the final result would be. Let's see whether enough people like the overall direction, though, before deciding to tackle the performance question. |
Though cross-linking to JuliaMath/DoubleDouble.jl#18 in case anyone gets excited about tackling the performance question now 😄. |
Most of this is passing, so let's @nanosoldier Looks like we have a problem on 686 in the nearly-overflowing-numbers tests. Why would overflow result in an |
That was indeed me who wrote the linspace tests (and float ranges tests). I'm glad they were useful. It's basically impossible to get this stuff working without a very thorough test suite – including some randomly generated test cases, since you can't (or at least I couldn't) think of all the nasty things that can actually happen. I agree that the cases that are "broken" by this PR are marginal enough to ignore – and maybe we can fix I'm sold on this change except the last part about
The "no guessing" version of
With naive behavior The only other option is to guess at values of A, B and N, and by implication S = (B-A)/N, such that
That's what my "mind reading" code does – it finds rational values that project down onto People are uneasy about this "guessing" – I suspect, largely because they don't understand what or why it's doing what it's doing, not because it's doing something they don't like. (There haven't been any complaints about
The current algorithm, although it's technically guessing, is justified in doing so – because some amount of guesswork is inherent in the |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
OK, I had temporarily forgotten about cases like So I'd amend my statement about Along these lines: I do suspect something in the julia> using Unitful
julia> const mm = u"mm"
mm
julia> r = (1:800)*(0.8mm);
julia> typeof(r)
Array{Quantity{Float64, Dimensions:{𝐋}, Units:{mm}},1}
julia> last(r)
640.0 mm
julia> r = 0.8mm:0.8mm:(800*0.8mm);
julia> typeof(r)
StepRange{Quantity{Float64, Dimensions:{𝐋}, Units:{mm}},Quantity{Float64, Dimensions:{𝐋}, Units:{mm}}}
julia> last(r)
640.0 mm
julia> length(r)
799 # <--- from the above you know it's lying here, really there are 800 elements
julia> rc = convert(Array, r);
julia> typeof(rc)
Array{Quantity{Float64, Dimensions:{𝐋}, Units:{mm}},1}
julia> length(rc)
799
julia> last(rc)
639.1999999999979 mm # but now there are not So we have this situation where the last element isn't a reliable guide to the length (computed from julia> r1 = 0.0mm:0.1mm:0.9mm
0.0 mm:0.1 mm:0.8 mm
julia> r2 = range(0.0mm, 0.1mm, 10) # hmm, OK, try specifying 10 elements
0.0 mm:0.1 mm:0.8 mm # <---- it gives me 9
julia> r1 == r2
true
# And this is just lovely:
julia> length(r2) # ... and claims I have only 8
8
julia> rc = convert(Array, r2) # ...which is all it keeps here
8-element Array{Quantity{Float64, Dimensions:{𝐋}, Units:{mm}},1}:
0.0 mm
0.1 mm
0.2 mm
0.3 mm
0.4 mm
0.5 mm
0.6 mm
0.7 mm so now we're down by two elements from where we intended to start. As an argument that Overall, I think we're very much on the same page, it's just a question of figuring out what to do about the "step" ranges. It occurs to me that one possible solution would be to unify them into a single type which preserve the user's original inputs in addition to whatever computations we perform twiddling the parameters. That would seem to be a way to help ensure that |
Alright, so it seems this has a future. In addition to the general approval, I notice that the nanosoldier run worked out fairly well, with most changes being improvements and the worst regressions being ~1.5x. So I think this moves into "what do we need to do to finish this off?" mode. Here are items I see:
Of those latter options, I kind of think we want both of the last items, and that at least one must be in place before we can merge this. |
It's not just windows binaries that are built without fma support. Mac binaries are currently built for core2 (the earliest Intel macs), i686 Linux and Windows binaries are built for pentium4, and x86_64 Linux and Windows binaries are built for x86-64 (though requiring cmpxchg16b recently, not without causing problems - #18701). Building binaries that don't start because you don't have recent hardware isn't very nice. Retuning for the native hardware after install time might work, but is a bit time consuming and requires some toolchain components that aren't necessarily always present. We can start trying to do what openblas does and pre-build multiple copies of the system image for different instruction set generations and switching at runtime, but that requires some infrastructure that we currently don't have (see #14995 for a not totally general prototype). It also doesn't help with all of the C libraries other than openblas (e.g. openlibm) that don't implement their own optimized kernel runtime dispatch. |
There was a somewhat regrettable discussion on hacker news a few days back, the conclusion of which (from most onlookers) was that we probably don't need to export a |
end | ||
|
||
""" | ||
linspace(start::Real, stop::Real, n::Real=50) | ||
linspace(start, stop, n=50) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
signature needs adjusting in rst too
(possibly unless the manual conversion gets merged before this - not sure) it did
I guess it would be good to write down exactly what we want. From what I understand, the intention is that
Points 1-3 seem reasonable. 4 is trickier (at least for floating point), and this PR doesn't quite provide that, e.g. function lerpi{T}(j::Integer, d::Integer, a::T, b::T)
t = j/d
# computes (1-t)*a + t*b
T(fma(t, b, fma(-t, a, a)))
end
function lerpi_big(j,d,a,b)
ba = big(a); bb = big(b); bj = big(j); bd = big(d)
oftype(a,(ba*(bd-bj) + bb*bj)/bd)
end
a = 2.0693042755264957
b = 0.18522296289341592
j = 6
d = 11
I'm not sure easy it would be to actually implement. Alternatively, we could weaken condition 4, in that case it might be possible to dispense with the fmas. |
# This is separate to make it useful even when running with --check-bounds=yes | ||
function unsafe_getindex(r::LinSpace, i::Integer) | ||
d = r.lendiv | ||
j, a, b = ifelse(2i >= length(r), (i-1, r.start, r.stop), (length(r)-i, r.stop, r.start)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure you want the branch the other way, as you want t = j/d < 0.5
(otherwise the inner fma buys you nothing, since 1-t
is exact).
I've tried out a couple of different strategies here: https://gist.github.com/simonbyrne/45c5924940746d628e94de38a78ae37c The tldr is that with one small modification, this is the best approach; the next best is a fairly simple one. Now all these methods fail catastrophically if |
Out of curiosity, where does the name |
@simonbyrne, really glad to see you chiming in here. Thanks for the help. I haven't had time to tackle this yet today, but I will check this out carefully ASAP. @StefanKarpinski, I'm not remotely wedded to the name, alternatives are welcome. |
Again, thanks @simonbyrne for getting involved. I was sort of hoping you might find this interesting, and was looking forward to your input. So, first, you're absolutely right that I misspoke above, in two ways:
OK, so what tolerance matters? You measured the precision of the answer against itself. My tests measured the precision of the answer against the endpoints, using Using a slight modification of your test script, here are the answers I get: function ans_worst ans_median end_worst end_median
lerp0 2.131 0.351 2.131 0.224
lerp1 1.809 0.333 1.702 0.212
lerp2 2.193 0.352 2.083 0.224
lerp3 1.394 0.273 1.172 0.172
lerp4 132.738 0.292 1.098 0.193
lerp5 2.174 0.291 1.840 0.189 The The most interesting case is naturally The reason I suspect Now, interestingly, this reveals that It is interesting, though, how much better worst-case behavior the two |
I added two more to the mix, lerp6 1.474 0.276 1.119 0.179
lerp7 0.500 0.250 0.500 0.154
|
I'm comparing the error to the result computed in BigFloat precision, so you would expect the minimum error to be 0.5 ulps (when the BigFloat answer is halfway between 2 floating point numbers). |
Here's how one could do it without DoubleDouble.jl: Which seems to be correctly rounded (i.e. accurate to within 0.5 ulp): the other nice feature is that the division and fma can be done at the creation of the |
Ah, yes, I just figured that out too. If you round the result to function ans_worst ans_mean ans_median end_worst end_mean end_median
lerp0 2.000 0.337 0.000 2.000 0.233 0.000
lerp1 2.000 0.309 0.000 2.000 0.215 0.000
lerp2 2.000 0.338 0.000 2.000 0.234 0.000
lerp3 1.000 0.199 0.000 1.000 0.140 0.000
lerp4 83.000 0.250 0.000 1.000 0.131 0.000
lerp5 2.000 0.222 0.000 2.000 0.141 0.000
lerp6 1.000 0.194 0.000 1.000 0.119 0.000
lerp7 1.000 0.000 0.000 1.000 0.000 0.000
lerp8 1.000 0.001 0.000 1.000 0.000 0.000
My new laptop had a motherboard problem and is out for repair, so I don't have easy access to a machine with hardware |
Some kind of rounding is needed, but one could probably get sensible behavior "cheaply" like this: function colon(a, s, b)
len_est = (b-a)/s + 1
StepRangeLen(a, s, floor(Int, len_est+10*eps(len_est)))
end This doesn't make the returned values "pretty," of course. I just think that @andreasnoack is saying that, to him, |
AppVeyor failure is unrelated. Glad to see this emerge from the other side! |
I don't see how getting within 10 ulps (or any other simplistic criterion) is actually better that what we do now – it's just a dumber way to not take the user's input literally, and will often result in a step and/or endpoint which do not round to the given floating-point values. That's the opposite of taking the given inputs at face value. I do, however, think that something more comprehensive and principled might be possible. In particular, the main objection to "guessing" can only be that there might be multiple, observably different interpretations of the user's input. In other words for a given |
Sure. Let me clarify that I'm fine with having it stay like it is now (pending real-world feedback). But also keep in mind that even now there's no guarantee that I think an easier simplification might be to consider ditching |
Right, there's the complication that the user's intention might not be to hit julia> d = rand()
0.3359637381398932
julia> length(d+0.1:0.1:d+0.9), length(0.1:0.1:0.9)
(8,9)
julia> mean((d = rand(); length(d+0.1:0.1:d+0.9) != 9) for _=1:10^7)
0.4471468 This fails for about 45% of the time but it should succeed every time if we were searching thoroughly. Of course it's possible that some of these are actually impossible to satisfy since |
/(r::FloatRange, x::Real) = FloatRange(r.start/x, r.step/x, r.len, r.divisor) | ||
/(r::LinSpace, x::Real) = LinSpace(r.start / x, r.stop / x, r.len, r.divisor) | ||
# For #18336 we need to prevent promotion of the step type: | ||
+(x::Number, r::AbstractUnitRange) = range(x + first(r), step(r), length(r)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must've not noticed these before, but assuming it's intentional that you've widened these from Real to Number? Should probably be tested if we're sure we want to keep that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm almost certain there's a test something like (1:3)+im
, although I don't know whether the output type is tested.
But yes, there's no particular reason to restrict to Real
anymore.
Yes.
beautiful and floating point numbers usually don't go together, but yes, I don't think you should care if it is |
Empirically, people do and we don't get bug reports about this sort of thing anymore 😬 |
I've noticed that, 👍. This suggests that even though many of us are happy to tolerate ulp-level differences, expunging them is a good thing. Fewer bug reports (less email) is priceless. Will be interesting to see if someone comes up with a simpler approach, but for now we may have to live with surprisingly-complex code for ranges. Even if it's complex, getting both generality AND better accuracy out of this most recent change suggests we're doing quite a lot right. |
Amazing work here. I am very excited about ranges supporting generic types! I just noticed, but is this method for The most important thing is that the range types themselves now work with generic types, so I'm very excited about that. I just want to make sure I'm implementing things sensibly and in line with any ideas @timholy has about how this should all fit together. |
Right, I suspect you should write your own |
Make trunc, round, floor, ceil testing more precise
KernelDensity.jl rolled its own range generator which suffers from similar flaws as Base's linspace prior to being fixed in JuliaLang/julia#18777. This closes JuliaStats#39.
KernelDensity.jl rolled its own range generator which suffers from similar flaws as Base's linspace prior to being fixed in JuliaLang/julia#18777. This closes JuliaStats#39.
KernelDensity.jl rolled its own range generator which suffers from similar flaws as Base's linspace prior to being fixed in JuliaLang/julia#18777. This closes #39.
This is my proposed alternative to #18492 for fixing the issues with LinSpace. The main attractions of this PR are:
linspace
with any typeT
for which(1-t)*a::T + t*b::T
makes sense.Given the large amount of work & discussion that has gone into
linspace
and friends with respect to numerical precision (at least #2333, #9637, #18492), it would be remiss not to comment (extensively 😉) on how this does and does not address previous concerns or implementations. Let me address the issues one-by-one.Mind-reading
First and foremost, this implementation does not try to read your mind, whereas the implementation currently in master does. The easiest way to see what I mean is to compare master's behavior:
with the behavior of this PR:
Note that, given the specified endpoints, that's the correct answer:
The version in master returns 0.1 essentially because it interprets the right endpoint 0.3 as
3//10
. This is what I refer to as mind-reading, and while it has some undeniable attractions, it's (1) only applicable to special types (e.g., Float32/64), (2) fairly fragile (it's the origin of #14420), and (3) arguably wrong even in principle (the user said 0.3, not3//10
).Note that, now, people who want mind reading can be told to use
Rational
s directly:Precision
All that said, there's still a valid concern about precision: linear interpolation is famously subject to roundoff error. This implementation returns results with full numeric precision by using two calls to
fma
. The key function (which other types can specialize) is:As long as
j >= d/2
, it turns out you don't need to computet
to higher precision for this to return a numerically-precise result (as long as you usefma
).All the (new) tests compare the result against what happens when you take the two endpoints, convert them to BigFloats, perform all interpolation at BigFloat precision, and then cast back to the element type. That's what I mean by "full precision" results---there is no roundoff error to within the precision of the numerical type, as determined by the inputs upon construction.
Test changes and deletions
First, I have to say that whoever wrote the old tests (mostly Stefan?) did an amazing job. While my final implementation is dirt-simple, I confess it was not the first (or second, or third...) thing I tried; it was very demanding to find an implementation that could pass the "good" ones (spelled out below). The extensive tests saved me from many implementations that seemed equivalent but ran into trouble in corner cases. 💯
That said, anyone who looks at this PR will see many test changes and a few deletions, because the new implementation does not pass many of our old tests. For the vast majority of these I am unconcerned and changed them guilt-free, because most of these turned out to rely on mind-reading: in particular, they don't pass the "convert endpoints to BigFloat before performing operations" test. Even the so-called "additive identity" tests (e.g.,
(r - 1) + 1
) fall in that category when combined withcollect
, due to the non-associativity of floating-point addition. (The old tests happened to pass for the provided inputs, but it doesn't take long at all to discover that tweaking the endpoints in those tests, in just about any way, breaks them. So we were testing for an "identity" that doesn't actually hold in general.)However, a few of the test deletions make me sad or are worthy of public confession:
u = eps(0.0)
: most tests pass, but ones that specifically test for generation of-0.0
don't. I'm basically unbothered by this---we have several concerns about-0.0
anyway (e.g., Should zero(::Float) = -0.0? #18341). If you think of it as a bug, it's probably best thought of as anfma
bug, not a problem with this implementation oflinspace
.linspace(-3u,3u,7)
doesn't pass. I'd rather this one passed, because it's seemingly unrelated to nonsense like-0.0
. I suspect this is anfma
bug, so I kept the test and marked it@test_skip
. Note this test passes ifu = 2*eps(0.0)
, so the failure is specific to the smallest-magnitude floating point number.Performance
I'm hoping that on machines with CPU-support for
fma
, this will be competitive with our current implementation.fma
support was introduced around 2012-2013, so there many be a good number of machines that don't have such support. Without it, this ends upccalling
C code for thefma
and is therefore glacially slow.If we are concerned about performance on older machines, one potential way to fix it would be to Dekker-split the two endpoints upon construction; then you'd only have to split
t
at runtime, and perhaps along with some tweaks that take advantage of the symmetry of the problem you might be able to largely close the gap.We might also consider adding a
@fastmath
specialization.But before getting too speculative about performance, let's see what
@nanosoldier runbenchmarks(ALL, vs=:master)
thinks.Future changes to Ranges
One source of concern is that
LinSpace
andFloatRange
now have different behavior. I think the right way to handle this is to causestart:step:stop
to return aLinSpace
when any of those entries areAbstractFloat
s, and moveFloatRange
to a package. That way there won't be any inconsistencies.StepRange
can still exist, only now (with this more genericLinSpace
) I think it can safely be confined toInteger
(thus fixing all kinds of problems like those I'm attempting to address in #18744).