
RFC: robust division for Complex{Float64} #5112

Merged 2 commits into JuliaLang:master on Dec 15, 2013

Conversation

@ggggggggg (Contributor)

Robust complex division for Complex{Float64}, based on arXiv:1210.4539. Includes tests based on ten hard complex-division examples.
Closes #5072

@StefanKarpinski (Member)

Awesome. I'll let @JeffBezanson take a look and pull the trigger.

@ViralBShah (Member)

This looks really thorough!

@ViralBShah (Member)

How does the performance compare to the existing version?

@JeffBezanson (Member)

It seems like this would work for Float32 as well, at least. I realize the test cases are Float64, but that's ok.

@ggggggggg (Contributor, Author)

I'll work up some benchmarks later; in the paper they benchmark the robust algorithm at something like 1/3 slower than Smith's algorithm (which I believe is the current algorithm). As for Float32, I was under the impression that Julia does the calculation in Float64 and then converts back to Float32. If that is the case, then I think this algorithm offers no advantage over the previous (Smith's) algorithm, and will probably be slower.

In the paper, their reference "correct" answers come from running Smith's algorithm on Float64 input, but with an Intel extension that causes the actual calculation to be done in 80-bit (Float80) precision.
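
For contrast, here is a minimal sketch of Smith's scheme (an assumed paraphrase for illustration, not code from this PR): divide through by the larger of |c| and |d| so the inner ratio has magnitude at most one, avoiding the direct formation of c^2 + d^2.

function smith_cdiv(a::Float64, b::Float64, c::Float64, d::Float64)
    # computes (a + i*b)/(c + i*d) without squaring c and d directly (sketch only)
    if abs(d) <= abs(c)
        r = d/c                 # |r| <= 1, so c + d*r cannot spuriously overflow
        t = 1.0/(c + d*r)
        complex((a + b*r)*t, (b - a*r)*t)
    else
        r = c/d
        t = 1.0/(c*r + d)
        complex((a*r + b)*t, (b*r - a)*t)
    end
end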

@JeffBezanson (Member)

There is no magic to do Float32 operations as Float64, but you are right that we should do Float32 complex division by converting to Float64, using the old algorithm, and converting back.
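
A minimal sketch of that widen-divide-narrow approach (an assumption, not the merged code):

function f32_cdiv(a::Complex{Float32}, b::Complex{Float32})
    # widen to Float64, where Float32 magnitudes cannot overflow, then narrow back
    w = convert(Complex{Float64}, a) / convert(Complex{Float64}, b)
    convert(Complex{Float32}, w)
end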

@JeffBezanson (Member)

I'm not too worried about performance here; I think the extra accuracy is well worth it.

@JeffBezanson (Member)

OK, I think this is ready to merge. But I'd like to ask that the link to the Scilab file be removed, since you did not copy the code from there and I don't want to create confusion about whether we are bound by its license.

@ggggggggg (Contributor, Author)

Ok. That was fun. Now I need to find another easy issue.

@jiahao (Member) commented Dec 15, 2013

bump

JeffBezanson added a commit that referenced this pull request on Dec 15, 2013: RFC: robust division for Complex{Float64}
JeffBezanson merged commit d89ea95 into JuliaLang:master on Dec 15, 2013
@ViralBShah (Member)

Great to have this merged.

ggggggggg deleted the complex_division_robust branch on December 15, 2013, 18:06
@stevengj (Member)

Isn't the main point of the old (Smith) algorithm to avoid spurious overflow (in abs2(z))? If we implement Float32 division in terms of Float64 operations, then overflow should be impossible.

Speaking of spurious overflow, the definition inv(z::Complex) = conj(z)/abs2(z) seems wrong, e.g. it gives:

julia> inv(1e300+0im)
0.0 - 0.0im
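
Here abs2(1e300 + 0im) overflows to Inf, so the quotient collapses to zero. A hedged sketch of one possible remedy (my assumption, not a fix adopted in this thread): rescale z by its largest component before squaring.

function inv_scaled(z::Complex128)
    # scale so the components of w have magnitude at most 1; a power-of-two
    # scale factor would additionally avoid introducing rounding error
    s = max(abs(real(z)), abs(imag(z)))
    w = z/s
    conj(w)/(abs2(w)*s)
end

inv_scaled(1e300 + 0im)   # 1.0e-300 - 0.0im, rather than 0.0 - 0.0im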

#            a + i*b
#  p + i*q = ---------
#            c + i*d
function robust_cdiv{T<:Float64}(a::T,b::T,c::T,d::T)

What is the T<:Float64 for? Since Float64 is a concrete type, how is it possible for T to be anything but Float64?

@stevengj (Member)

(It would be nice to see our own benchmarks; I've learned the hard way that one should always be cautious about performance claims in papers.)

@ggggggggg (Contributor, Author)

I used T<:Float64 mostly because it was more concise than writing Float64 repeatedly throughout the function. I didn't think there would be any performance penalty.

I'll put benchmarks back on my todo.

@JeffBezanson (Member)

There is no performance penalty, but it would be cleaner to remove the Ts. I'm pretty sure all those declarations are redundant.

@StefanKarpinski (Member)

Aside: it can actually be useful to use a concrete type as an upper type bound if the parameter could be None:

foo{T<:Float64}(v::Vector{T}) = ...

This foo method now accepts arrays of type Array{Float64} or Array{None}, which is sometimes convenient.
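
To illustrate (assuming 0.2-era semantics, where None is the bottom type, so None <: Float64 holds; the body here is hypothetical):

foo{T<:Float64}(v::Vector{T}) = eltype(v)

foo([1.0, 2.0])      # matches: Vector{Float64}
foo(Array(None, 0))  # also matches: Vector{None}
foo([1, 2])          # no method: Int is not <: Float64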

@ggggggggg (Contributor, Author)

I put together a simple benchmark, and the speed difference seems to be greater than the paper suggested. The paper benchmarked the robust algorithm at about 4/3 the time of the Smith algorithm, if I'm remembering correctly; here I'm seeing closer to a factor of 5. No idea why so much memory is being allocated.

https://gist.github.com/ggggggggg/8046958
robust/new
elapsed time: 0.311824189 seconds (64940708 bytes allocated)
elapsed time: 0.106856784 seconds (64000048 bytes allocated)
elapsed time: 0.086953636 seconds (64000048 bytes allocated)
elapsed time: 0.087881237 seconds (64000048 bytes allocated)
elapsed time: 0.099615934 seconds (64000048 bytes allocated)
smith/old
elapsed time: 0.027790837 seconds (349352 bytes allocated)
elapsed time: 0.021653097 seconds (48 bytes allocated)
elapsed time: 0.021378803 seconds (48 bytes allocated)
elapsed time: 0.020773924 seconds (48 bytes allocated)
elapsed time: 0.020672097 seconds (48 bytes allocated)

@JeffBezanson (Member)

Are you running that on Julia 0.2? It should be much faster with the current master.

@stevengj (Member)

I did a simple benchmark with master:

foo(A, B) = Complex128[(A[i] * conj(B[i])) / abs2(B[i]) for i in 1:length(A)]
bar(A, B) = Complex128[A[i] / B[i] for i in 1:length(A)]
baz(A, B) = Complex128[abs(real(B[i]) + imag(B[i])) > 1e150 ? A[i] / B[i] : (A[i] * conj(B[i])) / abs2(B[i]) for i in 1:length(A)]
n = 10^7
x = rand(n) + rand(n)*im
y = rand(n) + rand(n)*im
@time foo(x,y);
@time bar(x,y);
@time baz(x,y);

and got

elapsed time: 0.198087233 seconds (160000096 bytes allocated)
elapsed time: 0.405705481 seconds (160000096 bytes allocated)
elapsed time: 0.201764476 seconds (160000096 bytes allocated)

I haven't had a chance to try the old (Smith) code. However, it looks like:

- The new code (robust div) is about 2x slower than naive division.
- Adding a simple check for large values, and using naive division in the common case where overflow does not occur (almost all the time in practice), is nearly as fast as naive division.

@ggggggggg (Contributor, Author)

I ran my benchmark on 0.2 on my work machine, so I built the latest master at Jeff's suggestion and reran it. I get much closer performance now.

robust/new: elapsed time: 0.068046929 seconds (48 bytes allocated)
smith/old: elapsed time: 0.05163035 seconds (48 bytes allocated)

Also, the simple check for potential overflow doesn't work; at the least, I think it misses cases with underflow. I ran stevengj's baz function through the hard division tests in test/complex.jl with [sb_accuracy(baz(h[1],h[2])[1],h[3]) for h in harddivs]. On half of the tests it gets the correct answer (53 bits); on the other half it fails (0 bits).

@stevengj (Member)

I'm not surprised that my test was too simple (e.g. I just noticed that the abs call is in the wrong place); it was just a quick test to see the overhead of a basic check.

My point is that it is worth using the naive algorithm if we can devise a simple test for when it is valid, and the robust algorithm otherwise, because in the overwhelming majority of real-world cases (no underflow/overflow) the naive algorithm is fine. Modifying your benchmark code to look at the naive algorithm and a "checked" version (the naive algorithm or the robust one depending on a simple test), I find that the results are even better than my benchmark from above:

robust/new -- elapsed time: 0.031266115 seconds (48 bytes allocated)
smith/old -- elapsed time: 0.020095014 seconds (48 bytes allocated)
naive -- elapsed time: 0.001157761 seconds (48 bytes allocated)
checked naive -- elapsed time: 0.002784719 seconds (48 bytes allocated)

The robust version is 50% slower than Smith. The naive version is 20x faster than Smith, and the naive version plus a simple check is 10x faster than Smith.

For a factor of 10 improvement, it is probably worth spending some time to figure out a reliable test for whether the naive algorithm is sufficient.

@stevengj (Member)

The following routine passes all of the harddiv tests:

function checked_div(a::Complex128, b::Complex128)
    # 1-norm of the denominator: a cheap proxy for its magnitude
    bnorm1 = abs(real(b)) + abs(imag(b))
    # use the robust algorithm only when abs2(b) could overflow or underflow
    bnorm1 > 1e153 || bnorm1 < 1e-153 ? robust_cdiv(a, b) : a * (conj(b) / abs2(b))
end

and is about 50% faster than the old Smith method in your benchmark. If I manually inline checked_div in the benchmark, it is about 6x faster than the old Smith method.

@JeffBezanson, I don't suppose much can be done about the function-call overhead? Is this another case for #1106?

PS. The whole bnorm1 = abs(real(b)) + abs(imag(b)); bnorm1 > 1e153 || bnorm1 < 1e-153 check could almost certainly be sped up considerably by bit-twiddling to examine the exponent bits directly.
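
A hedged sketch of that bit-twiddling idea (an assumption, not code from this thread): mask off the sign and mantissa, then compare the raw biased exponent field against precomputed bounds, with no floating-point operations at all. One would still need to decide how to combine the two components of b; this only shows the per-component exponent test.

function exponent_in_safe_range(x::Float64)
    # 1e-153 ~ 2^-509 and 1e153 ~ 2^508, so bound the biased exponent field
    e = reinterpret(Uint64, x) & 0x7ff0000000000000  # exponent bits only
    lo = uint64(1023 - 509) << 52                    # biased exponent near 1e-153
    hi = uint64(1023 + 508) << 52                    # biased exponent near 1e153
    lo <= e <= hi
end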

Benchmark routines:

function bench_check(n,d)
    for i = 1:length(n)
        checked_div(n[i],d[i])
    end
end

function bench_check_inline(n,d)
    for i = 1:length(n)
        di = d[i]
        bnorm1 = abs(real(di)) + abs(imag(di))
        bnorm1 > 1e153 || bnorm1 < 1e-153 ? robust_cdiv(n[i], di) : n[i] * (conj(di) / abs2(di))
    end
end

@ggggggggg (Contributor, Author)

I see; you're fast. Here is another routine that passes all the harddiv tests. I think it is slightly slower than yours, but I can only test on 0.2 right now, which we know is very slow with robust_cdiv:

robust/current: elapsed time: 0.098058714 seconds (64006848 bytes allocated)
smith/old: elapsed time: 0.019170227 seconds (48 bytes allocated)
naive_test: elapsed time: 0.01587455 seconds (88712 bytes allocated)

function naive_test_cdiv(a::Complex128, b::Complex128)
    c = naive_cdiv(a,b)
    cr,ci = reim(c)
    # fall back to the robust algorithm if the naive result looks suspicious:
    # a non-finite component (overflow), or a zero component coming from a
    # nonzero numerator part (possible underflow)
    if !isfinite(cr) || !isfinite(ci) || (cr == 0.0 && real(a) != 0.0) || (ci == 0.0 && imag(a) != 0.0)
        return robust_cdiv(a,b)
    end
    c
end

Updated my gist with this benchmark and the harddiv tests.
https://gist.github.com/ggggggggg/8046958

@stevengj (Member)

Here is another routine that is about 2.25x faster than Smith, just by inlining the complex operations in the naive division:

function checked_div(a::Complex128, b::Complex128)
    br = real(b); bi = imag(b)
    bnorm1 = abs(br) + abs(bi)
    if bnorm1 > 1e153 || bnorm1 < 1e-153
        # extreme denominator: br*br + bi*bi could overflow or underflow
        robust_cdiv(a, b)
    else
        # naive a*conj(b)/abs2(b), written out in real arithmetic
        binv = 1.0 / (br*br + bi*bi)
        ar = real(a); ai = imag(a)
        br *= binv; bi *= binv
        Complex128(ar * br + ai * bi, ai * br - ar * bi)
    end
end

@stevengj (Member)

I benchmarked your routine, along with an optimized version of your routine that inlines naive_cdiv similarly to my code, and the results (best times) are:

robust/new elapsed time: 0.032071361 seconds (48 bytes allocated)
smith/old elapsed time: 0.020198377 seconds (48 bytes allocated)
naive elapsed time: 0.001157725 seconds (48 bytes allocated)
checked_div elapsed time: 0.008129058 seconds (48 bytes allocated)
inlined checked_div elapsed time: 0.002815216 seconds (48 bytes allocated)
naive_test_cdiv elapsed time: 0.013251627 seconds (48 bytes allocated)
optimized naive_test_cdiv elapsed time: 0.011002862 seconds (48 bytes allocated)

@ggggggggg (Contributor, Author)

The inlined checked_div is impressively fast. Seems pretty close to the best of both worlds.

@stevengj (Member)

The problem is that it is manually inlined (directly into the benchmark function), so this is not possible as a way to implement / without improvements to Julia's inlining capabilities. checked_div (my 2nd version above) is probably about as good as we can do for now.
