
Large performance regression for simple bisection function #42007

Open · giordano opened this issue Aug 25, 2021 · 7 comments

Labels: performance (Must go faster), regression (Regression in behavior compared to a previous version)

Comments

@giordano (Contributor):

function bisectroot(f::Function, left, right, ϵ=1e-6)
    @assert f(left) < 0 && f(right) > 0
    ϵ₁ = Inf
    while ϵ₁ > ϵ
        mid = (left + right)/2
        left, right = f(mid) > 0 ? (left, mid) : (mid, right)
        ϵ₁ = right - left
    end
    (left + right)/2
end
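A minimal sketch (mine, not from the issue) of where the instability comes from: with integer endpoints, `mid` is a `Float64` after the first division, so the ternary can pair an `Int64` with a `Float64`, and `left`/`right` widen to `Union{Float64, Int64}` inside the loop:

```julia
# Hypothetical illustration: mixing the Int endpoints with the Float64
# midpoint produces a heterogeneous tuple, so the loop variables widen.
left, right = 0, 7
mid = (left + right) / 2       # division always yields a Float64
left, right = (left, mid)      # destructuring a mixed Int/Float64 tuple
@assert typeof(mid) === Float64
@assert typeof((left, right)) === Tuple{Int64, Float64}
```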

on v1.6:

julia> @benchmark bisectroot(x -> x^2-7, 0, 7)
BenchmarkTools.Trial: 10000 samples with 935 evaluations.
 Range (min … max):  105.705 ns … 293.216 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     111.143 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   114.704 ns ±  12.841 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▂█▆▅▇▅▄   ▄▃▁▂       ▁▁                                      ▂
  ████████████████████████▇▇▇▆▇▇▇█▇▇▅▅▅▅▄▅▆▇▆▅▇▅▅▄▆▅▆▇▅▅▅▅▄▄▅▆▆ █
  106 ns        Histogram: log(frequency) by time        176 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

on master (a11de31):

julia> @benchmark bisectroot(x -> x^2-7, 0, 7)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.255 μs … 308.235 μs  ┊ GC (min … max): 0.00% … 97.95%
 Time  (median):     4.441 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.046 μs ±   5.829 μs  ┊ GC (mean ± σ):  2.50% ±  2.19%

  ▃█▆▃▂▃▁▁▁▁▄▃▂ ▁▁▁▁                                          ▁
  ███████████████████▇▆▆▇▆▇▆▆▇██▇▆▆▅▅▆▅▆▆▆▅▅▆▅▅▆▅▅▅▆▅▅▄▅▅▅▅▅▄ █
  4.26 μs      Histogram: log(frequency) by time      10.5 μs <

 Memory estimate: 3.30 KiB, allocs estimate: 140.

Note that on master the function allocates, even though it shouldn't.

The regression can be completely eliminated (master then actually becomes ~25% faster) by replacing `left, right = f(mid) > 0 ? (left, mid) : (mid, right)` with an explicit if/else like this:

function bisectroot(f::Function, left, right, ϵ=1e-6)
    @assert f(left) < 0 && f(right) > 0
    ϵ₁ = Inf
    while ϵ₁ > ϵ
        mid = (left + right)/2
        if f(mid) > 0
            left, right = left, mid
        else
            left, right = mid, right
        end
        ϵ₁ = right - left
    end
    (left + right)/2
end

v1.6:

julia> @benchmark bisectroot(x -> x^2-7, 0, 7)
BenchmarkTools.Trial: 10000 samples with 930 evaluations.
 Range (min … max):  105.654 ns … 240.912 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     111.938 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   115.669 ns ±  14.069 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄ █▃█▅▃ ▃ ▄▃▂▁ ▁▁▁                                            ▂
  █▇█████▇███████████████▇▆▇▇▇▇▇▆▇▆▆▆▅▆▆▆▆█▆▆▆▆▆▆▆▇▆▆▅▅▅▆▆▄▅▃▅▄ █
  106 ns        Histogram: log(frequency) by time        183 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

master:

julia> @benchmark bisectroot(x -> x^2-7, 0, 7)
BenchmarkTools.Trial: 10000 samples with 969 evaluations.
 Range (min … max):  75.507 ns … 200.540 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     78.062 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   82.220 ns ±  10.800 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂ █ ▇▁   ▂▂ ▁▂▁                      ▁                       ▁
  █▄████▅▇▆███████▇▇▇▇▇▇▇▇██▆▅▆▆▅▅▆▆▇▆▅█▅▅▅▄▅▄▆▆▆▄▃▅▄▅▄▃▅▄▅▄▄▄ █
  75.5 ns       Histogram: log(frequency) by time       135 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

At the moment I can't run git bisect myself to try and find the culprit (it would take too long...), but if someone has clues about potential candidates, I could do a smaller search.

@KristofferC (Member) commented Aug 25, 2021:

Looks regressed in 1.7-beta already.

The performance difference goes away if the call is made type stable: bisectroot(x -> x^2-7, 0.0, 7.0).
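A quick way to see why (a sketch, not from the thread): with `Float64` endpoints every quantity in the loop is a `Float64`, so both branches of the ternary produce the same concrete `Tuple{Float64, Float64}` and nothing widens:

```julia
left, right = 0.0, 7.0
mid = (left + right) / 2
# Both branch results share one concrete tuple type, so the destructuring
# `left, right = ...` stays type-stable whichever branch is taken.
@assert typeof((left, mid)) === Tuple{Float64, Float64}
@assert typeof((mid, right)) === Tuple{Float64, Float64}
```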

@giordano (Contributor, Author):

> The performance difference goes away if the call is made type stable: `bisectroot(x -> x^2-7, 0.0, 7.0)`.

Ugh, that's an excellent point. I suspected type instability because of the unexpected memory allocations, but I completely overlooked the fact that the input arguments are reassigned in the body of the function. I did use @code_warntype, and it didn't flag anything:

julia> @code_warntype bisectroot(x -> x^2-7, 0, 7)
MethodInstance for bisectroot(::var"#19#20", ::Int64, ::Int64)
  from bisectroot(f::Function, left, right) in Main at /tmp/foo.jl:1
Arguments
  #self#::Core.Const(bisectroot)
  f::Core.Const(var"#19#20"())
  left::Int64
  right::Int64
Body::Float64
1 ─ %1 = (#self#)(f, left, right, 1.0e-6)::Float64
└──      return %1

A type-unstable call is probably borderline in terms of what's expected to be optimised, right?

@JeffBezanson JeffBezanson added the regression Regression in behavior compared to a previous version label Aug 25, 2021
@JeffBezanson (Member):

I'm certainly curious to find out what caused this.

@KristofferC (Member):

> but I did use @code_warntype and it didn't flag anything:

It's because of the default argument: @code_warntype only inspects the three-argument forwarding method. Use Cthulhu.jl to dig deeper into the calls.
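Concretely (a sketch, assuming `bisectroot` as defined above): the three-argument call only hits a thin forwarding method whose return type is a stable `Float64`, so passing the default ϵ explicitly is what exposes the method that contains the loop:

```julia
# Inspects only the forwarding method; reports a clean Float64 return.
@code_warntype bisectroot(x -> x^2 - 7, 0, 7)

# Inspects the 4-argument method containing the loop; here the
# Union{Float64, Int64} loop variables become visible.
@code_warntype bisectroot(x -> x^2 - 7, 0, 7, 1e-6)
```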

@martinholters (Member):

On 1.6, inference gives `Tuple{Union{Float64, Int64}, Union{Float64, Int64}}` for the tuple that gets destructured into `left, right`, while on master it gives `Union{Tuple{Float64, Int64}, Tuple{Union{Float64, Int64}, Float64}}`. Note that master even produces a slightly tighter result (ruling out `Tuple{Int64, Int64}`), but the call to `indexed_iterate` is then done via dynamic dispatch, while 1.6 handles it more gracefully.
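A hypothetical reduction (names mine, not from the thread) that lets you reproduce the inferred tuple type on a given Julia version, by isolating just the ternary from the loop body:

```julia
# Mimics one loop iteration: the two branches produce tuples with
# different element orders (Int-first vs Float64-first).
loopbody(f, left, right) = (mid = (left + right) / 2;
                            f(mid) > 0 ? (left, mid) : (mid, right))

g = x -> x^2 - 7
# How tmerge summarizes the two branch types differs across Julia
# versions; printing it shows what the downstream destructuring via
# indexed_iterate has to handle.
rt = only(Base.return_types(loopbody, Tuple{typeof(g), Int, Int}))
```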

@maleadt (Member) commented Aug 26, 2021:

Bisected:

3635f0473dea7c7254d611e3eaa939e0f6dd9484 is the first bad commit
commit 3635f0473dea7c7254d611e3eaa939e0f6dd9484
Author: Takafumi Arakaki <[email protected]>
Date:   Mon Mar 15 12:28:21 2021 -0700

    Check issimpleenoughtype before stripping off type parameters in tmerge (#39980)

VERSION = v"1.7.0-DEV.710"
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.404 μs … 516.281 μs  ┊ GC (min … max): 0.00% … 98.92%
 Time  (median):     2.549 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.847 μs ±  10.815 μs  ┊ GC (mean ± σ):  9.24% ±  2.43%

vs

commit 11016bfcb2ec0ec3006eca4b556f9f5f24f2be71 (HEAD)
Author: Keno Fischer <[email protected]>
Date:   Mon Mar 15 10:55:35 2021 -0400

    Fix opaque closure codegen ABI (#40001)

VERSION = v"1.7.0-DEV.709"
BenchmarkTools.Trial: 10000 samples with 971 evaluations.
 Range (min … max):  75.457 ns … 87.332 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     76.703 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   76.605 ns ±  0.539 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

cc @tkf

@vtjnash (Member) commented Aug 24, 2023:

The tmerge here is slightly odd. While it is, strictly, exactly equal to the Union of the 2 input types, it is an awkward way to express the 3 tuple elements, and so the optimizer later struggles to split this well in SROA.

8 ──       (@_11 = Core.tuple(left@_9, mid))::Tuple{Union{Float64, Int64}, Float64}
└───       goto #10
9 ──       (@_11 = Core.tuple(mid, right@_10))::Tuple{Float64, Union{Float64, Int64}}
10 ┄ %25 = @_11::Union{Tuple{Float64, Int64}, Tuple{Union{Float64, Int64}, Float64}}

This means codegen sees this type: Tuple{Union{Float64, Int64}, Union{Float64, Int64}}, which it doesn't support unboxed.
