
Large performance regression for simple bisection function #42007

Open · giordano opened this issue Aug 25, 2021 · 7 comments

Labels: performance (Must go faster), regression (Regression in behavior compared to a previous version)

Comments

@giordano (Contributor):

function bisectroot(f::Function, left, right, ϵ=1e-6)
    @assert f(left) < 0 && f(right) > 0
    ϵ₁ = Inf
    while ϵ₁ > ϵ
        mid = (left + right)/2
        left, right = f(mid) > 0 ? (left, mid) : (mid, right)
        ϵ₁ = right - left
    end
    (left + right)/2
end
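A minimal sketch (mine, not from the issue) of where the instability comes from: with integer endpoints, `mid` is a `Float64` after the first division, so the ternary can pair an `Int64` with a `Float64`, and `left`/`right` widen to `Union{Float64, Int64}` inside the loop:

```julia
# Hypothetical illustration: mixing the Int endpoints with the Float64
# midpoint produces a heterogeneous tuple, so the loop variables widen.
left, right = 0, 7
mid = (left + right) / 2       # division always yields a Float64
left, right = (left, mid)      # destructuring a mixed Int/Float64 tuple
@assert typeof(mid) === Float64
@assert typeof((left, right)) === Tuple{Int64, Float64}
```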

on v1.6:

julia> @benchmark bisectroot(x -> x^2-7, 0, 7)
BenchmarkTools.Trial: 10000 samples with 935 evaluations.
 Range (min … max):  105.705 ns … 293.216 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     111.143 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   114.704 ns ±  12.841 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▂█▆▅▇▅▄   ▄▃▁▂       ▁▁                                      ▂
  ████████████████████████▇▇▇▆▇▇▇█▇▇▅▅▅▅▄▅▆▇▆▅▇▅▅▄▆▅▆▇▅▅▅▅▄▄▅▆▆ █
  106 ns        Histogram: log(frequency) by time        176 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

on master (a11de31):

julia> @benchmark bisectroot(x -> x^2-7, 0, 7)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.255 μs … 308.235 μs  ┊ GC (min … max): 0.00% … 97.95%
 Time  (median):     4.441 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.046 μs ±   5.829 μs  ┊ GC (mean ± σ):  2.50% ±  2.19%

  ▃█▆▃▂▃▁▁▁▁▄▃▂ ▁▁▁▁                                          ▁
  ███████████████████▇▆▆▇▆▇▆▆▇██▇▆▆▅▅▆▅▆▆▆▅▅▆▅▅▆▅▅▅▆▅▅▄▅▅▅▅▅▄ █
  4.26 μs      Histogram: log(frequency) by time      10.5 μs <

 Memory estimate: 3.30 KiB, allocs estimate: 140.

Note that on master the function allocates, even though it shouldn't.

The regression can be completely eliminated (master then actually becomes ~25% faster) by replacing `left, right = f(mid) > 0 ? (left, mid) : (mid, right)` with an explicit if/else like this:

function bisectroot(f::Function, left, right, ϵ=1e-6)
    @assert f(left) < 0 && f(right) > 0
    ϵ₁ = Inf
    while ϵ₁ > ϵ
        mid = (left + right)/2
        if f(mid) > 0
            left, right = left, mid
        else
            left, right = mid, right
        end
        ϵ₁ = right - left
    end
    (left + right)/2
end

v1.6:

julia> @benchmark bisectroot(x -> x^2-7, 0, 7)
BenchmarkTools.Trial: 10000 samples with 930 evaluations.
 Range (min … max):  105.654 ns … 240.912 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     111.938 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   115.669 ns ±  14.069 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄ █▃█▅▃ ▃ ▄▃▂▁ ▁▁▁                                            ▂
  █▇█████▇███████████████▇▆▇▇▇▇▇▆▇▆▆▆▅▆▆▆▆█▆▆▆▆▆▆▆▇▆▆▅▅▅▆▆▄▅▃▅▄ █
  106 ns        Histogram: log(frequency) by time        183 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

master:

julia> @benchmark bisectroot(x -> x^2-7, 0, 7)
BenchmarkTools.Trial: 10000 samples with 969 evaluations.
 Range (min … max):  75.507 ns … 200.540 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     78.062 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   82.220 ns ±  10.800 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂ █ ▇▁   ▂▂ ▁▂▁                      ▁                       ▁
  █▄████▅▇▆███████▇▇▇▇▇▇▇▇██▆▅▆▆▅▅▆▆▇▆▅█▅▅▅▄▅▄▆▆▆▄▃▅▄▅▄▃▅▄▅▄▄▄ █
  75.5 ns       Histogram: log(frequency) by time       135 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

At the moment I can't run git bisect myself to try and find the culprit (it would take too long...), but if someone has clues about potential candidates, I could do a smaller search.

@KristofferC (Member) commented Aug 25, 2021:

Looks regressed in 1.7-beta already.

The performance difference goes away if the call is made type stable: bisectroot(x -> x^2-7, 0.0, 7.0).
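A quick way to see why (a sketch, not from the thread): with `Float64` endpoints every quantity in the loop is a `Float64`, so both branches of the ternary produce the same concrete `Tuple{Float64, Float64}` and nothing widens:

```julia
left, right = 0.0, 7.0
mid = (left + right) / 2
# Both branch results share one concrete tuple type, so the destructuring
# `left, right = ...` stays type-stable whichever branch is taken.
@assert typeof((left, mid)) === Tuple{Float64, Float64}
@assert typeof((mid, right)) === Tuple{Float64, Float64}
```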

@giordano (Contributor, Author):

> The performance difference goes away if the call is made type stable: `bisectroot(x -> x^2-7, 0.0, 7.0)`.

Ugh, that's an excellent point. I suspected type instability because of the unexpected memory allocations, but I completely overlooked the fact that the input arguments are reassigned in the body of the function. I did use @code_warntype, and it didn't flag anything:

julia> @code_warntype bisectroot(x -> x^2-7, 0, 7)
MethodInstance for bisectroot(::var"#19#20", ::Int64, ::Int64)
  from bisectroot(f::Function, left, right) in Main at /tmp/foo.jl:1
Arguments
  #self#::Core.Const(bisectroot)
  f::Core.Const(var"#19#20"())
  left::Int64
  right::Int64
Body::Float64
1 ─ %1 = (#self#)(f, left, right, 1.0e-6)::Float64
└──      return %1

A type-unstable call is probably borderline in terms of what's expected to be optimised, right?

@JeffBezanson JeffBezanson added the regression Regression in behavior compared to a previous version label Aug 25, 2021
@JeffBezanson (Member):

I'm certainly curious to find out what caused this.

@KristofferC (Member):

> but I did use @code_warntype and it didn't flag anything:

It's because of the default argument: @code_warntype only inspects the three-argument forwarding method. Use Cthulhu.jl to dig deeper into the calls.
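Concretely (a sketch, assuming `bisectroot` as defined above): the three-argument call only hits a thin forwarding method whose return type is a stable `Float64`, so passing the default ϵ explicitly is what exposes the method that contains the loop:

```julia
# Inspects only the forwarding method; reports a clean Float64 return.
@code_warntype bisectroot(x -> x^2 - 7, 0, 7)

# Inspects the 4-argument method containing the loop; here the
# Union{Float64, Int64} loop variables become visible.
@code_warntype bisectroot(x -> x^2 - 7, 0, 7, 1e-6)
```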

@martinholters (Member):

On 1.6, inference gives `Tuple{Union{Float64, Int64}, Union{Float64, Int64}}` for the tuple that gets destructured into `left, right`, while on master it gives `Union{Tuple{Float64, Int64}, Tuple{Union{Float64, Int64}, Float64}}`. Note that master even produces a slightly tighter result (ruling out `Tuple{Int64, Int64}`), but the call to `indexed_iterate` is then done via dynamic dispatch, while 1.6 handles it more gracefully.
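A hypothetical reduction (names mine, not from the thread) that lets you reproduce the inferred tuple type on a given Julia version, by isolating just the ternary from the loop body:

```julia
# Mimics one loop iteration: the two branches produce tuples with
# different element orders (Int-first vs Float64-first).
loopbody(f, left, right) = (mid = (left + right) / 2;
                            f(mid) > 0 ? (left, mid) : (mid, right))

g = x -> x^2 - 7
# How tmerge summarizes the two branch types differs across Julia
# versions; printing it shows what the downstream destructuring via
# indexed_iterate has to handle.
rt = only(Base.return_types(loopbody, Tuple{typeof(g), Int, Int}))
```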

@maleadt (Member) commented Aug 26, 2021:

Bisected:

3635f0473dea7c7254d611e3eaa939e0f6dd9484 is the first bad commit
commit 3635f0473dea7c7254d611e3eaa939e0f6dd9484
Author: Takafumi Arakaki <[email protected]>
Date:   Mon Mar 15 12:28:21 2021 -0700

    Check issimpleenoughtype before stripping off type parameters in tmerge (#39980)

VERSION = v"1.7.0-DEV.710"
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.404 μs … 516.281 μs  ┊ GC (min … max): 0.00% … 98.92%
 Time  (median):     2.549 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.847 μs ±  10.815 μs  ┊ GC (mean ± σ):  9.24% ±  2.43%

vs

commit 11016bfcb2ec0ec3006eca4b556f9f5f24f2be71 (HEAD)
Author: Keno Fischer <[email protected]>
Date:   Mon Mar 15 10:55:35 2021 -0400

    Fix opaque closure codegen ABI (#40001)

VERSION = v"1.7.0-DEV.709"
BenchmarkTools.Trial: 10000 samples with 971 evaluations.
 Range (min … max):  75.457 ns … 87.332 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     76.703 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   76.605 ns ±  0.539 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

cc @tkf

@vtjnash (Member) commented Aug 24, 2023:

The tmerge here is slightly odd. While it is, strictly, exactly equal to the Union of the 2 input types, it is an awkward way to express the 3 tuple elements, and so the optimizer later struggles to split this well in SROA.

8 ──       (@_11 = Core.tuple(left@_9, mid))::Tuple{Union{Float64, Int64}, Float64}
└───       goto #10
9 ──       (@_11 = Core.tuple(mid, right@_10))::Tuple{Float64, Union{Float64, Int64}}
10 ┄ %25 = @_11::Union{Tuple{Float64, Int64}, Tuple{Union{Float64, Int64}, Float64}}

This means codegen sees this type: Tuple{Union{Float64, Int64}, Union{Float64, Int64}}, which it doesn't support unboxed.
