Improving broadcasting performance by working around recursion limits of inlining #41090
Comments
What effect does this have on compilation time?
Compilation time of what?
I just tried measuring compilation time for one of these issues (JuliaArrays/StaticArrays.jl#560).

Before:
julia> @time f(ũ, u₀, u₁, ρ)
  0.377140 seconds (831.70 k allocations: 51.208 MiB, 2.14% gc time, 99.99% compilation time)
3-element SVector{3, Float64} with indices SOneTo(3):
 0.2167203902566349
 0.8064882455742344
 4.013494934265736

After:
julia> @time f(ũ, u₀, u₁, ρ)
  0.266012 seconds (720.18 k allocations: 44.344 MiB, 2.57% gc time, 99.99% compilation time)
3-element SVector{3, Float64} with indices SOneTo(3):
 0.49720047062352407
 1.074241639114959
 4.084778907675536

So it seems to even improve compilation time in some cases.
A few more benchmarks (Julia master, Version 1.7.0-DEV.1255 (2021-06-05)). Note that compile times after the change were measured after a fresh start of the Julia REPL, so that nothing from the "before" evaluations was cached. Overall, I haven't encountered a single case where either compile time or runtime was worse after this change, and there are some fairly decent improvements. The only drawback is slightly more complicated broadcasting code.

Issue JuliaArrays/StaticArrays.jl#560

Runtime before:
julia> @btime f($ũ, $u₀, $u₁, $ρ) # 2.354 μs (44 allocations: 3.59 KiB)
  65.892 ns (12 allocations: 192 bytes)

Compile time before:
julia> @time f(ũ, u₀, u₁, ρ) # 2.354 μs (44 allocations: 3.59 KiB)
  0.379381 seconds (871.05 k allocations: 55.635 MiB, 4.07% gc time, 100.00% compilation time)

Runtime after:
julia> @btime f($ũ, $u₀, $u₁, $ρ) # 2.354 μs (44 allocations: 3.59 KiB)
  0.020 ns (0 allocations: 0 bytes)

Compile time after (I'm not sure why I can't reproduce the compile time reduction today, but at least it's not worse):
julia> @time f(ũ, u₀, u₁, ρ) # 2.354 μs (44 allocations: 3.59 KiB)
  0.379410 seconds (923.10 k allocations: 58.787 MiB, 4.11% gc time, 100.00% compilation time)

Issue JuliaArrays/StaticArrays.jl#682

Runtime before:
julia> @btime f($A, $b, $A, $(_dual(b)), $b, $A)
  12.153 μs (310 allocations: 7.34 KiB)

Compile time before:
julia> @time f(A, b, A, _dual(b), b, A)
  0.666930 seconds (1.53 M allocations: 97.502 MiB, 4.58% gc time, 99.91% compilation time)

Runtime after:
julia> @btime f($A, $b, $A, $(_dual(b)), $b, $A)
  0.020 ns (0 allocations: 0 bytes)

Compile time after (small improvement):
julia> @time f(A, b, A, _dual(b), b, A)
  0.558779 seconds (1.47 M allocations: 94.353 MiB, 7.04% gc time, 100.00% compilation time)

Issue JuliaArrays/StaticArrays.jl#609

Runtime before:
julia> @btime doit($s,$c)
  1.193 μs (24 allocations: 448 bytes)

Compile time before:
julia> @time doit(s,c)
  0.394210 seconds (805.74 k allocations: 52.276 MiB, 99.97% compilation time)
1-element SVector{1, Float64} with indices SOneTo(1):

Runtime after:
julia> @btime doit($s,$c)
  0.020 ns (0 allocations: 0 bytes)

Compile time after (small improvement):
julia> @time doit(s,c)
  0.351852 seconds (761.63 k allocations: 49.821 MiB, 3.27% gc time, 100.00% compilation time)

Issue JuliaArrays/StaticArrays.jl#797

Runtime before:
julia> @btime g!($rs, $as, $bs, $cs, $ds)
  17.343 μs (750 allocations: 13.28 KiB)

Compile time before:
julia> @time g!(rs, as, bs, cs, ds)
  0.716501 seconds (2.49 M allocations: 128.211 MiB, 11.63% gc time)

Runtime after:
julia> @btime g!($rs, $as, $bs, $cs, $ds)
  20.882 ns (0 allocations: 0 bytes)

Compile time after (decent improvement):
julia> @time g!(rs, as, bs, cs, ds)
  0.534649 seconds (2.71 M allocations: 139.501 MiB, 4.61% gc time)

Unrelated broadcasting example

julia> a = [1.0 2; 3 4];
julia> b = [1.0, 2.0];
julia> f(a, b) = a .+ 2 .* b;

Compile time before:
julia> @time f(a, b)
  0.108545 seconds (619.70 k allocations: 36.861 MiB, 99.98% compilation time)
2×2 Matrix{Float64}:

Compile time after:
julia> @time f(a, b)
  0.108584 seconds (624.50 k allocations: 37.268 MiB, 99.99% compilation time)
2×2 Matrix{Float64}:
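For anyone who wants to repeat this kind of measurement without extra packages, here is a minimal sketch using only Base macros on the unrelated broadcasting example. It assumes a fresh Julia session, so that the first call still includes compilation. The variable names `first_call`, `second_call`, and `bytes` are just illustrative.

```julia
# Sketch: separate "first call" (compilation + run) from "second call" (run only).
# Assumes a fresh session so that f has not been compiled for these argument types yet.
a = [1.0 2; 3 4]
b = [1.0, 2.0]
f(a, b) = a .+ 2 .* b

first_call  = @elapsed f(a, b)   # dominated by compilation in a fresh session
second_call = @elapsed f(a, b)   # already compiled, so mostly runtime
bytes       = @allocated f(a, b) # bytes allocated by one (compiled) call

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
println("allocated:   ", bytes, " bytes")
```

This is much cruder than `@btime` for the runtime part (a single sample, no interpolation of globals), so it is only suitable for spotting large differences such as the compile-time numbers above.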
Should I open a PR with this change or is it not suitable for Julia Base?
I think a PR would show more clearly what your proposed changes are and thereby lead to a better discussion.
OK, I've opened a PR: #41139.
Hi!
I've discovered that many (runtime) performance issues with broadcasting are caused by inlining not working with the highly recursive broadcasting code. It turns out that defining more methods can actually help here. Here is a piece of code you can evaluate in the REPL to see that:
This effectively duplicates these two functions:
julia/base/broadcast.jl, lines 380 to 384 in abbb220
julia/base/broadcast.jl, line 361 in abbb220
It turns out that it's sufficient to fix the following issues:
JuliaArrays/StaticArrays.jl#560
JuliaArrays/StaticArrays.jl#682
JuliaArrays/StaticArrays.jl#609
JuliaArrays/StaticArrays.jl#797
What do you think about it?
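As a minimal sketch of the general idea (not the actual change to base/broadcast.jl): a recursive tuple function can be given extra non-recursive methods for short tuples, so that the common small cases no longer go through the recursive path that inlining may give up on. The names `getone` and `getall` are hypothetical stand-ins chosen for illustration. They are not functions from Base.

```julia
# Hypothetical stand-ins for illustration only; NOT the functions in base/broadcast.jl.
getone(x::AbstractArray, i) = x[i]  # arrays are indexed
getone(x::Number, i) = x            # scalars are passed through unchanged

# Generic recursive definition: handle the first argument, recurse on the tail.
@inline getall(args::Tuple, i) = (getone(args[1], i), getall(Base.tail(args), i)...)
@inline getall(args::Tuple{}, i) = ()

# Extra, non-recursive methods for short tuples. They compute the same result as
# the generic method, but the recursion is written out by hand, so these cases
# do not depend on the compiler inlining through recursive calls.
@inline getall(args::Tuple{Any}, i) = (getone(args[1], i),)
@inline getall(args::Tuple{Any,Any}, i) = (getone(args[1], i), getone(args[2], i))

result = getall(([1, 2, 3], 10, [4.0, 5.0, 6.0]), 2)
println(result)
```

With three arguments, the call above takes one recursive step and then lands in the hand-written two-argument method. Whether this actually changes what gets inlined depends on the Julia version and the argument types, so it is worth checking with `@code_typed` on the case of interest.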