optimizer: fully support inlining of union-split, partially constant-prop' callsite #43347

Merged
aviatesk merged 2 commits into master from avi/43287
Jan 5, 2022

Conversation

aviatesk
Member

@aviatesk aviatesk commented Dec 6, 2021

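Makes full use of constant-propagation by addressing this [TODO](https://github.com/JuliaLang/julia/blob/00734c5fd045316a00d287ca2c0ec1a2eef6e4d1/base/compiler/ssair/inlining.jl#L1212).
Here is the resulting performance improvement on the benchmark from #43287: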

julia> using BenchmarkTools

julia> X = rand(ComplexF32, 64, 64);

julia> dst = reinterpret(reshape, Float32, X);

julia> src = copy(dst);

julia> @btime copyto!($dst, $src);
  50.819 μs (1 allocation: 32 bytes) # v1.6.4
  41.081 μs (0 allocations: 0 bytes) # this commit

fixes #43287
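
To make the terms in the description concrete, here is a minimal sketch of a call site that is both union-split and called with a constant argument. The functions `scale` and `total` are hypothetical and are not taken from this PR or from #43287, and this simple case may not be the exact shape the PR addresses; it only illustrates what "union-split" and "constant-prop'ed" refer to.

```julia
# Hypothetical callee: `mode` is a constant at the call site below,
# so inference can propagate it into the body of `scale`.
function scale(x::Union{Int,Float64}, mode::Symbol)
    return mode === :double ? 2x : x
end

# `x` comes from a Vector{Union{Int,Float64}}, so the call to `scale`
# can be split into an `Int` case and a `Float64` case.
function total(xs::Vector{Union{Int,Float64}})
    acc = 0.0
    for x in xs
        acc += scale(x, :double)
    end
    return acc
end

xs = Union{Int,Float64}[1, 2.5, 3]
result = total(xs)
println(result)  # 13.0

# To see how the call was handled on a given Julia version, inspect the
# optimized IR:
# code_typed(total, (Vector{Union{Int,Float64}},))
```

Whether each split case ends up inlined depends on the Julia version and the inliner's cost model, so the IR is worth checking rather than assuming.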

@aviatesk aviatesk added backport 1.7 compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) labels Dec 6, 2021
@aviatesk
Member Author

aviatesk commented Dec 6, 2021

@nanosoldier runbenchmarks("broadcast" || "sparse" || "array" || "union" || "string" || "tuple", vs=":master")

@oscardssmith
Member

Does this need TTFP benchmarking or is it good to merge?

@KristofferC KristofferC mentioned this pull request Dec 7, 2021
15 tasks
@KristofferC
Member

This needs a manual backport to the 1.7 backport branch (#43297). Would be good if that could happen kind of quickly if possible, @aviatesk

@aviatesk
Member Author

aviatesk commented Dec 7, 2021

I'm happy to do the backporting, but I would also like to run nanosoldier before merging to make sure this commit doesn't introduce another performance regression, as I did in #42841.

@aviatesk
Member Author

aviatesk commented Dec 7, 2021

> Does this need TTFP benchmarking or is it good to merge?

Instead of TTFP, I'd propose we switch to using JuliaCI/BaseBenchmarks.jl#288 to track latency regressions.
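
For readers unfamiliar with what a latency measurement looks like, here is a rough sketch that compares the first call of a function (which includes compilation) with a second call. The `work` function is hypothetical, and this is not how BaseBenchmarks.jl or the linked PR measures latency; it only illustrates the quantity being discussed.

```julia
# Hypothetical workload; any function not yet compiled in this session works.
work(xs) = sum(abs2, xs) / length(xs)

data = rand(1000)

first_call  = @elapsed work(data)  # includes compilation time
second_call = @elapsed work(data)  # method is already compiled

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

The gap between the two timings is a crude proxy for compile latency; a tracked benchmark would need a fresh session per measurement to be repeatable.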

@aviatesk
Member Author

@nanosoldier runbenchmarks("broadcast" || "sparse" || "array" || "union" || "string" || "tuple", vs=":master")

@johnnychen94
Member

bump; will this be included in Julia 1.7.1?

@aviatesk aviatesk added the needs nanosoldier run This PR should have benchmarks run on it label Dec 21, 2021
@aviatesk
Member Author

aviatesk commented Jan 5, 2022

@nanosoldier runbenchmarks("broadcast" || "sparse" || "array" || "union" || "string" || "tuple", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk
Member Author

aviatesk commented Jan 5, 2022

The benchmark result looks good. I'm also running another benchmark on the MIT cluster now, and would like to get this merged once I confirm there are no obvious regressions in that result too.

@aviatesk
Member Author

aviatesk commented Jan 5, 2022

@nanosoldier runbenchmarks("linalg", vs=":master")

@aviatesk
Member Author

aviatesk commented Jan 5, 2022

On amdci2, I got

!("scalar")

Benchmark Report

Job Properties

Commits: https://github.com/JuliaLang/julia@a79e40d61d4d0861c8fcbf15709588fe18ee8f74 vs https://github.com/JuliaLang/julia@85a6990a9c1d49dd5aeaffeb4b38f881dc120823

Comparison Diff: link

Triggered By: link

Tag Predicate: !scalar

Results

Note: If Chrome is your browser, I strongly recommend installing the Wide GitHub
extension, which makes the result table easier to read.

Below is a table of this job's results, obtained by running the benchmarks found in
JuliaCI/BaseBenchmarks.jl. The values
listed in the ID column have the structure [parent_group, child_group, ..., key],
and can be used to index into the BaseBenchmarks suite to retrieve the corresponding
benchmarks.

The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).
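
As a rough sketch of how a ratio and its noise tolerance could combine into the three outcomes described above, consider the helper below. The function `classify` and its exact thresholds are assumptions made for illustration; they are not taken from Nanosoldier or BenchmarkTools.

```julia
# Hypothetical helper: a ratio counts as significant only when it
# leaves the band [1 - tolerance, 1 + tolerance].
function classify(ratio::Real, tolerance::Real)
    if ratio > 1 + tolerance
        return :regression
    elseif ratio < 1 - tolerance
        return :improvement
    else
        return :invariant
    end
end

println(classify(1.21, 0.05))  # regression
println(classify(0.71, 0.05))  # improvement
println(classify(1.02, 0.05))  # invariant
```

Under this reading, the wide 45% tolerance on the `linalg` rows means a ratio such as 1.45 sits right at the edge of what is reported.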

| ID | time ratio | memory ratio |
|----|------------|--------------|
| `["array", "any/all", ("all", "Vector{Float64} generator")]` | 1.21 (5%) ❌ | 1.00 (1%) |
| `["array", "any/all", ("all", "Vector{Float64}")]` | 1.21 (5%) ❌ | 1.00 (1%) |
| `["array", "equality", ("==", "BitArray")]` | 1.65 (5%) ❌ | 1.00 (1%) |
| `["array", "equality", ("==", "UnitRange{Int64}")]` | 1.19 (5%) ❌ | 1.00 (1%) |
| `["array", "equality", ("isequal", "Vector{Bool}")]` | 0.71 (5%) ✅ | 1.00 (1%) |
| `["array", "equality", ("isequal", "Vector{Int16}")]` | 0.67 (5%) ✅ | 1.00 (1%) |
| `["array", "equality", ("isequal", "Vector{Int64} isequal Vector{Int16}")]` | 0.90 (5%) ✅ | 1.00 (1%) |
| `["array", "index", "2d"]` | 1.07 (5%) ❌ | 1.00 (1%) |
| `["array", "index", ("sumlogical", "SubArray{Int32, 2, Array{Int32, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}")]` | 2.71 (50%) ❌ | 1.00 (1%) |
| `["array", "reductions", ("maxabs", "Float64")]` | 1.13 (5%) ❌ | 1.00 (1%) |
| `["array", "reductions", ("norminf", "Int64")]` | 0.95 (5%) ✅ | 1.00 (1%) |
| `["array", "subarray", ("lucompletepivCopy!", 1000)]` | 0.89 (5%) ✅ | 1.00 (1%) |
| `["collection", "initialization", ("Vector", "Any", "iterator")]` | 1.30 (25%) ❌ | 1.00 (1%) |
| `["collection", "queries & updates", ("Vector", "Int", "in", "false")]` | 1.38 (25%) ❌ | 1.00 (1%) |
| `["collection", "queries & updates", ("Vector", "Int", "in", "true")]` | 0.53 (25%) ✅ | 1.00 (1%) |
| `["dates", "arithmetic", ("Date", "Year")]` | 0.92 (5%) ✅ | 1.00 (1%) |
| `["dates", "parse", ("DateTime", "RFC1123Format", "Lowercase")]` | 0.95 (5%) ✅ | 1.00 (1%) |
| `["dates", "parse", ("DateTime", "RFC1123Format", "Titlecase")]` | 0.95 (5%) ✅ | 1.00 (1%) |
| `["find", "findall", ("> q0.5", "Vector{Float32}")]` | 1.08 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.5", "Vector{Int64}")]` | 1.09 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.5", "Vector{UInt64}")]` | 1.10 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.5", "Vector{UInt8}")]` | 1.32 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.8", "Vector{Bool}")]` | 1.06 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.8", "Vector{Float32}")]` | 1.05 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.8", "Vector{Float64}")]` | 1.54 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.8", "Vector{Int8}")]` | 1.06 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.8", "Vector{UInt8}")]` | 1.28 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.95", "Vector{Bool}")]` | 1.07 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.95", "Vector{Float32}")]` | 1.05 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.95", "Vector{Float64}")]` | 1.11 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.95", "Vector{Int8}")]` | 1.09 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.95", "Vector{UInt8}")]` | 1.06 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.99", "Vector{Bool}")]` | 1.07 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.99", "Vector{Float32}")]` | 1.18 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.99", "Vector{Float64}")]` | 1.06 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.99", "Vector{Int8}")]` | 1.07 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.99", "Vector{UInt64}")]` | 1.41 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("> q0.99", "Vector{UInt8}")]` | 1.07 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("BitVector", "10-90")]` | 1.27 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("Vector{Bool}", "10-90")]` | 1.42 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("Vector{Bool}", "50-50")]` | 1.18 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("ispos", "Vector{Float32}")]` | 1.21 (5%) ❌ | 1.00 (1%) |
| `["find", "findall", ("ispos", "Vector{Int8}")]` | 1.08 (5%) ❌ | 1.00 (1%) |
| `["find", "findnext", ("ispos", "Vector{Bool}")]` | 0.92 (5%) ✅ | 1.00 (1%) |
| `["find", "findnext", ("ispos", "Vector{Float32}")]` | 0.84 (5%) ✅ | 1.00 (1%) |
| `["find", "findnext", ("ispos", "Vector{Int64}")]` | 0.90 (5%) ✅ | 1.00 (1%) |
| `["find", "findnext", ("ispos", "Vector{Int8}")]` | 0.91 (5%) ✅ | 1.00 (1%) |
| `["find", "findnext", ("ispos", "Vector{UInt64}")]` | 0.87 (5%) ✅ | 1.00 (1%) |
| `["find", "findnext", ("ispos", "Vector{UInt8}")]` | 0.89 (5%) ✅ | 1.00 (1%) |
| `["find", "findprev", ("ispos", "Vector{Bool}")]` | 0.92 (5%) ✅ | 1.00 (1%) |
| `["find", "findprev", ("ispos", "Vector{Float32}")]` | 0.82 (5%) ✅ | 1.00 (1%) |
| `["find", "findprev", ("ispos", "Vector{Float64}")]` | 0.84 (5%) ✅ | 1.00 (1%) |
| `["find", "findprev", ("ispos", "Vector{UInt64}")]` | 1.39 (5%) ❌ | 1.00 (1%) |
| `["find", "findprev", ("ispos", "Vector{UInt8}")]` | 0.94 (5%) ✅ | 1.00 (1%) |
| `["inference", "abstract interpretation", "rand(Float64)"]` | 0.95 (5%) ✅ | 1.00 (1%) |
| `["inference", "abstract interpretation", "sin(42)"]` | 0.95 (5%) ✅ | 1.00 (1%) |
| `["inference", "optimization", "abstract_call_gf_by_type"]` | 1.08 (5%) ❌ | 1.07 (1%) ❌ |
| `["io", "serialization", ("serialize", "Matrix{Float64}")]` | 0.86 (5%) ✅ | 1.00 (1%) |
| `["linalg", "arithmetic", ("*", "Matrix", "Vector", 256)]` | 0.48 (45%) ✅ | 1.00 (1%) |
| `["linalg", "arithmetic", ("*", "typename(LinearAlgebra.LowerTriangular)", "Vector", 1024)]` | 1.62 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("*", "typename(LinearAlgebra.LowerTriangular)", "Vector", 256)]` | 2.31 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("*", "typename(LinearAlgebra.LowerTriangular)", "typename(LinearAlgebra.LowerTriangular)", 256)]` | 1.72 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("*", "typename(LinearAlgebra.SymTridiagonal)", "typename(LinearAlgebra.SymTridiagonal)", 256)]` | 0.46 (45%) ✅ | 1.00 (1%) |
| `["linalg", "arithmetic", ("*", "typename(LinearAlgebra.Tridiagonal)", "Vector", 1024)]` | 0.33 (45%) ✅ | 1.00 (1%) |
| `["linalg", "arithmetic", ("*", "typename(LinearAlgebra.Tridiagonal)", "typename(LinearAlgebra.Tridiagonal)", 256)]` | 0.29 (45%) ✅ | 1.00 (1%) |
| `["linalg", "arithmetic", ("+", "typename(LinearAlgebra.Bidiagonal)", "typename(LinearAlgebra.Bidiagonal)", 256)]` | 0.45 (45%) ✅ | 1.00 (1%) |
| `["linalg", "arithmetic", ("+", "typename(LinearAlgebra.Tridiagonal)", "typename(LinearAlgebra.Tridiagonal)", 1024)]` | 1.45 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("+", "typename(LinearAlgebra.UpperTriangular)", "typename(LinearAlgebra.UpperTriangular)", 256)]` | 0.47 (45%) ✅ | 1.00 (1%) |
| `["linalg", "arithmetic", ("-", "typename(LinearAlgebra.Tridiagonal)", "typename(LinearAlgebra.Tridiagonal)", 1024)]` | 0.52 (45%) ✅ | 1.00 (1%) |
| `["linalg", "arithmetic", ("/", "Matrix", "Matrix", 1024)]` | 1.90 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("/", "Matrix", "Matrix", 256)]` | 95.01 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("/", "typename(LinearAlgebra.LowerTriangular)", "typename(LinearAlgebra.LowerTriangular)", 256)]` | 2.23 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("/", "typename(LinearAlgebra.UpperTriangular)", "typename(LinearAlgebra.UpperTriangular)", 1024)]` | 1.59 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("/", "typename(LinearAlgebra.UpperTriangular)", "typename(LinearAlgebra.UpperTriangular)", 256)]` | 3.90 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("\\", "Matrix", "Vector", 1024)]` | 2.16 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("\\", "Matrix", "Vector", 256)]` | 0.05 (45%) ✅ | 1.00 (1%) |
| `["linalg", "arithmetic", ("\\", "typename(LinearAlgebra.Diagonal)", "Vector", 256)]` | 1.48 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("\\", "typename(LinearAlgebra.Diagonal)", "typename(LinearAlgebra.Diagonal)", 1024)]` | 1.65 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("\\", "typename(LinearAlgebra.UpperTriangular)", "typename(LinearAlgebra.UpperTriangular)", 1024)]` | 1.52 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("cumsum!", "Int32", 256)]` | 1.57 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("exp", "typename(LinearAlgebra.Hermitian)", 1024)]` | 1.95 (45%) ❌ | 1.00 (1%) |
| `["linalg", "arithmetic", ("log", "typename(LinearAlgebra.Hermitian)", 1024)]` | 2.27 (45%) ❌ | 1.00 (1%) |
| `["linalg", "blas", "gemm"]` | 0.44 (40%) ✅ | 1.00 (1%) |
| `["linalg", "blas", "gemv"]` | 1.93 (40%) ❌ | 1.00 (1%) |
| `["linalg", "blas", "syrk"]` | 0.41 (40%) ✅ | 1.00 (1%) |
| `["linalg", "factorization", ("eigen", "Matrix", 256)]` | 0.53 (45%) ✅ | 1.00 (1%) |
| `["linalg", "factorization", ("lu", "Matrix", 256)]` | 0.01 (45%) ✅ | 1.00 (1%) |
| `["linalg", "factorization", ("svd", "Matrix", 1024)]` | 0.35 (45%) ✅ | 1.00 (1%) |
| `["linalg", "factorization", ("svd", "typename(LinearAlgebra.Bidiagonal)", 1024)]` | 4.49 (45%) ❌ | 1.00 (1%) |
| `["linalg", "factorization", ("svd", "typename(LinearAlgebra.UpperTriangular)", 1024)]` | 2.44 (45%) ❌ | 1.00 (1%) |
| `["linalg", "factorization", ("svd", "typename(LinearAlgebra.UpperTriangular)", 256)]` | 0.27 (45%) ✅ | 1.00 (1%) |
| `["linalg", "small exp #29116"]` | 0.33 (5%) ✅ | 1.00 (1%) |
| `["micro", "randmatmul"]` | 0.88 (5%) ✅ | 1.00 (1%) |
| `["misc", "allocation elision view", "conditional"]` | 0.78 (5%) ✅ | 1.00 (1%) |
| `["misc", "allocation elision view", "no conditional"]` | 0.77 (5%) ✅ | 1.00 (1%) |
| `["misc", "bitshift", ("Int", "Int")]` | 0.86 (5%) ✅ | 1.00 (1%) |
| `["misc", "bitshift", ("Int", "UInt")]` | 0.86 (5%) ✅ | 1.00 (1%) |
| `["misc", "bitshift", ("UInt", "UInt")]` | 0.86 (5%) ✅ | 1.00 (1%) |
| `["misc", "foldl", "foldl(+, filter(...))"]` | 0.41 (5%) ✅ | 1.00 (1%) |
| `["misc", "iterators", "zip(1:1, 1:1, 1:1, 1:1)"]` | 0.87 (5%) ✅ | 1.00 (1%) |
| `["misc", "iterators", "zip(1:1000)"]` | 1.19 (5%) ❌ | 1.00 (1%) |
| `["misc", "repeat", (200, 1, 24)]` | 0.94 (5%) ✅ | 1.00 (1%) |
| `["misc", "repeat", (200, 24, 1)]` | 0.85 (5%) ✅ | 1.00 (1%) |
| `["problem", "simplex", "simplex"]` | 1.29 (5%) ❌ | 1.00 (1%) |
| `["problem", "spellcheck", "spellcheck"]` | 0.94 (5%) ✅ | 1.00 (1%) |
| `["random", "types", ("rand!", "RandomDevice", "Int64")]` | 1.85 (25%) ❌ | 1.00 (1%) |
| `["shootout", "binary_trees"]` | 0.91 (5%) ✅ | 1.00 (1%) |
| `["simd", ("Cartesian", "axpy!", "Float32", 2, 64)]` | 1.29 (20%) ❌ | 1.00 (1%) |
| `["simd", ("Cartesian", "axpy!", "Float64", 2, 63)]` | 0.78 (20%) ✅ | 1.00 (1%) |
| `["simd", ("Cartesian", "axpy!", "Float64", 2, 64)]` | 0.62 (20%) ✅ | 1.00 (1%) |
| `["simd", ("Cartesian", "manual_example!", "Float64", 2, 63)]` | 1.41 (20%) ❌ | 1.00 (1%) |
| `["simd", ("Cartesian", "manual_example!", "Float64", 2, 64)]` | 1.22 (20%) ❌ | 1.00 (1%) |
| `["simd", ("Cartesian", "manual_example!", "Int64", 2, 64)]` | 1.37 (20%) ❌ | 1.00 (1%) |
| `["simd", ("CartesianPartition", "manual_example!", "Float64", 2, 63)]` | 1.30 (20%) ❌ | 1.00 (1%) |
| `["simd", ("CartesianPartition", "manual_example!", "Float64", 2, 64)]` | 1.41 (20%) ❌ | 1.00 (1%) |
| `["simd", ("CartesianPartition", "manual_example!", "Int64", 2, 64)]` | 1.21 (20%) ❌ | 1.00 (1%) |
| `["simd", ("CartesianPartition", "two_reductions", "Int32", 2, 31)]` | 1.22 (20%) ❌ | 1.00 (1%) |
| `["simd", ("CartesianPartition", "two_reductions", "Int32", 2, 32)]` | 1.22 (20%) ❌ | 1.00 (1%) |
| `["simd", ("CartesianPartition", "two_reductions", "Int32", 2, 63)]` | 1.22 (20%) ❌ | 1.00 (1%) |
| `["simd", ("CartesianPartition", "two_reductions", "Int32", 2, 64)]` | 1.22 (20%) ❌ | 1.00 (1%) |
| `["simd", ("Linear", "auto_axpy!", "Float32", 4096)]` | 1.25 (20%) ❌ | 1.00 (1%) |
| `["simd", ("Linear", "axpy!", "Float32", 4096)]` | 1.22 (20%) ❌ | 1.00 (1%) |
| `["sparse", "arithmetic", ("unary minus", "(20000, 20000)")]` | 0.69 (30%) ✅ | 1.00 (1%) |
| `["sparse", "constructors", ("Bidiagonal", 10)]` | 0.94 (5%) ✅ | 1.00 (1%) |
| `["sparse", "constructors", ("Bidiagonal", 100)]` | 0.73 (5%) ✅ | 1.00 (1%) |
| `["sparse", "constructors", ("Diagonal", 1000)]` | 0.95 (5%) ✅ | 1.00 (1%) |
| `["sparse", "constructors", ("IJV", 1000)]` | 0.75 (5%) ✅ | 1.00 (1%) |
| `["sparse", "constructors", ("Tridiagonal", 100)]` | 0.91 (5%) ✅ | 1.00 (1%) |
| `["sparse", "index", ("spmat", "OneTo", 10)]` | 1.34 (30%) ❌ | 1.00 (1%) |
| `["sparse", "index", ("spmat", "col", "OneTo", 10)]` | 1.36 (30%) ❌ | 1.00 (1%) |
| `["sparse", "index", ("spmat", "col", "range", 1000)]` | 1.32 (30%) ❌ | 1.00 (1%) |
| `["sparse", "index", ("spvec", "integer", 10000)]` | 1.34 (30%) ❌ | 1.00 (1%) |
| `["sparse", "sparse matvec", "adjoint"]` | 0.83 (5%) ✅ | 1.00 (1%) |
| `["sparse", "sparse solves", "least squares (default), matrix rhs"]` | 0.40 (5%) ✅ | 1.00 (1%) |
| `["sparse", "sparse solves", "least squares (qr), matrix rhs"]` | 1.14 (5%) ❌ | 1.00 (1%) |
| `["sparse", "sparse solves", "least squares (qr), vector rhs"]` | 1.34 (5%) ❌ | 1.00 (1%) |
| `["sparse", "sparse solves", "square system (default), matrix rhs"]` | 1.32 (5%) ❌ | 1.00 (1%) |
| `["sparse", "sparse solves", "square system (default), vector rhs"]` | 1.18 (5%) ❌ | 1.00 (1%) |
| `["sparse", "sparse solves", "square system (lu), matrix rhs"]` | 0.58 (5%) ✅ | 1.00 (1%) |
| `["sparse", "sparse solves", "square system (lu), vector rhs"]` | 1.23 (5%) ❌ | 1.00 (1%) |
| `["sparse", "transpose", ("transpose", "(20000, 10000)")]` | 0.66 (30%) ✅ | 1.00 (1%) |
| `["sparse", "transpose", ("transpose", "(600, 400)")]` | 0.69 (30%) ✅ | 1.00 (1%) |
| `["string", "==(::SubString, ::String)", "different"]` | 0.94 (5%) ✅ | 1.00 (1%) |
| `["string", "repeat", "repeat str len 16"]` | 0.92 (5%) ✅ | 1.00 (1%) |
| `["tuple", "reduction", ("minimum", "(2, 2)")]` | 0.69 (5%) ✅ | 1.00 (1%) |
| `["tuple", "reduction", ("minimum", "(2,)")]` | 1.05 (5%) ❌ | 1.00 (1%) |
| `["union", "array", ("broadcast", "*", "BigFloat", "(false, false)")]` | 0.93 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("broadcast", "*", "BigFloat", "(false, true)")]` | 0.95 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("broadcast", "*", "BigFloat", "(true, true)")]` | 0.93 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("broadcast", "abs", "BigFloat", 1)]` | 1.06 (5%) ❌ | 1.00 (1%) |
| `["union", "array", ("broadcast", "abs", "Int8", 1)]` | 1.28 (5%) ❌ | 1.00 (1%) |
| `["union", "array", ("broadcast", "identity", "ComplexF64", 0)]` | 0.94 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("collect", "filter", "Bool", 1)]` | 1.10 (5%) ❌ | 1.00 (1%) |
| `["union", "array", ("collect", "filter", "Int64", 1)]` | 1.08 (5%) ❌ | 1.00 (1%) |
| `["union", "array", ("map", "*", "BigFloat", "(false, true)")]` | 0.92 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("map", "*", "BigFloat", "(true, true)")]` | 0.92 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("map", "abs", "BigInt", 1)]` | 0.94 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("map", "identity", "ComplexF64", 0)]` | 0.94 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("perf_binaryop", "*", "BigFloat", "(false, false)")]` | 0.92 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("perf_binaryop", "*", "BigFloat", "(false, true)")]` | 0.92 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("perf_binaryop", "*", "BigFloat", "(true, true)")]` | 0.92 (5%) ✅ | 1.00 (1%) |
| `["union", "array", ("skipmissing", "collect", "Union{Nothing, Bool}", 0)]` | 1.06 (5%) ❌ | 1.00 (1%) |
| `["union", "array", ("skipmissing", "collect", "Union{Nothing, ComplexF64}", 0)]` | 1.08 (5%) ❌ | 1.00 (1%) |
| `["union", "array", ("skipmissing", "collect", "Union{Nothing, Int8}", 0)]` | 1.06 (5%) ❌ | 1.00 (1%) |
| `["union", "array", ("sort", "BigFloat", 0)]` | 1.13 (5%) ❌ | 1.00 (1%) |
| `["union", "array", ("sort", "Union{Missing, BigFloat}", 1)]` | 1.11 (5%) ❌ | 1.00 (1%) |
| `["union", "array", ("sort", "Union{Nothing, BigFloat}", 0)]` | 1.10 (5%) ❌ | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["array", "accumulate"]
  • ["array", "any/all"]
  • ["array", "bool"]
  • ["array", "cat"]
  • ["array", "comprehension"]
  • ["array", "convert"]
  • ["array", "equality"]
  • ["array", "growth"]
  • ["array", "index"]
  • ["array", "reductions"]
  • ["array", "reverse"]
  • ["array", "setindex!"]
  • ["array", "subarray"]
  • ["broadcast"]
  • ["broadcast", "dotop"]
  • ["broadcast", "fusion"]
  • ["broadcast", "mix_scalar_tuple"]
  • ["broadcast", "sparse"]
  • ["broadcast", "typeargs"]
  • ["collection", "deletion"]
  • ["collection", "initialization"]
  • ["collection", "iteration"]
  • ["collection", "optimizations"]
  • ["collection", "queries & updates"]
  • ["collection", "set operations"]
  • ["dates", "accessor"]
  • ["dates", "arithmetic"]
  • ["dates", "construction"]
  • ["dates", "conversion"]
  • ["dates", "parse"]
  • ["dates", "query"]
  • ["dates", "string"]
  • ["find", "findall"]
  • ["find", "findnext"]
  • ["find", "findprev"]
  • ["frontend"]
  • ["inference", "abstract interpretation"]
  • ["inference"]
  • ["inference", "optimization"]
  • ["io", "array_limit"]
  • ["io", "read"]
  • ["io", "serialization"]
  • ["io"]
  • ["linalg", "arithmetic"]
  • ["linalg", "blas"]
  • ["linalg", "factorization"]
  • ["linalg"]
  • ["micro"]
  • ["misc"]
  • ["misc", "23042"]
  • ["misc", "afoldl"]
  • ["misc", "allocation elision view"]
  • ["misc", "bitshift"]
  • ["misc", "foldl"]
  • ["misc", "issue 12165"]
  • ["misc", "iterators"]
  • ["misc", "julia"]
  • ["misc", "parse"]
  • ["misc", "repeat"]
  • ["misc", "splatting"]
  • ["problem", "chaosgame"]
  • ["problem", "fem"]
  • ["problem", "go"]
  • ["problem", "grigoriadis khachiyan"]
  • ["problem", "imdb"]
  • ["problem", "json"]
  • ["problem", "laplacian"]
  • ["problem", "monte carlo"]
  • ["problem", "raytrace"]
  • ["problem", "seismic"]
  • ["problem", "simplex"]
  • ["problem", "spellcheck"]
  • ["problem", "stockcorr"]
  • ["problem", "ziggurat"]
  • ["random", "collections"]
  • ["random", "randstring"]
  • ["random", "ranges"]
  • ["random", "sequences"]
  • ["random", "types"]
  • ["shootout"]
  • ["simd"]
  • ["sort", "insertionsort"]
  • ["sort", "issorted"]
  • ["sort", "mergesort"]
  • ["sort", "quicksort"]
  • ["sparse", "arithmetic"]
  • ["sparse", "constructors"]
  • ["sparse", "index"]
  • ["sparse", "matmul"]
  • ["sparse", "sparse matvec"]
  • ["sparse", "sparse solves"]
  • ["sparse", "transpose"]
  • ["string", "==(::AbstractString, ::AbstractString)"]
  • ["string", "==(::SubString, ::String)"]
  • ["string", "findfirst"]
  • ["string"]
  • ["string", "readuntil"]
  • ["string", "repeat"]
  • ["tuple", "index"]
  • ["tuple", "linear algebra"]
  • ["tuple", "misc"]
  • ["tuple", "reduction"]
  • ["union", "array"]

Version Info

Primary Build

a79e40d61d

Comparison Build

85a6990a9c

Benchmark Report

Job Properties

Commits: https://github.com/JuliaLang/julia@a79e40d61d4d0861c8fcbf15709588fe18ee8f74 vs https://github.com/JuliaLang/julia@85a6990a9c1d49dd5aeaffeb4b38f881dc120823

Comparison Diff: link

Triggered By: link

Tag Predicate: !scalar

Results

| ID | time ratio | memory ratio |
|----|------------|--------------|
| `["arithmetic", ("*", "typename(LinearAlgebra.SymTridiagonal)", "Vector", 1024)]` | 0.53 (45%) ✅ | 1.00 (1%) |
| `["arithmetic", ("*", "typename(LinearAlgebra.UpperTriangular)", "Vector", 256)]` | 0.45 (45%) ✅ | 1.00 (1%) |
| `["arithmetic", ("+", "Vector", "Vector", 256)]` | 1.76 (45%) ❌ | 1.00 (1%) |
| `["arithmetic", ("+", "typename(LinearAlgebra.Bidiagonal)", "typename(LinearAlgebra.Bidiagonal)", 1024)]` | 0.48 (45%) ✅ | 1.00 (1%) |
| `["arithmetic", ("+", "typename(LinearAlgebra.Diagonal)", "typename(LinearAlgebra.Diagonal)", 256)]` | 1.83 (45%) ❌ | 1.00 (1%) |
| `["arithmetic", ("-", "Vector", "Vector", 1024)]` | 0.48 (45%) ✅ | 1.00 (1%) |
| `["arithmetic", ("\\", "Matrix", "Matrix", 1024)]` | 0.17 (45%) ✅ | 1.00 (1%) |
| `["arithmetic", ("\\", "Matrix", "Vector", 1024)]` | 0.27 (45%) ✅ | 1.00 (1%) |
| `["arithmetic", ("\\", "typename(LinearAlgebra.LowerTriangular)", "typename(LinearAlgebra.LowerTriangular)", 256)]` | 0.52 (45%) ✅ | 1.00 (1%) |
| `["arithmetic", ("cumsum!", "Float32", 256)]` | 1.46 (45%) ❌ | 1.00 (1%) |
| `["arithmetic", ("log", "typename(LinearAlgebra.Hermitian)", 1024)]` | 4.02 (45%) ❌ | 1.00 (1%) |
| `["blas", "gemm!"]` | 1.91 (40%) ❌ | 1.00 (1%) |
| `["factorization", ("schur", "Matrix", 1024)]` | 1.89 (45%) ❌ | 1.00 (1%) |
| `["factorization", ("schur", "Matrix", 256)]` | 2.75 (45%) ❌ | 1.00 (1%) |
| `["factorization", ("svd", "Matrix", 256)]` | 1.71 (45%) ❌ | 1.00 (1%) |
| `["factorization", ("svd", "typename(LinearAlgebra.UpperTriangular)", 256)]` | 1.47 (45%) ❌ | 1.00 (1%) |
| `["small exp #29116"]` | 0.74 (5%) ✅ | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["arithmetic"]
  • ["blas"]
  • ["factorization"]
  • []

Version Info

Primary Build

a79e40d61d

Comparison Build

85a6990a9c

@aviatesk aviatesk merged commit 1b600f0 into master Jan 5, 2022
@aviatesk aviatesk deleted the avi/43287 branch January 5, 2022 14:11
@KristofferC KristofferC mentioned this pull request Jan 5, 2022
23 tasks
@nanosoldier
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

aviatesk added a commit that referenced this pull request Jan 5, 2022
aviatesk added a commit that referenced this pull request Jan 5, 2022
aviatesk added a commit that referenced this pull request Jan 5, 2022
@KristofferC KristofferC mentioned this pull request Feb 15, 2022
40 tasks
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) needs nanosoldier run This PR should have benchmarks run on it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

slow fallback array copyto! in Julia 1.7.0
6 participants