Make `for iter::CartesianIndices` better vectorized for 1d/2d cases. #45338

N5N3 · 2022-05-17T13:37:30Z

A local benchmark shows that this makes non-`@simd` 1d/2d loops faster.

function sumcart_iter(A)
    s = zero(eltype(A))
    for I in CartesianIndices(A)
        @inbounds @fastmath s += A[I] # use @fastmath to enable simd
    end
    s
end

On master:

julia> A = view(rand(256*20),1:256*20);
julia> @btime sumcart_iter($A)
  4.657 μs (0 allocations: 0 bytes)
2539.6790676561955

julia> B = view(rand(256,20),1:256,1:20);
julia> @btime sumcart_iter($B)
  4.657 μs (0 allocations: 0 bytes)
2557.7341880811764

With this PR:

julia> @btime sumcart_iter($A)
  305.200 ns (0 allocations: 0 bytes)
2539.6790676562046

julia> @btime sumcart_iter($B)
  406.500 ns (0 allocations: 0 bytes)
2557.734188081178

johnnychen94 · 2022-05-17T14:00:45Z

base/multidimensional.jl

        rng = indices[1]
        I = state[1] + step(rng)
-        valid = __is_valid_range(I, rng) && state[1] != last(rng)
+        if N == 1


I'm not sure if I get the idea -- when will N != 1 here?

N is the dimension of the CartesianIndices:

If N > 1, the outermost dimension uses __is_valid_range to preserve the performance improvement introduced in "add StepRange support for CartesianIndices" (#37829).

If N == 1, just use state[1] != last(rng), as __is_valid_range prevents vectorization.
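To make the difference between the two termination checks concrete, here is a minimal sketch. The helpers `in_bounds`, `next_bounds`, `next_eq`, and `collect_with` are hypothetical names made up for illustration; they are not the definitions in base/multidimensional.jl. The sketch only shows that, for a valid starting state, a bounds-style check and an equality-style check walk the same sequence.

```julia
# Hypothetical helpers for illustration only; not Base's implementation.

# Bounds-style check: is the incremented index still inside `rng`?
in_bounds(I, rng) = step(rng) > 0 ? first(rng) <= I <= last(rng) : last(rng) <= I <= first(rng)

# Advance the state and validate the result with the bounds check.
function next_bounds(state::Int, rng)
    I = state + step(rng)
    in_bounds(I, rng) || return nothing
    return I
end

# Advance the state, stopping as soon as the current state is the last element.
function next_eq(state::Int, rng)
    state == last(rng) && return nothing
    return state + step(rng)
end

# Drive either stepping function from first(rng) until it returns nothing.
function collect_with(next, rng)
    out = Int[]
    isempty(rng) && return out
    s = first(rng)
    while s !== nothing
        push!(out, s)
        s = next(s, rng)
    end
    return out
end

rng = 1:2:9
@assert collect_with(next_bounds, rng) == collect(rng)
@assert collect_with(next_eq, rng) == collect(rng)
```

The equality-style loop has a single comparison against a loop-invariant value per step, which is the form the comment above describes as friendlier to vectorization.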

But by calling __inc(state.I, iter.indices, Val(ndims(iter))) as you did in R404, because of the type annotation state::Tuple{Int}, indices::Tuple{OrdinalRangeInt}, this method R420 would only be called when length(iter.indices) == ndims(iter) == 1, right?

It's also called at R435, since the last argument, ndim, is passed down unchanged.

BTW, all the test failures should be caused by invalid states. Since we have

julia> iterate(1:2:typemax(Int), typemax(Int)-1)
(-9223372036854775808, -9223372036854775808)

I think it's OK to replace them.
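As a small illustration of why such a state is considered invalid, the sketch below uses a hypothetical helper `next_eq` (not Base's code) that advances a 1d state using only an equality check against the last element. Starting from a value that is not an element of the range, the check never fires and the `Int` addition wraps around.

```julia
# Hypothetical helper for illustration only; not Base's implementation.
function next_eq(state::Int, rng)
    state == last(rng) && return nothing
    return state + step(rng)
end

rng = 1:2:typemax(Int)   # odd numbers only; last(rng) == typemax(Int)
bad = typemax(Int) - 1   # even, so it is not an element of rng

@assert last(rng) == typemax(Int)
@assert !(bad in rng)
# Native Int arithmetic wraps around on overflow.
@assert bad + step(rng) == typemin(Int)
@assert next_eq(bad, rng) == typemin(Int)
```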

johnnychen94 · 2022-05-17T15:53:33Z

This looks good to me as long as the tests pass. Also pinging @vchuravy and @timholy, as this was originally written in #31011.

N5N3 · 2022-05-18T03:48:49Z

Some local BaseBenchmark:

          "index" => 7-element BenchmarkTools.BenchmarkGroup:
                  tags: ["sum", "simd"]
                  ("sumeach", "SubArray{Int32, 2, Matrix{Int32}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}") => TrialJudgement(-85.57% => improvement)
                  ("sumcartesian", "SubArray{Int32, 2, Matrix{Int32}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}") => TrialJudgement(-87.51% => improvement)
                  ("sumcartesian", "1:100000") => TrialJudgement(-100.00% => improvement)
                  ("sumeach", "SubArray{Int32, 2, BaseBenchmarks.ArrayBenchmarks.ArrayLS{Int32, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, false}") => TrialJudgement(-87.62% => improvement)
                  ("sumcartesian", "SubArray{Int32, 2, BaseBenchmarks.ArrayBenchmarks.ArrayLS{Int32, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, false}") => TrialJudgement(-87.48% => improvement)
                  ("sumcartesian", "SubArray{Int32, 2, Matrix{Int32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}") => TrialJudgement(-84.04% => improvement)
                  ("sumcartesian", "100000:-1:1") => TrialJudgement(-100.00% => improvement)

N5N3 · 2022-06-18T07:16:08Z

With today's master (LLVM 14), this PR also helps vectorize 3d CartesianIndices (though only to a very limited extent).

N5N3 · 2023-10-08T15:38:10Z

No longer needed after #51606.

jonas-schulze · 2023-10-13T06:49:48Z

It's a bit sad that (subjectively) so many PRs get forgotten about for so long.

N5N3 added the performance Must go faster label May 17, 2022

N5N3 requested a review from johnnychen94 May 17, 2022 13:38

johnnychen94 reviewed May 17, 2022

johnnychen94 approved these changes May 18, 2022

johnnychen94 reviewed May 18, 2022

N5N3 force-pushed the cart-auto-simd branch from d105525 to db66c7a Compare May 18, 2022 10:54

N5N3 and others added 3 commits June 18, 2022 15:15

Make for iter::CartesianIndices{1/2} better vectorized.

9a2c7d0

Add more comments.

9c6c02d

Co-Authored-By: Johnny Chen

Remove invalid state test.

66b8e2d

(These were introduced for performance.)

N5N3 force-pushed the cart-auto-simd branch from db66c7a to 66b8e2d Compare June 18, 2022 07:15

N5N3 closed this Oct 8, 2023
