-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make for iter::CartesianIndices
better vectorized for 1d/2d cases.
#45338
Conversation
rng = indices[1] | ||
I = state[1] + step(rng) | ||
valid = __is_valid_range(I, rng) && state[1] != last(rng) | ||
if N == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I get the idea -- when will N != 1
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N
is the dimension of the CartesianIndices
:
- If
N > 1
, the outermost dimension uses__is_valid_range
to preserve the performance improvement introduced in add StepRange support for CartesianIndices #37829. - If
N == 1
, just usestate[1] != last(rng)
, as__is_valid_range
pervents vectorization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But by calling __inc(state.I, iter.indices, Val(ndims(iter)))
as in you did in R404, because of the type annotion state::Tuple{Int}, indices::Tuple{OrdinalRangeInt}
, this method R420 would only be called when length(iter.indices) == ndims(iter) == 1
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also called at R435 as the last input ndim
is passed deeper without any change.
BTW, all the test failures should be invalid state
. Since we have
julia> iterate(1:2:typemax(Int), typemax(Int)-1)
(-9223372036854775808, -9223372036854775808)
I think it's OK to replace them.
Some local "index" => 7-element BenchmarkTools.BenchmarkGroup:
tags: ["sum", "simd"]
("sumeach", "SubArray{Int32, 2, Matrix{Int32}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}") => TrialJudgement(-85.57% => improvement)
("sumcartesian", "SubArray{Int32, 2, Matrix{Int32}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}") => TrialJudgement(-87.51% => improvement)
("sumcartesian", "1:100000") => TrialJudgement(-100.00% => improvement)
("sumeach", "SubArray{Int32, 2, BaseBenchmarks.ArrayBenchmarks.ArrayLS{Int32, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, false}") => TrialJudgement(-87.62% => improvement)
("sumcartesian", "SubArray{Int32, 2, BaseBenchmarks.ArrayBenchmarks.ArrayLS{Int32, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, false}") => TrialJudgement(-87.48% => improvement)
("sumcartesian", "SubArray{Int32, 2, Matrix{Int32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}") => TrialJudgement(-84.04% => improvement)
("sumcartesian", "100000:-1:1") => TrialJudgement(-100.00% => improvement) |
Co-Authored-By: Johnny Chen <[email protected]>
(These were introduced for performance.)
With today's master (LLVM14), this PR also helps vectorizing 3d |
No need after #51606. |
It's a bit sad that (subjectively) so many PRs get forgotten about for so long. |
Local benchmark shows that this makes non-
@simd
1d/2d loop faster.on master:
This PR