-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix overflow in CartesianIndices iteration #31011
Conversation
1aac477
to
eebadbc
Compare
I should add that this helps vectorize 1D CartesianIndices, but doesn't help with ND. The loop we lower two has two induction variables. instead of being a inner and a outer for-loop. I would love for: function vecfail(A)
@inbounds begin
acc = zero(eltype(A))
for I in CartesianIndices(A)
acc = Base.FastMath.add_fast(acc, A[I])
end
end
return acc
end to be more or less equivalent to using Base.Cartesian
@generated function vecfail(A::AbstractArray{T, N}) where {T, N}
quote
@inbounds begin
acc = zero(eltype(A))
@nloops $N i A begin
acc = Base.FastMath.add_fast(acc, @nref($N, A, i))
end
end
return acc
end
end I know |
Ok updated once again after I realized that
is basically equivalent to:
Since the latter already accepts invalid state. I also experimented with checked arithmetic, but that didn't help with vectorisation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I think between this suggestion and my refinement in the comment below, we may have a fix (or close to it) for #9080! That's one of the few bugs whose number I know by memory, so to me that's a big deal.
See this gist and then:
julia> includet("/tmp/iterc.jl")
julia> using BenchmarkTools
julia> @btime sumcart_manual($A)
102.337 ms (0 allocations: 0 bytes)
5.0001993217249475e7
julia> @btime sumcart_iter($A)
131.513 ms (0 allocations: 0 bytes)
5.0001993217249475e7
julia> @btime sumcart_newiter($A)
105.207 ms (0 allocations: 0 bytes)
5.0001993217249475e7
base/multidimensional.jl
Outdated
@@ -334,9 +338,11 @@ module IteratorsMD | |||
iterfirst, iterfirst | |||
end | |||
@inline function iterate(iter::CartesianIndices, state) | |||
# If we increment before the condition check, we run the | |||
# risk of an integer overflow. This intentionally allows for invalid state. | |||
state == last(iter) && return nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the ND case, what if we weave this check together with inc
? Here we check all of the entries of state
and iter
, and if there are any differences we bail. But then we potentially check them all again in inc
. It seems you could be running this check on each entry as you go, and then bail if you've exhausted the tuple.
Something like this:
@inline function iterate(iter::CartesianIndices, state)
inc((), state.I, first(iter).I, last(iter).I)
end
# increment & carry
@inline inc(out, ::Tuple{}, ::Tuple{}, ::Tuple{}) = nothing
@inline function inc(out, state, start, stop)
if state[1] < stop[1]
nextstate = CartesianIndex(out..., state[1]+1, tail(state)...)
return nextstate, nextstate
end
return inc((out..., start[1]), tail(state), tail(start), tail(stop))
end
Now historically Julia's inference has not liked arguments that grow, but experimenting with @code_warntype
up to 8 dimensions suggests it's OK in this case. Cross-checking with @vtjnash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I went a slightly different approach that constructs the tuple in anycase and propagates a boolean to check if it is valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still solve #9080?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:/ No.
julia> @benchmark sumcart_iter($A)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 190.778 ms (0.00% GC)
median time: 191.650 ms (0.00% GC)
mean time: 191.665 ms (0.00% GC)
maximum time: 192.456 ms (0.00% GC)
--------------
samples: 27
evals/sample: 1
julia> @benchmark sumcart_manual($A)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 102.647 ms (0.00% GC)
median time: 103.522 ms (0.00% GC)
mean time: 103.663 ms (0.00% GC)
maximum time: 105.404 ms (0.00% GC)
--------------
samples: 49
evals/sample: 1
With regards to the multidimensional case, note that |
f79e937
to
d7805f7
Compare
I wonder if Polly could do things like this, but right now both |
Interesting using Tim's variant I do see a speedup 200->150, but not the same speedup as if I am using |
Just to check, are you saying that if you "wire this in" to |
Exactly! I rewrote the using Base: tail
@inline function myiterate(iter::CartesianIndices)
iterfirst, iterlast = first(iter), last(iter)
if any(map(>, iterfirst.I, iterlast.I))
return nothing
end
iterfirst, iterfirst
end
@inline function myiterate(iter::CartesianIndices, state)
inc((), state.I, first(iter).I, last(iter).I)
end
# 0-d cartesian ranges are special-cased to iterate once and only once
myiterate(iter::CartesianIndices{0}, done=false) = done ? nothing : (CartesianIndex(), true)
# increment & carry
@inline inc(out, ::Tuple{}, ::Tuple{}, ::Tuple{}) = nothing
@inline function inc(out, state::Tuple{Int}, start::Tuple{Int}, stop::Tuple{Int})
if state[1] < stop[1]
nextstate = CartesianIndex(out..., state[1]+1)
return nextstate, nextstate
end
return nothing
end
@inline function inc(out, state, start, stop)
if state[1] < stop[1]
nextstate = CartesianIndex(out..., state[1]+1, tail(state)...)
return nextstate, nextstate
end
return inc((out..., start[1]), tail(state), tail(start), tail(stop))
end
function sumcart_newiter(A)
s = 0.0
iter = CartesianIndices(A)
ret = myiterate(iter)
@inbounds while !(ret === nothing)
I, state = ret
s += A[I]
ret = myiterate(iter, state)
end
return s
end The only difference is that this a |
Ok... before a755c6a
after:
This is a a bit odd since LLVM prefers @nanosoldier |
What does this lowering change look like in terms of the equivalent Julia syntax? I think there’s an example of for loop lowering in the manual that might need to be adjusted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really fascinating. I am not the best person to comment on LLVM's preferences for various lowered forms, and certainly not the changes to julia-synax.scm
. It will be interesting to see the nanosoldier run.
base/multidimensional.jl
Outdated
@inline function inc(state, start, stop) | ||
# increment post check to avoid integer overflow | ||
@inline inc(out, ::Tuple{}, ::Tuple{}, ::Tuple{}) = nothing | ||
@inline function inc(out, state::Tuple{Int}, start::Tuple{Int}, stop::Tuple{Int}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we even need this method anymore? I think it's covered by the generic case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. I added it because this is a common pattern elsewhere (probably steming from a time when inference liked code like this more).
Since we don't have a
my change changes it back to how it was on 0.6, but with the new iteration protocol
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
The nanosoldier results are confusing and all over the board. How can this change lead to half the memory usage.... |
base/multidimensional.jl
Outdated
@@ -95,6 +95,10 @@ module IteratorsMD | |||
# access to index tuple | |||
Tuple(index::CartesianIndex) = index.I | |||
|
|||
# equality | |||
import Base: == | |||
==(a::CartesianIndex{N}, b::CartesianIndex{N}) where N = a.I == b.I |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the import
, this can just be Base.:(==)(...)
, which is more explicit.
The results will be noisier the more benchmarks are run at once, so if there are a few in particular you're interested in, you could do those specifically instead of |
sigh Current Julia:
Looking at the docs for |
Okay. I decided not to experiment with loop-form and just focus on fixing the over/underflows. Notes:
|
I went back to the form This also sets us up with a CGF that I believe to be amenable to be transformed into a loop nest. |
There seems to be an inferrability issue:
Haven't had time to take a deeper look. |
This allows LLVM to vectorize the 1D CartesianIndices case, as well as fixing an overflow bug for: ```julia CartesianIndices(((typemax(Int64)-2):typemax(Int64),)) ``` Co-authored-by: Yingbo Ma <[email protected]>
I plan to merge this tomorrow. |
I wanna see another nanosoldier run first. There were some fairly major regressions in array iteration in the run in February. @nanosoldier |
Fair enough, the run in February was the experiment in changing the loop structure for the iteration protocol. |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Co-Authored-By: vchuravy <[email protected]>
@mbauman The nanosoldier results look clean. Are you okay with squash merging this as is? |
Make iteration over CartesianIndices blazing fast.
Fixes #30998, by avoiding the potential overflow.
LLVM was right..., based on the given semantics it
is not possible to determine the loop-bounds of the
loop in:
Since it is possible for the iteration to overflow: