-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
while i <= L
makes function >5x slower the while i < (L+1)
#31416
Comments
Note that the two conditions are not strictly identical because of integer wrap-around. |
when x-ref: #31011 the "right way" of writing the second loop is:
|
The weird part is that I would expect the non-vectorized versions to be slower, not faster. |
Updating function while_vect_test_lt(vec::Vector{T}) where T
ret=T(0)
L=length(vec)::Int
i=0::Int
while i < L
i += 1;
@inbounds v=vec[i]
ret+=v
end
return ret
end Yields: julia> @benchmark while_vect_test_lt(vec)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 65.307 ns (0.00% GC)
median time: 69.715 ns (0.00% GC)
mean time: 84.524 ns (8.87% GC)
maximum time: 55.671 μs (99.78% GC)
--------------
samples: 10000
evals/sample: 976 |
Both the first and second loop are vectorized and are fast. The third loop is not vectorized and is slow. |
Then isn't the explanation backwards since those are the two that compare with |
If |
Oh duh, they both terminate while |
Not sure there's anything to do here: Julia and LLVM seem to be handling this correctly. |
I agree that is the semantics that we have and LLVM handles them correctly. I will note that using @ndinsmore you mentioned:
Do you have more information on that? |
Also, note that if you use a range as is idiomatic, then this has good performance: function vect_test_range(vec::Vector{T}) where T
ret = T(0)
L = length(vec)
for i = 1:L
@inbounds v = vec[i]
ret += v
end
return ret
end Unfortunately, the even more idiomatic version just doing iteration is less fast: function vect_test_iterate(vec::Vector{T}) where T
ret = T(0)
for v in vec
ret += v
end
return ret
end Do we have an issue tracking the iteration protocol being able to automatically All my timings:
|
Looks the same to me? julia> @btime vect_test_range(vec)
538.303 ns (1 allocation: 16 bytes)
50005000
julia> @btime vect_test_iterate(vec)
545.455 ns (1 allocation: 16 bytes)
50005000 #27386 should have fixed that. |
I'm getting consistently different timings: julia> @btime vect_test_range(vec)
47.356 ns (1 allocation: 16 bytes)
500500
julia> @btime vect_test_iterate(vec)
73.695 ns (1 allocation: 16 bytes)
500500
julia> @btime vect_test_range($vec)
31.103 ns (0 allocations: 0 bytes)
500500
julia> @btime vect_test_iterate($vec)
57.703 ns (0 allocations: 0 bytes)
500500 Will build a fresh master and see if anything changes... |
For small arrays I see some difference but it seems it goes away when increasing the size a bit: julia> vec=collect(1:1000);
julia> @btime vect_test_range(vec)
42.380 ns (1 allocation: 16 bytes)
500500
julia> @btime vect_test_iterate(vec)
66.700 ns (1 allocation: 16 bytes)
500500
julia> vec=collect(1:5000);
julia> @btime vect_test_range(vec)
283.272 ns (1 allocation: 16 bytes)
12502500
julia> @btime vect_test_iterate(vec)
283.392 ns (1 allocation: 16 bytes)
12502500 |
Ok, that's interesting. Not sure how much we should be bothered by that... I am a little. |
So how would you write a non-infinite loop using |
This isn't a Julia-specific problem, it's a general "integers overflow" problem. C avoids this by making integer overflow UB, so the compiler can do anything it wants, but we don't want to introduce UB. You could check that At a higher level, why are you writing while loops rather than iterating over a range? |
So it started with an obsession the performance of iteration on dicts: using BenchmarkTools
function iterate_test_func(iteration_function,f,iter,N=1)
ret=0
for n in 1:N
ret=0
@inbounds next = iteration_function(iter)
while next !== nothing
(v, state) = next
ret+=f(v)
@inbounds next = iteration_function(iter, state)
end
end
return ret
end
d=Dict(v=>v for v in 1:1000)
@benchmark iterate_test_func(iterate,v->v,values(d))
@benchmark iterate_test_func(iterate,v->v.second,d) results julia> @benchmark iterate_test_func(iterate,v->v,values(d))
BenchmarkTools.Trial:
memory estimate: 32 bytes
allocs estimate: 2
--------------
minimum time: 4.223 μs (0.00% GC)
median time: 4.953 μs (0.00% GC)
mean time: 5.478 μs (0.00% GC)
maximum time: 125.694 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 7
julia> @benchmark iterate_test_func(iterate,v->v.second,d)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 4.522 μs (0.00% GC)
median time: 5.210 μs (0.00% GC)
mean time: 5.675 μs (0.00% GC)
maximum time: 135.197 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 7 Then to see if iterate could be faster I wrote something like: Base.@propagate_inbounds @inline function iterate_slim(v::T, i = v.dict.idxfloor) where T <: Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}}
d=v.dict
slots=d.slots
L = length(slots)
while i <= L
@inbounds if slots[i] == 0x1
@inbounds return ((T <: Base.KeySet) ? d.keys[i] : d.vals[i], i+1)
end
i += 1
end
return nothing
end
julia> @benchmark iterate_test_func(iterate_slim,v->v,values(d),10)
BenchmarkTools.Trial:
memory estimate: 32 bytes
allocs estimate: 2
--------------
minimum time: 36.138 μs (0.00% GC)
median time: 39.947 μs (0.00% GC)
mean time: 42.563 μs (0.00% GC)
maximum time: 827.562 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1 Now knowing about the pitfall of <=: (Very small code change) Base.@propagate_inbounds @inline function iterate_slim_lt(v::T, i = v.dict.idxfloor-1) where T <: Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}}
d=v.dict
slots=d.slots
L = length(slots)
while i < L
i += 1
@inbounds if slots[i] == 0x1
@inbounds return ((T <: Base.KeySet) ? d.keys[i] : d.vals[i], i+1)
end
end
return nothing
end
julia> @benchmark iterate_test_func(iterate_slim_lt,v->v,values(d),1)
BenchmarkTools.Trial:
memory estimate: 32 bytes
allocs estimate: 2
--------------
minimum time: 2.257 μs (0.00% GC)
median time: 2.696 μs (0.00% GC)
mean time: 3.100 μs (0.00% GC)
maximum time: 132.669 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 9 Finally at one point I was trying to understand entitlement so I wrote: function slow_entitlement_test_func(f,iter::Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}},N=1)
ret=0
@inbounds for n in 1:N
ret=0
d=iter.dict
slots=d.slots
L=length(slots)
vals=d.vals
i=d.idxfloor
while i <= L
@inbounds if slots[i] === 0x1
@inbounds v=vals[i]
ret+=f(v)
end
i+=1
end
end
return ret
end
julia> @benchmark slow_entitlement_test_func(v->v,values(d))
BenchmarkTools.Trial:
memory estimate: 32 bytes
allocs estimate: 2
--------------
minimum time: 6.702 μs (0.00% GC)
median time: 7.512 μs (0.00% GC)
mean time: 8.123 μs (0.00% GC)
maximum time: 128.705 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 5 Now get rid of function entitlement_test_func(f,iter::Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}},N=1)
ret=0
@inbounds for n in 1:N
ret=0
d=iter.dict
slots=d.slots
L=length(slots)
vals=d.vals
for i = d.idxfloor:L
@inbounds if slots[i] === 0x1
@inbounds v=vals[i]
ret+=f(v)
end
i+=1
end
end
return ret
end
julia> @benchmark entitlement_test_func(v->v,values(d))
BenchmarkTools.Trial:
memory estimate: 32 bytes
allocs estimate: 2
--------------
minimum time: 979.353 ns (0.00% GC)
median time: 1.005 μs (0.00% GC)
mean time: 1.067 μs (0.00% GC)
maximum time: 8.688 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 17 |
@StefanKarpinski I think that iteration through a function skip_deleted(h::Dict, i)
L = length(h.slots)
@inbounds while i<=L && !isslotfilled(h,i)
i += 1
end
return i
end But have yet been able to be able to prove that. |
Makes sense. It could be changed to function skip_deleted(h::Dict, i)
L = length(h.slots)
@inbounds while true
if i >= L || isslotfilled(h, i)
return i
end
i += 1
end
end That should be provably non-infinite since |
The |
This gives a 2x improvement: julia> @benchmark iterate_test_func(iterate,v->v,values(d),1)
BenchmarkTools.Trial:
memory estimate: 32 bytes
allocs estimate: 2
--------------
minimum time: 4.329 μs (0.00% GC)
median time: 4.949 μs (0.00% GC)
mean time: 5.340 μs (0.00% GC)
maximum time: 119.739 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 7
julia> @benchmark iterate_test_func(iterate,v->v.second,d,1)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 5.056 μs (0.00% GC)
median time: 6.266 μs (0.00% GC)
mean time: 6.722 μs (0.00% GC)
maximum time: 446.593 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 6
julia> function Base.skip_deleted(h::Dict, i)
L = length(h.slots)
for i= i:L
@inbounds if Base.isslotfilled(h,i)
return i
end
end
return L+1
end
julia> @benchmark iterate_test_func(iterate,v->v,values(d),1)
BenchmarkTools.Trial:
memory estimate: 32 bytes
allocs estimate: 2
--------------
minimum time: 2.623 μs (0.00% GC)
median time: 2.839 μs (0.00% GC)
mean time: 3.083 μs (0.00% GC)
maximum time: 25.621 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 9
julia> @benchmark iterate_test_func(iterate,v->v.second,d,1)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 3.832 μs (0.00% GC)
median time: 5.146 μs (0.00% GC)
mean time: 5.316 μs (0.00% GC)
maximum time: 51.718 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 7 |
is the if statement in the following killing vectorization? function Base.skip_deleted(h::Dict, i)
L = length(h.slots)
@inbounds while i<=L
if Base.isslotfilled(h,i)
return i
end
i += 1
end
return i
end Because isn't it provably non-infinite? |
@StefanKarpinski @KristofferC @vchuravy function slow_entitlement_test_func(f,iter::Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}},N=1)
ret=0
@inbounds for n in 1:N
ret=0
d=iter.dict
slots=d.slots
L=length(slots)
vals=d.vals
i=d.idxfloor
while i <= L
#for i = d.idxfloor:L
@inbounds if slots[i] === 0x1
@inbounds v=vals[i]
ret+=f(v)
end
i+=1 # Also comment out for the for loop
end
end
return ret
end This actually the source of this whole issue. |
What happens if |
@vchuravy is there a way to tell the optimizer that it can't happen even if it appears it could? |
While I do not recommend this, there is always a way. function unsafe_inc(i::Int64)
Base.llvmcall("""
%i = add nsw i64 %0, 1
ret i64 %i
""", Int64, Tuple{Int64}, i)
end
function slow_entitlement_test_func(f,iter::Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}},N=1)
ret=0
@inbounds for n in 1:N
ret=0
d=iter.dict
slots=d.slots
L=length(slots)
vals=d.vals
i=d.idxfloor
while i <= L
#for i = d.idxfloor:L
@inbounds if slots[i] === 0x1
@inbounds v=vals[i]
ret+=f(v)
end
i = unsafe_inc(i)
end
end
return ret
end Since this is not a bug in Julia, but due to the defined semantics I am going to close this issue. |
There seems to be a problem when <
=
is used in awhile
loop, where the entire function containing the while loop is >5x slower than the while loop that uses<
.I think that this might be preventing optimizations on inline iterators as well, I can give more details on that as well if needed.
I used the following benchmark functions to identify the problem:
Results
The text was updated successfully, but these errors were encountered: