faster iteration over subarrays (views) #48720
Conversation
We just need
Does SIMDing in
Closes #43295. Also, this introduces `@inbounds` into a view that may wrap any `AbstractArray`, including one with a mis-implementation of `length`. I don't think this is safe. Can you check whether LLVM can omit the bounds check automatically? Thank you for the contribution!
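To see why an unconditional `@inbounds` over an arbitrary `AbstractArray` is risky, consider a hypothetical wrapper that mis-implements `size` (the type and names below are made up for illustration; they are not part of the PR):

```julia
# Hypothetical array type whose `size` over-reports its length by one.
struct LyingVector <: AbstractVector{Int}
    data::Vector{Int}
end
Base.size(v::LyingVector) = (length(v.data) + 1,)  # wrong on purpose
Base.getindex(v::LyingVector, i::Int) = v.data[i]

lv = LyingVector([1, 2, 3])
length(lv)  # reports 4, although only 3 elements exist

# A loop like `@inbounds for i in eachindex(lv); s += lv[i]; end` could
# read past the underlying data without any bounds check; with a plain
# loop, the access `lv[4]` at least throws a BoundsError.
```

A view wrapping such an array would inherit the wrong `length`, so an unconditional `@inbounds` in `iterate` could read out of bounds.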
Note that I'm using
@N5N3 Are you sure? On master the timings for
are the same as those without
@jakobnissen It's only unsafe for a view into some
If I drop the
Note that for
in
One problem I see is that the step value would not be known at compile time. More importantly, are there SIMD instructions that can handle non-contiguous data?
Good point regarding safety, and indeed it looks like LLVM can't omit the bounds check. It's true that the step would not be known at compile time (except for contiguous arrays, where the step size is one). But you would still have the benefit of `@inbounds`. And it means you can potentially use a single implementation for contiguous and non-contiguous views with no runtime cost.
For that one needs to change
Then there is much smaller performance gain for
For
However, one could achieve the same by adding
I'm sure. None of

```julia
struct SafeArray{T,N,P<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::P
end
@inline Base.size(x::SafeArray) = Base.size(x.parent)
@inline Base.getindex(x::SafeArray, is...) = @inbounds x.parent[is...]
```

Then you will get

```julia
julia> @btime f($a)
  244.975 ns (0 allocations: 0 bytes)
157838443398623711

julia> @btime f($v)
  1.480 μs (0 allocations: 0 bytes)
157838443398623711

julia> @btime f(SafeArray($v))
  249.086 ns (0 allocations: 0 bytes)
157838443398623711
```

In fact, you can also try this:

```julia
function f_c(v)
    eachindex(v) == eachindex(parent(v)) || throw("") # inbounds hint for LLVM
    foldl(+, v; init = zero(eltype(v)))
end
```

At least on master I get

```julia
julia> @btime f($v)
  1.880 μs (0 allocations: 0 bytes)
157838443398623711

julia> @btime f_c($v)
  247.880 ns (0 allocations: 0 bytes)
157838443398623711
```
@matthias314 When I use the function

And the benchmarks:
BTW, I don't think this PR could be finished without the added
Thanks a lot for your suggestions! Some of them seem to require end users to rewrite their code (which was not what I had in mind), but they are all very instructive. I agree that nothing can be done without using
I agree 100%, this is a good contribution and will really help. If you restrict the type signature to only views of
force-pushed from 6062a58 to 70d1d95
Here is a new version. It's now safe (that is, with bounds checks), but still as fast as iterating over the parent array in many cases: For a

In each case,

The reason that

With @tkf's proposal #40397, the iteration over all views above would be as fast as over their parent arrays, but I don't know how to achieve this while keeping
Nice result.

```julia
@inline function iterate(v::FastSubArray)
    # The cached `stride1` might lose the const info.
    st = compute_stride1(parent(v), v.indices)
    i = v.offset1 + st * firstindex(v)
    l = v.offset1 + st * lastindex(v) + st
    return iterate(v, (i, st, l))
end
@inline function iterate(v::FastSubArray, (i, st, l))
    flipsign(i, st) >= flipsign(l, st) && return nothing
    return v.parent[i], (i + st, st, l)
end
```

Just wondering why LLVM prefers this version, though. The lowered IR shows that LLVM keeps the bounds-check block in the non-SIMD branch correctly, and it also generates a SIMD branch.

Edit: The problem with 1d OffsetArray could be fixed with the added

```julia
checkindex(::Type{Bool}, ind::OneTo{T}, I::T) where {T<:BitSigned} = unsigned(I-1) < last(ind) # make wrapped Array happy
```

With that we have

```julia
n = 2^10; m = 2^5
v = rand(Int, n)
ov = OffsetVector(v, -n:-1)
oa = OffsetMatrix(rand(Int, m, m), -m:-1, -m:-1)
f(v) = foldl(+, v; init = zero(eltype(v)))
@btime f($v);                 # 65.000 ns (0 allocations: 0 bytes)
@btime f($(view(v, 1:n)));    # 67.041 ns (0 allocations: 0 bytes)
@btime f($(view(ov, -n:-1))); # 67.959 ns (0 allocations: 0 bytes)
@btime f($(view(oa, 1:n)));   # 65.816 ns (0 allocations: 0 bytes)
```
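An aside on the `flipsign` termination test in the `iterate` method above: flipping both the index and the bound by the sign of the step turns a descending loop into an ascending comparison, so a single test works for either direction. A small standalone check (the helper name `done` is mine, not from the PR):

```julia
# Termination test from the `iterate` method above, factored out:
# for st > 0 it reduces to `i >= l`, for st < 0 to `i <= l`.
done(i, st, l) = flipsign(i, st) >= flipsign(l, st)

# Ascending: indices 1, 2, 3 with step 1 stop at bound l = 4.
@assert !done(3, 1, 4)
@assert done(4, 1, 4)

# Descending: indices 3, 2, 1 with step -1 stop at bound l = 0.
@assert !done(1, -1, 0)
@assert done(0, -1, 0)
```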
Great! Compared to the current version of the PR, I get the following for the examples I posted above:

(Here I'm using the modified
The enlarged performance gap on

```julia
julia> v = rand(Int8, 1 + 2^16);

julia> @btime f($v);
  496.373 ns (0 allocations: 0 bytes)

julia> @btime f($(view(v, 1:n)));
  543.548 ns (0 allocations: 0 bytes)

julia> @btime f($(view(v, 1:n+1)));
  504.688 ns (0 allocations: 0 bytes)
```
Just a naming nit: C-style arrays are not `SafeArray`s, since C is not a safe language; they are `UnsafeArray` (or `UnsafeView`, which exists currently).
Updated with the latest benchmarks:
@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
The benchmark looks good, but some of the bench suite seems broken after this change.

```julia
function perf_sub2ind(sz, irange, jrange, krange)
    linear = LinearIndices(sz)
    s = 0
    for k in krange, j in jrange, i in irange
        ind = linear[i, j, k]
        s += ind
    end
    s
end
perf_sub2ind((1000,1000,1000), 1:1000, 1:1000, 1:1000)
```

The problem here is

```julia
function perf_mapr_access(A, B, zz, n) #20517
    @inbounds for j in 1:n
        B[j] = mapreduce(k -> A[j,k]*A[k,j], +, 1:j; init=zz)
    end
    B
end
```

The problem here is

```julia
julia> a = reinterpret(Base.OneTo{Int}, [-10])[]
Base.OneTo(-10)

julia> Base.checkindex(Bool, a, 2)
true
```
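The failure mode shown by the `reinterpret` example comes from the unsigned-comparison trick in the proposed `checkindex` method: it is only valid when `last(ind) >= 0`. A minimal sketch of the trick and its assumption (the helper name `inrange_unsigned` is mine, for illustration only):

```julia
# For n >= 0, `1 <= I <= n` can be tested with a single unsigned comparison:
# `I - 1` wraps around to a huge unsigned value whenever I <= 0.
inrange_unsigned(I::Int, n::Int) = unsigned(I - 1) < unsigned(n)

@assert inrange_unsigned(1, 10)    # lower edge accepted
@assert inrange_unsigned(10, 10)   # upper edge accepted
@assert !inrange_unsigned(0, 10)   # wraps to typemax, correctly rejected
@assert !inrange_unsigned(11, 10)

# But for an ill-formed range with n < 0 (like the reinterpreted
# `Base.OneTo(-10)` above), `unsigned(n)` is huge and the check
# wrongly accepts out-of-range indices:
@assert inrange_unsigned(2, -10)   # should be false
```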
Although we have already moved forward quite a bit, I would like to ask a basic question: The whole point of this endeavor is to make

Instead of addressing this for each type separately (

Another approach would be to use @N5N3's idea of

one can now replace an iterator
I think users are free to use

As for

But why not have it in Base? As far as I understand the discussion in #40397, the concern with

Then there's no difference with

In my opinion,
force-pushed from 78210bf to 7fa7e09
@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
The difference would only be in the way it's implemented. I still think that the cleanest way would be to use

I believe that with

The main advantage of both approaches compared to the one we are working on here is that it frees us from coming up with rather tricky code for each type that we want to speed up.
On second thought, what we need here might not be the

```julia
julia> f(v) = foldl(+, v; init = zero(eltype(v)))
f (generic function with 1 method)

julia> n = 2^10; v = rand(Int, n); ov = OffsetArray(v, -1-n); vv = view(v, 1:n); sv = reshape(vv, :, 2^5); vov = view(ov, :); vov2 = view(ov, -n:-1);

julia> for h in (v, ov, vv, sv, vov, vov2)
           println(typeof(h))
           @btime f($h)
       end
Vector{Int64}
  61.601 ns (0 allocations: 0 bytes)
OffsetVector{Int64, Vector{Int64}}
  59.980 ns (0 allocations: 0 bytes)
SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}
  374.879 ns (0 allocations: 0 bytes)
Base.ReshapedArray{Int64, 2, SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}, Tuple{}}
  471.939 ns (0 allocations: 0 bytes)
SubArray{Int64, 1, OffsetVector{Int64, Vector{Int64}}, Tuple{Base.Slice{OffsetArrays.IdOffsetRange{Int64, Base.OneTo{Int64}}}}, true}
  553.476 ns (0 allocations: 0 bytes)
SubArray{Int64, 1, OffsetVector{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}
  373.430 ns (0 allocations: 0 bytes)

julia> Base.checkindex(::Type{Bool}, ind::Base.OneTo{T}, I::T) where {T<:Base.BitInteger} = unsigned(I - one(I)) < unsigned(last(ind))

julia> Base.checkindex(::Type{Bool}, ind::Base.IdentityUnitRange, I::Real) = checkindex(Bool, ind.indices, I) # needed for OffsetArray

julia> for h in (v, ov, vv, sv, vov, vov2)
           println(typeof(h))
           @btime f($h)
       end
Vector{Int64}
  61.546 ns (0 allocations: 0 bytes)
OffsetVector{Int64, Vector{Int64}}
  60.020 ns (0 allocations: 0 bytes)
SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}
  63.646 ns (0 allocations: 0 bytes)
Base.ReshapedArray{Int64, 2, SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}, Tuple{}}
  63.710 ns (0 allocations: 0 bytes)
SubArray{Int64, 1, OffsetVector{Int64, Vector{Int64}}, Tuple{Base.Slice{OffsetArrays.IdOffsetRange{Int64, Base.OneTo{Int64}}}}, true}
  68.641 ns (0 allocations: 0 bytes)
SubArray{Int64, 1, OffsetVector{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}
  63.265 ns (0 allocations: 0 bytes)
```

I haven't checked more

Of course
force-pushed from 7fa7e09 to 370d946
the `checkindex` hack is based on the assumption that `last(ind) >= 0`

remove `iterate` specialization
force-pushed from 370d946 to bb84177
@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
This change simplifies the bounds check in the loop, as LLVM will hoist the constant subtraction. SIMD blocks are generated in more cases.

Co-authored-by: N5N3 <[email protected]>
(cherry picked from commit 1543cdd)
Iterating over a `SubArray` (`view`) of an integer-valued array is often much slower than iterating over the array itself:

My understanding is that the index transformation necessary for accessing the view prevents the compiler from optimizing the code via SIMD instructions. This PR addresses this with a dedicated `iterate` method for `FastContiguousSubArray`. This is the (already existing) subtype of `SubArray` for which the indices in the parent array are known to be contiguous and increasing; see subarrays.jl for details. For example, `view(a, :,:,2:3,4)` is of this type, but `view(a, 1,:,2:3,4)` and `view(a, 3:-1:2,1,1,1)` are not (although they are of type `FastSubArray`, meaning that they allow fast linear indexing into the parent array). It's hard to imagine SIMD speed-ups for other subarrays.

In the following benchmarks (made with `@btime f($v)` etc.), the function `f` from above is applied to the arrays and views into them:

```julia
f(v)
f(view(v,:))
f(u)
f(view(u,:))
f(a)
f(view(a, 2:3,1,1,1))
f(view(a, :,2:3,1,1))
f(view(a, :,:,2:3,1))
f(view(a, :,:,:,2:3))
```

One can see that with the new method, iterating over a view is as fast as iterating over the parent array.

EDIT: Using `@fastmath` one can obtain analogous speed-ups for floating-point arrays:
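For reference, a minimal sketch of what the `@fastmath` remark amounts to (my own reconstruction, not code from the PR): `@fastmath` rewrites `+` to `Base.FastMath.add_fast`, which permits the reassociation needed to vectorize a floating-point reduction.

```julia
f(v) = foldl(+, v; init = zero(eltype(v)))
g(v) = @fastmath foldl(+, v; init = zero(eltype(v)))  # `+` becomes add_fast

v = rand(Float64, 2^10)
# Both compute the same sum up to rounding; only `g` may use SIMD,
# because reassociating float additions changes the rounding slightly.
isapprox(f(v), g(v))
```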