The permutedims compiling problem for high dimensional arrays #42486

Merged: 5 commits, Oct 5, 2021

Conversation

GiggleLiu
Contributor

This PR fixes #42438. I do not know why this version is friendlier to the compiler; I arrived at it by trial and error. Someone might want to look deeper into the root cause (it might be in LLVM, according to @timholy's comment), but we need a fix now because some practical applications depend on it.

using Base.Cartesian
using Base: size_to_strides, checkdims_perm
using Random

for (V, PT, BT) in Any[((:N,), BitArray, BitArray), ((:T,:N), Array, StridedArray)]
    @eval @generated function newperm!(P::$PT{$(V...)}, B::$BT{$(V...)}, perm) where $(V...)
        quote
            checkdims_perm(P, B, perm)

            #calculates all the strides
            native_strides = size_to_strides(1, size(B)...)
            strides_1 = 0
            @nexprs $N d->(strides_{d+1} = native_strides[perm[d]])

            #Creates offset, because indexing starts at 1
            offset = 1 - sum(@ntuple $N d->strides_{d+1})

            sumc = 0
            ind = 1
            @nexprs 1 d->(counts_{$N+1} = strides_{$N+1}) # a trick to set counts_($N+1)
            @nloops($N, i, P,
                    d->(df_d=i_d*strides_{d+1} ;sumc += df_d), # PRE
                    d->(sumc -= df_d), # POST
                    begin # BODY
                        @inbounds P[ind] = B[sumc+offset]
                        ind += 1
                    end)

            return P
        end
    end
end


using Test

@testset "newperm" begin
    n=25
    t=randn(rand(1:2, n)...)
    perm = randperm(n)
    p = zeros(eltype(t), size.(Ref(t), (perm...,)));
    @time newperm!(p, t, perm);
    # 0.395072 seconds (520.17 k allocations: 20.894 MiB, 99.99% compilation time)
    @time permutedims!(p, t, perm);
    # 41.520901 seconds (502.11 k allocations: 20.155 MiB, 100.00% compilation time)
    @test newperm!(p, t, perm) ≈ permutedims!(p, t, perm)
end

using BenchmarkTools
# high dim
n=25
t=randn(rand(2:2, n)...)
perm = randperm(n)
p = zeros(eltype(t), size.(Ref(t), (perm...,)))
@btime newperm!($p, $t, $perm)
# 179.814 ms (2 allocations: 96 bytes)
@btime permutedims!($p, $t, $perm)
# 201.038 ms (2 allocations: 96 bytes)

# low dim (make sure no performance regression)
t=randn(100, 200, 300)
perm = [3,2,1]
p = zeros(eltype(t), size.(Ref(t), (perm...,)))
@btime newperm!($p, $t, $perm)
# 19.203 ms (2 allocations: 96 bytes)
@btime permutedims!($p, $t, $perm);
# 19.339 ms (2 allocations: 96 bytes)
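
For readers following the index arithmetic in `newperm!`, here is a minimal non-generated sketch of the same stride-and-offset idea. The name `permute_by_strides!` is hypothetical and not part of Base or of this PR. It uses an ordinary loop over `CartesianIndices`, so it only illustrates the arithmetic and makes no claim about compile time or run time.

```julia
# Hypothetical helper for illustration only (not the PR's implementation).
function permute_by_strides!(P::Array{T,N}, B::Array{T,N}, perm) where {T,N}
    isperm(perm) && length(perm) == N || throw(ArgumentError("invalid permutation"))
    size(P) == ntuple(d -> size(B, perm[d]), N) ||
        throw(DimensionMismatch("size(P) must equal size(B) permuted by perm"))

    # Column-major strides of B: (1, size(B,1), size(B,1)*size(B,2), ...)
    native_strides = Base.size_to_strides(1, size(B)...)
    # Stride in B that corresponds to each dimension of P
    pstrides = ntuple(d -> native_strides[perm[d]], N)
    # Indexing starts at 1, so subtract one stride per dimension up front
    offset = 1 - sum(pstrides)

    ind = 1
    for I in CartesianIndices(P)  # visits P in memory (column-major) order
        # Linear index into B: sum_d I[d] * pstrides[d] + offset
        P[ind] = B[sum(Tuple(I) .* pstrides) + offset]
        ind += 1
    end
    return P
end

B = reshape(collect(1.0:24.0), 2, 3, 4)
perm = [3, 1, 2]
P = zeros(4, 2, 3)
permute_by_strides!(P, B, perm)
```

Because `P` is filled in its own memory order, the destination index is just a running counter, and only the source index needs the stride computation. That is the same structure as the `@nloops` body above.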

@GiggleLiu
Contributor Author

(Sorry for the unrelated commit history; whoever reviews this PR should use squash and merge.)

Member

@vtjnash left a comment

SGTM

base/multidimensional.jl (review thread resolved)
@kshyatt added the arrays ([a, r, r, a, y, s]) and performance (Must go faster) labels Oct 4, 2021
@timholy merged commit 5b7bb08 into JuliaLang:master Oct 5, 2021
KristofferC pushed a commit that referenced this pull request Oct 5, 2021
Successfully merging this pull request may close these issues.

Unreasonable slow just in time compiling of permutedims when the array dimension is high
5 participants