Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support repeat at any dimension #29626

Merged
merged 8 commits into from
Oct 23, 2018
Merged

Support repeat at any dimension #29626

merged 8 commits into from
Oct 23, 2018

Conversation

johnnychen94
Copy link
Member

This solves issue #29614 as an enhancement

now repeat(ones(2,2), 2,2,2) won't raise BoundsError but instead return a Array{T,3} as intuition expects.

Benchmark:

julia> a = ones(100,100,100);
# before this patch
julia> @btime repeat(a, 2,2,2);
  81.971 ms (40 allocations: 114.44 MiB)
# after this patch
julia> @btime repeat(a, 2,2,2);
  82.136 ms (40 allocations: 114.44 MiB)

base/abstractarraymath.jl Outdated Show resolved Hide resolved
@mbauman
Copy link
Member

mbauman commented Oct 16, 2018

Welcome and thanks for the contribution!

@mbauman mbauman added the arrays [a, r, r, a, y, s] label Oct 16, 2018
base/abstractarraymath.jl Outdated Show resolved Hide resolved
@johnnychen94
Copy link
Member Author

Thanks for the reviewing @mbauman and sorry for the back-and-forth commits on the same code.

I am not very familiar with the index mechanisms of Julia Arrays; I thought 1 is equivalent to OneTo(1) in this case, but the test says no.

I was even thinking of using

R[axes(A)...,ones(Int, length(shape) - length(inner))...] = A

to replace

idxs = (axes(A)..., ntuple(n->OneTo(1), length(shape) - length(inner))...)
R[idxs...] = A

But not very sure if this works. (Actually I don't know how to fully test it in my own machine)

@mbauman
Copy link
Member

mbauman commented Oct 17, 2018

The best way to test changes like this locally is to use Revise.jl, but sometimes it's quickest/easiest to just @eval the code in the REPL. Just type @eval Base (where Base is the module in which the original code was defined) and then paste the new definition.

Sorry for the misdirection on the colons — of course we just want to be assigning into the first block. Using 1 instead of OneTo(1) does indeed behave the same in setindex! and does indeed work… you just got bit by a spurious network interruption in an unrelated test on CI. I do find 1 to be a bit simpler and clearer in intent, but it's not a big deal. More importantly, though, I think you want to just use ndims(R) - ndims(A) instead of the lengths of inner and shape. Could you add this test case?

repeat(ones(2,2), inner=(1, 1, 1), outer=(2, 2, 2))

With respect to your question about ones: the challenge is doing these index manipulations in a way that is type-stable — that is, in a way where Julia can infer the length at compile time. Splatting an array like ones(…) into a tuple is definitely not what you want in that case as it allocates an array and Julia doesn't store array lengths in the type system. Using ntuple is better — especially if we can use an inferable constant for the second argument… but that subtraction isn't inferable. It looks like this whole function could use a bit of hand optimizing… so making this better here can be future work. Also good future work could be making this work in cases where there is an inner repeat.

@johnnychen94
Copy link
Member Author

johnnychen94 commented Oct 18, 2018

Thanks for this explanation, this helps me a lot.

With regard to optimization, I did some related benchmark tests as a reference (kind of interesting to me). This benchmark says, at least, we can optimize repeat for some special cases.

# Baseline
# without this patch
julia> @btime repeat(ones(100,100), 20, 20); # 2000×2000 Array{Float64,2}
  8.828 ms (4 allocations: 30.59 MiB)

# without this patch (slightly faster than Baseline)
julia> @btime reshape(repeat(ones(100,100),20,20),2000,2000,1,1,1); # 2000×2000×1×1×1 Array{Float64,5}
  8.590 ms (6 allocations: 30.59 MiB)

# without this patch
julia> @btime repeat(reshape(ones(100,100),100,100,1,1,1),20,20,1,1,1); # 2000×2000×1×1×1 Array{Float64,5}
66.962 ms (3625 allocations: 61.17 MiB)

# with this patch
julia> @btime repeat(ones(100,100), 20, 20,1,1,1); # 2000×2000×1×1×1 Array{Float64,5}
70.049 ms (3625 allocations: 61.17 MiB)

I'll tell you when I have a solution to support this feature without loss much performance. (I have some other jobs to do, so no sooner than Nov)
But still, I think it's okay to merge this now and recreate a new PR when we find a way to optimize this.

@johnnychen94
Copy link
Member Author

johnnychen94 commented Oct 20, 2018

ping @mbauman I think this is good to merge. Can you have a quick look?

In the meantime, I'm wondering if we can support partial indexing

julia> a = ones(5,5,5);
julia> a[1] # this works, indexing by memory order
julia> a[1,1] # this doesn't work though, can we support this?

If this feature can be supported, then this patch becomes unnecessary. And other similar operations can be done without code changing.

Perhaps it's because supporting this would make the performance terrible? or because this would introduce ambiguity with a[1,1,:]?

@mbauman
Copy link
Member

mbauman commented Oct 23, 2018

Yes, you're exactly right and this did indeed work previously. It had the somewhat strange behavior that the trailing index was always allowed to "linearly span" (working through memory order for plain old Array) over any omitted dimensions. This ran us into trouble in a number of situations, including consistency with offset arrays, the meaning of the end keyword, and expectations of bounds errors. For more details see: #14470, #20079, and #21750. We now have the following rules:

  • Indexing with one index is linear indexing
  • Indexing with less than ndims indices is only supported if the dimensions omitted are all "singleton" (e.g., length 1)
  • Indexing with more than ndims indices is only supported if the extra dimensions provided are all selecting the first index (e.g., 1)

In other words, you've found and cleaned up some of my own dirty laundry — thank you!

@mbauman mbauman merged commit 84024a1 into JuliaLang:master Oct 23, 2018
@johnnychen94
Copy link
Member Author

Cool, I'll keep this in mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants