Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cartesian Product iteration #1917

Closed
GlenHertz opened this issue Jan 8, 2013 · 17 comments
Closed

Cartesian Product iteration #1917

GlenHertz opened this issue Jan 8, 2013 · 17 comments
Assignees

Comments

@GlenHertz
Copy link
Contributor

This is a follow-up to the discussion on the julia-users mailing list here.

There seemed to be good progress but I didn't see any code committed.

@JeffBezanson
Copy link
Member

In fact, I've been thinking of adding this to the language as follows:

for I in X...
    A[I...] = B[I...]
end

where X is an n-tuple of iterables. This expands to a nest of n for loops and delivers a tuple of elements in I. Other syntax/behavior ideas welcome.

@StefanKarpinski
Copy link
Member

I like both the idea and the syntax. People are inevitably going to ask for support for X being other kinds of collections. Is that feasible or can we not infer their size well enough?

@JeffBezanson
Copy link
Member

It is feasible, since after all we can't infer the size of every tuple anyway. You just want to avoid spurious use of non-tuples, like {1:n,1:n} instead of (1:n,1:n). This can be implemented by calling the equivalent of gen_cartesian_map.

@timholy
Copy link
Member

timholy commented Jan 11, 2013

I like it a lot. Maybe an interesting use case would be to consider what it would look like to compute the discrete Laplacian of a multidimensional array this way. Presumably you'd have to do iteration over the components of I?

@johnmyleswhite
Copy link
Member

+1

@timholy
Copy link
Member

timholy commented Jul 13, 2013

Would you be able to do math on the index tuples? For example, suppose I wanted to implement imfilter for arrays of arbitrary dimensions: I might want something like this: (edit: rats, stray keypress had me submit early, continuing to edit)

function imfilter(X,K)
    center = ceil([size(K)...]/2)
    Y = similar(X)
    for I in size(X)...    # Note: have to handle boundaries somehow!
        s = zero(eltype(X))
        for J in size(K)...
            s += K[J...]*X[I...+J...-center...]
        end
        Y[I...] = s
    end
end

@StefanKarpinski
Copy link
Member

It would be something like ranges(X)... not size(I)... since that would iterate a single time with I equal to size(I). Also: man, that's a lot of dots.

@JeffBezanson
Copy link
Member

We could define things like interior(X, 2) to get various useful index ranges for use with ....
A syntax like X[(I.+J.-center)...] might work.

@timholy
Copy link
Member

timholy commented Jul 14, 2013

Stefan, you're right of course.

Another thing to think about is that there may be powerful performance arguments to be able to reduce most array accesses to linear indexing. I seem to remember that 4d indexing was something like 20x slower than linear, when the accesses were in a pattern that allowed you to precalculate the outer dimensions. The imfilter example would probably be best implemented in terms of a list of coefficients and stride offsets.

But for boundary conditions you also have to keep track of the actual coordinates, and feed that into your linear indexing, so you do need cartesian iteration in parallel with linear indexing. If you imagine implementing reflecting, periodic, and NA boundary conditions (the latter meaning "just pretend any undefined elements don't exist, normalizing by the ones you have"), it seems like the sub2ind-style transformations could get fairly hairy. Certainly, this was a pain to handle in Grid.jl (which implements even more boundary conditions), although it eventually worked out.

@GunnarFarneback
Copy link
Contributor

For the record there's an imfilter implementation for arbitrary dimensions at
GunnarFarneback@2c3d599

You definitely want to do the inner loop over the kernel as a linear loop and in many cases handle boundary conditions with pre-padding.

@timholy
Copy link
Member

timholy commented Jul 15, 2013

A bit off-topic, but @GunnarFarneback, your imfilter is certainly better than the one in Images.jl. I'd welcome a pull request, if you're willing to share.

While linear indexing when you can do it will always be faster, I suspect my 20x number needs revision in light of e550406. Nice work, Jeff.

@GunnarFarneback
Copy link
Contributor

Sharing is no problem, deciding how to fit it in is more of one. But maybe it's time to get that implementation replaced.

Going back to topic, the bilateral filter may be a good concrete example of how to stress a Cartesian product language feature. In 2D a somewhat naive implementation looks like

function bilateral(in::Matrix{Float64}, kernel_size::Int,
                   sigma_s::Float64, sigma_i::Float64)
    k_s = -0.5 / sigma_s^2
    k_i = -0.5 / sigma_i^2
    kernel_radius = fld(kernel_size - 1, 2)
    out = similar(in)
    height = size(in, 1)
    width = size(in, 2)
    for x = 1:width
        for y = 1:height
            center_value = in[y, x]
            sum_w = 0.0
            sum = 0.0

            for dx = -kernel_radius:kernel_radius
                xi = x + dx
                if xi < 1 || xi > width
                    continue
                end
                for dy = -kernel_radius:kernel_radius
                    yi = y + dy
                    if yi < 1 || yi > height
                        continue
                    end

                    value = in[yi, xi]
                    diff = value - center_value
                    wi = exp(k_s * (dx^2 + dy^2) + k_i * diff^2)
                    sum_w += wi
                    sum += wi * value
                end
            end

            out[y, x] = sum / sum_w
        end
    end
    return out
end

It would be lovely if this could be written compactly for arbitrary dimensionality.

@timholy
Copy link
Member

timholy commented Jul 21, 2013

Just wanted to give another example here, showing how useful/necessary this will be for switching over to returning views.

It would be nice to write a sum function something like this:

function (+)(A::AbstractSubView, B::AbstractSubView)
    if size(A) != size(B)
        error("Dimension mismatch")
    end
    C = Array(promote(eltype(A), eltype(B)), size(A))
    for I in C...
        C[I...] = A[indexes(A,I.)...] + B[indexes(B,I.)...]
    end
    C
end

I suspect we won't be happy with the performance until this type of syntax generates code something like this:

# Implementation for 3d
function (+)(A::SubView, B::SubView)
    if size(A) != size(B)
        error("Dimension mismatch")
    end
    C = Array(promote(eltype(A), eltype(B)), size(A))
    s1 = size(C, 1)
    s2 = size(C, 2)
    strdC2 = s1
    strdC3 = s1*s2
    strdA2 = size(parent(A), 1)
    strdA3 = size(parent(A), 2)*strdA2
    strdB2 = size(parent(B), 1)
    strdB3 = size(parent(B), 2)*strdB2
    for i3 = 1:size(C, 3)
        o3 = (i3-1)*strdC2
        oA3 = (A.indexes[3][i3]-1)*strdA3
        oB3 = (B.indexes[3][i3]-1)*strdB3
        for i2 = 1:s2
            o2 = o3 + (i2-1)*strdC2
            oA2 = oA3 + (A.indexes[2][i2]-1)*strdA2
            oB2 = oB3 + (B.indexes[2][i2]-1)*strdB2
            for i1 = 1:s1
                C[i1+o2] = parent(A)[A.indexes[1][i1]+oA2] + parent(B)[B.indexes[1][i1]+oB2]
            end
        end
    end
end

Here I've not required that SubViews be strided, i.e., the indexes could be AbstractVectors.

@timholy
Copy link
Member

timholy commented Sep 8, 2013

Since some discussions have started focusing on post-0.2, let me chime in with an update on this issue (which I think is an important 0.3 milestone).

First, let me correct something I said immediately above: in recent julias, cartesian indexing of multidimensional arrays is completely viable. (I suspect it's been this way ever since Jeff's "hoist array size" fix for #3712.) So a lot of the "convert multidimensional indices into a linear index" gymnastics we've done in the past simply aren't necessary, in my view. That simplifies many things, and in particular makes a tuple-of-indexes (as Jeff proposes above) the obvious foundation for Cartesian iteration.

Second, my Cartesian package serves two purposes: (1) it lets me do the things I need to do now, and (2) it's a research project for figuring out what facilities we need to have in base. Therefore, let me provide an update. Cartesian is now feeling quite full-featured, and I'm finding its current incarnation allows a very wide range of algorithms to be written efficiently and easily. As Jeff pointed out, the code is quite hard to read (see the example below), but it is so useful that I have internal code where the majority of my functions make heavy use of these facilities. I should note that the README is completely out-of-date, so if you're interested in seeing what's up you'll want to check the code. (I'm not certain it's worth writing a nice README now because it would be outdated once tools such as these move into base, and the internal in-code comments should help a lot.) I should also add that there could probably be a bit of consolidating---different capabilities were added incrementally---but neither would I describe it as "crufty." In other words, we may want an equivalent of at least half of the macros in Cartesian.

One of the main surprises in developing and using Cartesian has been how important it is to have a rich interface for specifying what happens in different dimensions. By far the most significant innovation was "inlined expression parsing" which allows one to generate expressions like this (this is a fake and useless example, but I use real stuff similar to this all the time):

k = 5
for i3 = 2:size(A, 3)-1, i2 = 2:size(A,2)-1, i1 = 2:size(A,1)-1
    B[i1, i2, k] += A[i1, i2, i3]
end

using the following code (here N=3):

k = 5
@nloops $N i d->2:size(A,d)-1 begin
    (@nref $N B d->(d==$N)?k:i_d) += (@nref $N A i)
end

Those anonymous function-expressions get "inlined" at parsing time, so this is literally translated into the efficient version above.

Such tricks allow you to implement reductions, broadcasting, nearest-neighbor searches, etc almost trivially. (In particular, every example that's been mentioned in this issue so far is quite easy to implement.) Moreover, most of your code naturally ends up being cache-efficient, which as @lindahua has pointed out is the most performant way to perform reductions. Finally, the resulting algorithms are performant for both Arrays and SubArrays, often with speedups of 100x for the latter (and without a major gap compared to Array).

Without the anonymous function-expression inlining, I found that I was much more limited in terms of what I could do---in terms of my own algorithms, developing this very simple functionality was a big breakthrough. So I'd argue that a facility like this will be an important consideration for whatever ends up in base.

If you want to play with Cartesian at all, I've provided function-versions of all the macros to make it easy to see how things get translated (example: Cartesian._nloops(3, :i, :(d->(d==2)?7:k_d), :(f(A))).

@timholy
Copy link
Member

timholy commented Sep 8, 2013

I should also add that I'm not advertising for users of Cartesian; it's definitely an at-your-own-risk and could-change-API-without-warning kind of package (like I say, it's a research project), and one that I fervently hope gets outdated soon.

@timholy
Copy link
Member

timholy commented Oct 11, 2013

Mentioned on the mailing list, but posting here so it doesn't get lost between now and when people have time to start thinking about this again: a more flexible syntax is here. It allows the iterator ranges to be specified and introduces a new keyword with to indicate coupled iteration variables. (As shown in the Cartesian README, the latter is useful particularly for reductions and broadcasting, and can also be used to implement efficient mapslices behaviors.) It uses a generalized comprehension notation for most of this, see also #4470.

@timholy
Copy link
Member

timholy commented Mar 1, 2014

If we implement APL indexing (#5949), we'll need either this or #5395. That is, unless we restrict the indexes to something for which linear indexing is efficient. I think it's simply unfeasible otherwise.

A brief update on where we stand on this issue, relative to this gist (I'm excising JIT warmups, etc.):

julia> A = rand(500,500,30);

julia> @time sumloop(A,30)
elapsed time: 0.407903035 seconds (64 bytes allocated)
1.1249159436884916e8

julia> @time sumtuple(A,30)
elapsed time: 0.783183949 seconds (64 bytes allocated)
1.1249159436884916e8

Pretty impressive, although I suspect some would be loath to give up a factor of 2.

But the more relevant case reveals we're not there yet:

julia> @time sumtuple_gen(A,1)
elapsed time: 6.926610589 seconds (1740671904 bytes allocated)
3.749719812291454e6

More than 500-fold slower (note it's doing 30-fold less work).

@timholy timholy mentioned this issue Mar 2, 2014
timholy added a commit that referenced this issue Sep 20, 2014
waTeim pushed a commit to waTeim/julia that referenced this issue Nov 23, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants