Fix findmax and findmin with iterables #14086

nalimilan · 2015-11-21T17:12:06Z

The state is not guaranteed to be equivalent to the index of the element.
Fixes #14085

I wonder whether iterstate could be removed, as it doesn't appear to be used anywhere after this change, and it's potentially misleading.

tkelman · 2015-11-21T18:45:08Z

ref #13203, @timholy is this okay?

~~also you'll need to rebase any pr's you opened in the last day to fix appveyor~~ nevermind looks okay now

timholy · 2015-11-21T21:12:59Z

base/array.jl

        iold = i
-        ai, i = next(a, i)
+        ai, s = next(a, s)
        if ai > m || m!=m
            m = ai
            mi = iold


This could just be i and you could get rid of iold.

Of course. I fixed the code a bit too directly.

timholy · 2015-11-21T21:14:17Z

Yes, since you're now using an integer counter, it's fine to get rid of iterstate.

The state is not guaranteed to be equivalent to the index of the element.

nalimilan · 2015-11-22T11:42:39Z

@timholy I wonder why you didn't write the code that way in the first place? Did you have a more complex case in mind? Or for performance? My version sounds so terribly simple. :-)

I've updated the PR.

nalimilan · 2015-11-22T11:52:52Z

Actually, this PR loses the support for array indexing, it will only return the linear index. It looks like a simple solution would be to have the current method as a fallback for generic iterables, and add a specialized method for AbstractArray using for i in eachindex(A) and returning the index for the maximum. Does that sound OK?

We also need to make a decision as regards backporting, as this is technically an API break -- even if leaving it broken for ranges isn't acceptable either.

timholy · 2015-11-22T11:54:00Z

Nah, I think your version is just plain better 😄. I suspect I was wrestling with the idea of whether we should be returning the state (e.g., a CartesianIndex) rather than an integer index, since in some cases a[i] will be an error if i is an integer. But this would be breaking change, so I should have done the rewrite the way you did.

Thanks for doing this!

timholy · 2015-11-22T11:55:10Z

I think findmax has always returned an integer index, so for now backward compatibility wins (esp. for backports).

But you're right that a better solution would be nice.

nalimilan · 2015-11-22T12:34:47Z

@timholy OK, cool. This commit is good for backporting. But I wonder whether we shouldn't improve the behavior in master and return a cartesian index for arrays. That sounds more natural than a linear index (and indeed ?ind2sub gives precisely this example). Adding these lines could be enough:

    if isa(a, AbstractArray)
        return (m, ind2sub(size(a), mi))
    else
        return (m, mi)
    end

timholy · 2015-11-22T13:59:11Z

That wouldn't be type-stable, but it wouldn't be hard to make it so. In general I like the idea of returning a native iterator. One issue, though, is that this simple solution will break if/when we switch to supporting arrays with arbitrary indexes (i.e., not just 1:size(A, d) but also allow indexes which range, e.g., over -5:5).

nalimilan · 2015-11-22T14:28:32Z

Why wouldn't it be type-stable? The behavior would depend only on the type of the array. Regarding arrays with arbitrary indexes, we would have to take into account not only the size of the array, but the array itself (or its type) so that a specialized method could translate the index correctly. I can add a, say, readable_index(a, i) helper method which would also differ from ind2sub in that it would be a no-op for LinearSlow arrays, for which eachindex already returns cartesian indices.

I've also discovered one possible failure case: the one in which the iterable is also indexable, but the indexes differ from the rank of the elements. The only case I know of at the moment is strings, for which findmax doesn't sound very useful. But in an ideal world we would dispatch on an Indexable trait, in which case we would iterate over indexes rather than using the iteration protocol.

timholy · 2015-11-22T14:38:56Z

While this may change someday, isa does not currently get evaluated at compile time:

julia> function foo(x)
           if isa(x, AbstractArray)
               return 1
           else
               return 1.0
           end
       end
foo (generic function with 1 method)

julia> @code_warntype foo(rand(2,2))
Variables:
  x::Array{Float64,2}

Body:
  begin  # none, line 2:
      unless (Main.isa)(x::Array{Float64,2},Main.AbstractArray)::Bool goto 0 # none, line 3:
      return 1
      goto 1
      0:  # none, line 5:
      return 1.0
      1: 
  end::Union{Float64,Int64}

But if you define it as

foo(x::AbstractArray) = 1
foo(x) = 1.0

then it is type-stable.

timholy · 2015-11-22T14:43:54Z

Regarding arrays with arbitrary indexes, we would have to take into account not only the size of the array, but the array itself (or its type) so that a specialized method could translate the index correctly. I can add a, say, readable_index(a, i) helper method which would also differ from ind2sub in that it would be a no-op for LinearSlow arrays, for which eachindex already returns cartesian indices.

This is getting pretty close to the implementation we have now---the only difference is that I held onto the state and called sub2ind at the end, and you're proposing holding onto an integer and calling ind2sub at the end.

Maybe a better fix would be to leave the current one in place and write a special exception for Ranges? Really, for findmax there's no point iterating over each element of the range, because you can compute it directly (the maximum value is either first(r) or last(r)).

nalimilan · 2015-11-22T14:57:27Z

I don't think we can get away with this by making an exception for ranges: the fallback implementation should work in all cases and not rely on assumptions that are not in the API contract of iterables to support all possible custom types. (Of course, a more efficient implementation for ranges would be useful for other purposes.)

My proposal differs from the current implementation in two ways:

It would only apply the AbstractArray-specific behavior to those types.
It would return cartesian indices even for LinearFast arrays.

timholy · 2015-11-22T15:10:58Z

I guess there's still a part of me that wonders whether the state is the more fundamental object, as you yourself pointed out above for strings. But your plan makes a lot of sense. I have no objections.

nalimilan · 2015-11-22T15:16:11Z

But can I always call sub2ind, passing it the index obtained via eachindex, or do I need a special function for LinearSlow arrays?

(As regards the iteration state, we might consider requiring that it is equal to the index for indexable types.)

timholy · 2015-11-22T15:25:18Z

It would be a no-op for LinearSlow arrays, or perhaps it should extract the tuple from inside the CartesianIndex. However, if one is passing back the state, there is no need to call sub2ind---that's only if you want to pass back an integer.

(As regards the iteration state, we might consider requiring that it is equal to the index for indexable types.)

I wonder about that too. Anyone else want to chime in about whether this would be overly restrictive? Obviously Ranges would need some restructuring.

toivoh · 2015-11-22T20:46:09Z

I'm not sure how BitArray iteration is implemented right now, but I've got a feeling that nice iteration state could involve a (perhaps partially shifted) piece of the packed storage.

I don't think we should be in the business of prescribing the format of iteration state. Better in that case to introduce a function that will translate it into an index or similar, based on the type that is iterated over.

StefanKarpinski · 2015-11-22T23:24:35Z

+1 to what @toivoh said.

timholy · 2015-11-23T14:09:35Z

Sounds like we shouldn't assume that the state can be used as an index. So I'd say go ahead with your plan, @nalimilan, and we'll deal with difficulties from generalization as (or if) they arise.

nalimilan · 2015-11-23T14:15:39Z

OK. Could you tell me how I can extract the index tuple from the CartesianIndex?

timholy · 2015-11-23T14:46:58Z

julia/base/multidimensional.jl

Line 18 in 9c90d2c

I::NTuple{N,Int}

nalimilan · 2015-11-23T22:19:38Z

Thanks, I confused this simple structure with the more complex iterator state.

Though there are more issues to tackle:

If we return a (cartesian index) tuple of indices for matrices and above, should keep returning a scalar for vectors? The inconsistency doesn't sound great to me (cf. e.g. zip() returns scalar instead of tuple when passed a single argument in 0.4 #13979), yet returning a tuple with one element isn't exactly what I'd call an improved usability.
The same question applies to findmin(A, region). Should we keep returning an array of scalars when a single dimension is passed as region? Should an array of tuples be returned when several dimensions are passed?

At least always returning a linear index was simple. :-)

timholy · 2015-11-24T01:25:11Z

I don't know whether this makes it more or less complicated, but there's findn for the give-me-cartesian-indexes case. So we could just return linear indexes and call it good enough. Or we could be ambitious and clean this mess up, if only we figure out how it should be done 😄.

timholy · 2015-11-24T01:25:58Z

Maybe one could specify the desired output type, e.g.,

findmax(Int, a)
findmax(CartesianIndex, a)

nalimilan · 2015-11-27T08:50:55Z

Maybe one could specify the desired output type, e.g.,

That would be a possibility (and could also lead to replace findn(x) with find(CartesianIndex, x)), but that still doesn't answer the question about what to return by default. The current solution of returning a linear index is consistent with find, and it's predictable. Its drawback is that it's not necessarily the most efficient for LinearSlow arrays, though I'm not sure it matters in practice (maybe it would be more important for find?).

So I can see two solutions, to apply identically to find, findmax and findmin:

Keep returning a linear index as now and with this PR.
Return whatever eachindex returns by default.

In both cases, we can optionally support passing Int or CartesianIndex as the first argument to request a specific index type.

We can also consider in this discussion findmax(x, region), which currently returns a linear index, where it could also usefully return the index of the maximum on the dimension over which the reduction is performed. find(CartesianIndex, x, region) could work here too.

timholy · 2015-11-27T12:09:44Z

Yes, I also favor returning a linear index by default.

nalimilan · 2015-11-29T14:07:28Z

So, let's say we go with (and backport) this PR, and will tackle the question of findmax(CartesianIndex, x) later. I'll merge it if nobody objects.

timholy · 2015-11-29T14:27:25Z

👍

Fix findmax and findmin with iterables

The state is not guaranteed to be equivalent to the index of the element. (cherry picked from commit bbaaf54) ref #14086

nalimilan mentioned this pull request Nov 21, 2015

indmax and findmax on ranges do not behave as for other sequences #14085

Closed

nalimilan added the backport pending 0.4 label Nov 21, 2015

nalimilan force-pushed the nl/findminmax branch from dd8a737 to dc55d04 Compare November 21, 2015 17:15

timholy reviewed Nov 21, 2015
View reviewed changes

nalimilan force-pushed the nl/findminmax branch from dc55d04 to 2b7e75c Compare November 22, 2015 11:33

Fix findmax and findmin with iterables

bbaaf54

The state is not guaranteed to be equivalent to the index of the element.

nalimilan force-pushed the nl/findminmax branch from 2b7e75c to bbaaf54 Compare November 22, 2015 11:41

nalimilan added a commit that referenced this pull request Nov 29, 2015

Merge pull request #14086 from JuliaLang/nl/findminmax

eb5d37c

Fix findmax and findmin with iterables

nalimilan merged commit eb5d37c into master Nov 29, 2015

nalimilan deleted the nl/findminmax branch November 29, 2015 17:42

nalimilan added a commit that referenced this pull request Nov 30, 2015

Fix findmax and findmin with iterables

13a68a0

The state is not guaranteed to be equivalent to the index of the element. (cherry picked from commit bbaaf54) ref #14086

tkelman removed the backport pending 0.4 label Dec 7, 2015

ivarne mentioned this pull request Apr 11, 2016

findmax() on range #15825

Closed

nalimilan mentioned this pull request Aug 24, 2017

RFC: make indexable collection API more uniform (keys, values, pairs) #22907

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix findmax and findmin with iterables #14086

Fix findmax and findmin with iterables #14086

nalimilan commented Nov 21, 2015

tkelman commented Nov 21, 2015

timholy Nov 21, 2015

nalimilan Nov 22, 2015

timholy commented Nov 21, 2015

nalimilan commented Nov 22, 2015

nalimilan commented Nov 22, 2015

timholy commented Nov 22, 2015

timholy commented Nov 22, 2015

nalimilan commented Nov 22, 2015

timholy commented Nov 22, 2015

nalimilan commented Nov 22, 2015

timholy commented Nov 22, 2015

timholy commented Nov 22, 2015

nalimilan commented Nov 22, 2015

timholy commented Nov 22, 2015

nalimilan commented Nov 22, 2015

timholy commented Nov 22, 2015

toivoh commented Nov 22, 2015

StefanKarpinski commented Nov 22, 2015

timholy commented Nov 23, 2015

nalimilan commented Nov 23, 2015

timholy commented Nov 23, 2015

nalimilan commented Nov 23, 2015

timholy commented Nov 24, 2015

timholy commented Nov 24, 2015

nalimilan commented Nov 27, 2015

timholy commented Nov 27, 2015

nalimilan commented Nov 29, 2015

timholy commented Nov 29, 2015

Fix findmax and findmin with iterables #14086

Fix findmax and findmin with iterables #14086

Conversation

nalimilan commented Nov 21, 2015

tkelman commented Nov 21, 2015

timholy Nov 21, 2015

Choose a reason for hiding this comment

nalimilan Nov 22, 2015

Choose a reason for hiding this comment

timholy commented Nov 21, 2015

nalimilan commented Nov 22, 2015

nalimilan commented Nov 22, 2015

timholy commented Nov 22, 2015

timholy commented Nov 22, 2015

nalimilan commented Nov 22, 2015

timholy commented Nov 22, 2015

nalimilan commented Nov 22, 2015

timholy commented Nov 22, 2015

timholy commented Nov 22, 2015

nalimilan commented Nov 22, 2015

timholy commented Nov 22, 2015

nalimilan commented Nov 22, 2015

timholy commented Nov 22, 2015

toivoh commented Nov 22, 2015

StefanKarpinski commented Nov 22, 2015

timholy commented Nov 23, 2015

nalimilan commented Nov 23, 2015

timholy commented Nov 23, 2015

nalimilan commented Nov 23, 2015

timholy commented Nov 24, 2015

timholy commented Nov 24, 2015

nalimilan commented Nov 27, 2015

timholy commented Nov 27, 2015

nalimilan commented Nov 29, 2015

timholy commented Nov 29, 2015