Improve error messages in abstractarray.jl #11797

ScottPJones · 2015-06-21T11:50:27Z

This change improves the error messages for AbstractArray, and also (until one of the genius Julians fixes it, apparently quite soon) improves the coverage information.

ScottPJones · 2015-06-21T21:47:53Z

@mbauman @carnaval @timholy I know you weren't fans of #11792 (which I didn't like either, but felt driven to it by a desire for better coverage information), could you please look at this PR on it's own merits?
@kshyatt Imitation is the greatest form of flattery... You've done so much to improve error messages and coverage, I had thought I'd try my hand at it as well. Does this look good to you?

Also, I wanted to also see about adding more testing, because the coverage results I've been getting have been rather dismal. I'm not sure if it is because simply more tests are needed, or because the instructions for getting accurate coverage information require deleting sys.dylib, but AbstractArray lives in inference.dylib/.ji.

If you wonder why I picked abstractarray.jl, I think it is a great piece of enabling infrastructure.

mbauman · 2015-06-21T22:12:04Z

I agree some of those lines are ridiculously long (e.g., https://github.com/JuliaLang/julia/pull/11797/files#diff-2264bb51acec4e7e2219a3cb1c733651L65), and restructuring is a definite gain in readability. And I like that you've added arguments to some BoundsError()s.

But:

I really don't like that there is a change in semantics in some places. (x | y | z) && error() is different from if x; error(); end; if y; error(); end; if z; error(); end;. This file is very performance sensitive, and any little change like that could result in a difference between getting inlined or not (which is huge). Or it could simply change the performance slightly. Either way, I really don't like doing this blindly.

I don't like all the replacements from throw(BoundsError(…)) to throw_boundserror, assuming they were done blindly. Just like I hope we'll eventually get multi-line code coverage up and running, it also looks hopeful that there will be a solution to the GC frame issues in throw. So I really don't want to be using throw_boundserror more than we need to, and I especially don't want to go blindly replacing their usages without profiling or closer inspection. There could easily be other things in the functions that are causing a GC frame in any case, which means that the change isn't necessary.

I also find some of these rearrangements much less readable. checkbounds(…) && checkbounds(…) || error() is a very common idiom throughout this file. I find it's very nice to be able to quickly identify a guard line at the top of a function and see that it's checking bounds. I find the alternative much more taxing — my mind takes a few seconds to parse out what case is getting selected by !(x && y):

if !(checkbounds(…) && checkbounds(…))
   error()
end

There are other very short lines that were very readable before and now are harder for me to parse. I'm sure I would get used to it eventually, but if we're judging this independently from code coverage changes I don't see the purpose.

Finally, I frankly don't like spending time on this. I know you want everything in Julia to be done in the best way possible, but my time here is donated. Discussions like this are taxing and, to my mind, relatively unproductive. There are much better uses of all of our time than surface cosmetics and vetting blind changes.

ScottPJones · 2015-06-21T22:56:22Z

The splitting up of some of the tests was in order to be able to get good error messages.
Almost all of the BoundsError before did not give any arguments at all.
The throw_boundserror change was simply because I felt it was better to be safe than sorry... Is there downside to using throw_boundserror? If there really will be a solution soon, then no problem, I'll put the back.
About the short lines converted to ifs, that's just from what a number of people have told me, that they find if easier to read... you are the first who's come out to say they preferred it the short way.
I hadn't seen any objections to @kshyatt's changes that did exactly that, so I'm surprised that I get them,
for exactly the same type of changes.
In particular, the case with checkbounds you mentioned is exactly the sort of case that seems very difficult to parse with just the short circuit idiom (when there are more than one short-circuit operator,
I'd always prefer the if, for just a single one, if the condition is short, I've accustomed myself to the idiom)
I'm sorry, but I didn't think this was an issue of surface cosmetics... the only reason I looked at this was because 1) AbstractArray seems like something very fundamental, which deserves full coverage testing. 2) I wanted to improve the error messages, which I think is useful.
I can put the ifs back on one line, since it looks like the coverage issue will be fixed.
I'm very sorry that you think this is unproductive. I'm sorry I asked you to look at it.

mbauman · 2015-06-21T23:48:39Z

Perhaps I misunderstood your request to view this PR on its own merits. And I'm quite certain I'm having trouble distancing this from #11792.

When I looked through the diff, I saw lots of surface syntax changes, changes in semantics, and what looked to be blind usage of a quirky performance trick. Just like I said in #11792, if this is accompanied by an active push to improve testing coverage, I would definitely view this in a different light (I don't mean the same PR, just ongoing work). Or if this were just the improvements in error messages, again, I'd view that differently.

ScottPJones · 2015-06-22T00:31:12Z

I've been running the coverage tests on this, since I started (with the problem of the results may not be good because this is part of inference.dylib and I can't delete that like sys.dylib), and I wanted to start with this because this seems like such an important piece of work. I take all your points about not wanting stuff that might adversely affect performance, and if the GC issue is going to be resolved soon, like the coverage issue, then it just boils down to adding more information to the errors, adding more tests,
and seeing that they show up on the coverage... For me, it was a way of trying to grok some interesting work, and the last thing I wanted to do was to waste any of your time... on the contrary, I'd hoped that it would save you some time checking coverage and adding tests... I left the 3 line ifs, simply because of people saying they preferred it that way, and not seeing any objections to Kate's nice work. I'll take that out if you prefer (and definitely back out the throw_boundserror and semantic changes... I wonder though, if it were done with the same check as before, but then separate checks it that is true, to display different bounds error information, would be OK? Could that affect performance adversely?
Remember, I opened #11792 only because people had told me that line coverage was all that could be done, but that's rapidly been proven incorrect, much to my pleasure.

ScottPJones · 2015-06-23T01:28:29Z

This now only has the improved error messages, and a couple of lines with added whitespace where it was over the 92 character limit.

ScottPJones · 2015-07-07T20:45:58Z

Bump: @mbauman just wondering if you'd have a chance to look at this, thanks!

mbauman · 2015-07-07T20:49:46Z

Yes, this looks much better. I've been meaning to get to running a few cursory performance checks, but I think this should be good to go once I get the chance. Sorry I haven't gotten to it yet.

yuyichao · 2015-07-07T20:54:29Z

base/abstractarray.jl

-        throw(ArgumentError("source and destination must have same size"))
+        length(ir_dest) != length(ir_src) &&
+            throw(ArgumentError("source ($(length(ir_src))) and destination ($(length(ir_dest))) must have same size"))
+        throw(ArgumentError("source ($(length(jr_src))) and destination ($(length(jr_dest))) must have same size"))


The error message seems wrong.

What seems wrong to you? It is the same as the comparision, in this case, ir_dest compared to ir_src, and jr_dest compared to jr_src.

Well, the error message would be sth like

ArgumentError: source (2) and destination (3) must have same size

but 2 and 3 are not the source or destination.

I think a common pattern is to write source and destination must have the same size (got X and Y)

ScottPJones · 2015-07-07T20:54:38Z

NP! Thanks!

yuyichao · 2015-07-07T20:56:08Z

base/abstractarray.jl

@@ -676,7 +696,7 @@ function hcat{T}(A::AbstractVecOrMat{T}...)
    for j = 1:nargs
        Aj = A[j]
        if size(Aj, 1) != nrows
-            throw(ArgumentError("number of rows must match"))
+            throw(ArgumentError("number of rows of A[$j] ($(size(Aj,1))) must match $nrows"))


Better to avoid the A in the error message since it doesn't mean anything to the user of the function. Printing the object is probably too much, just print out the index of the argument is probably ok.

Same here, I think you can say number of rows of arrays must match (got $(map(x->size(x, 1), A))).

nalimilan · 2015-07-08T08:48:00Z

Great idea!

ScottPJones · 2015-07-08T11:48:57Z

I hope I've addressed all (but one) of the great review comments, thanks very much @nalimilan and @yuyichao for your time!
The only thing that I don't know how to completely address is the message for hvcat(nbc::Integer, as...).
I tried to improve it along the lines of other review comments, but I don't know how to display all the lengths (and it doesn't seem to be checking that, at least as far as I can tell).

mbauman · 2015-07-09T16:08:37Z

base/abstractarray.jl

@@ -62,7 +62,11 @@ ndims{T<:AbstractArray}(::Type{T}) = ndims(super(T))
 length(t::AbstractArray) = prod(size(t))::Int
 endof(a::AbstractArray) = length(a)
 first(a::AbstractArray) = a[1]
-first(a) = (s = start(a); done(a, s) ? throw(ArgumentError("collection must be non-empty")) : next(a, s)[1])
+function first(arr)


This is rather pedantic, I know, but I'd prefer the argument here to remain as a and not arr. This definition isn't just for arrays — it's defined for all iterables. Naming it arr strongly implies that it's just for arrays. If you really want to steer away from the one-letter name, itr would be better.

Similarly, s is commonly used as the iterable's state throughout Julia. str reads as string to me.

state then?

Sure, that'd be great.

That's fixed now, you're right.
To me, the a also strongly implied it was for arrays, so I do think itr is clearer,
and given that I've spent most all of my time looking at string handle code, where s reads as string and not state, you can see why I like a little clearer names.
(I know you know exactly what you meant, but the idea is to make things clearer for somebody new to this part of the code)

mbauman · 2015-07-09T16:35:42Z

base/abstractarray.jl

    end
    st = start(src)
    for j = 1:(soffs-1)
-        done(src, st) && throw(BoundsError())
+        done(src, st) && throw(BoundsError(src, j))


I'm not entirely sure about this. In this method, src is a general iterable and not an abstract array (there are later specializations for abstract array sources). Are there other places where we use bounds errors for iterables (that aren't arrays) terminating earlier than expected? If not, a more accurate message might be something along the lines of: throw(ArgumentError("source has fewer elements than required, expected at least $soffs, got $(j-1)")).

OK, I'll fix those up after lunch! Thanks!

mbauman · 2015-07-09T16:47:13Z

base/abstractarray.jl

@@ -690,7 +710,7 @@ function hcat{T}(A::AbstractVecOrMat{T}...)
    for j = 1:nargs
        Aj = A[j]
        if size(Aj, 1) != nrows
-            throw(ArgumentError("number of rows must match"))
+            throw(ArgumentError("number of rows of arrays must match (got $(map(x->size(x,1), A)))"))


Maybe "number of rows in each array must match (…)"?

mbauman · 2015-07-09T17:07:15Z

Besides a few very minor comments, this looks good. No perf regressions that I can measure, either. Thanks, Scott!

ScottPJones · 2015-07-10T17:21:52Z

@mbauman All done (took a while to handle the breakage in the tests from returning ArgumentError instead of BoundsError, esp. doing it the way @tkelman wanted to check that it was the specific error thrown) (I hope the way I did it is OK, I couldn't figure out anything better yet)

mbauman · 2015-07-10T18:01:03Z

The error type changing based upon the type of the source does seem unfortunate, especially in the context of that test file. It is technically more accurate, but I wonder if it'll just be a headache for folks to handle (e.g., if you know that the source might not have enough elements and want to catch the error, this will make your life very difficult).

mbauman · 2015-07-10T18:02:05Z

base/abstractarray.jl

-        throw(BoundsError())
+        dmax > length(dest) && throw(BoundsError(dest, dmax))
+        doffs < 1 && throw(BoundsError(dest, doffs))
+        throw(BoundsError(src, soffs))


Your tests will be a little more sane if this also throws an ArgumentError. But I'm not totally convinced that this is the way to go.

This needs to be consistent. Either we should change this to be an ArgumentError here, too, or we should revert the other back to a BoundsError. I don't really have strong feelings either way, but we gotta do one or the other.

ScottPJones · 2015-07-10T23:08:10Z

Ok, this looks clean now, thanks so much for all your time reviewing!!!

ScottPJones · 2015-07-12T23:54:44Z

I've rebased this (because of the other tests being added for abstractarrays), so now it can again me merged. Thanks again for the reviewing!

mbauman · 2015-07-13T00:27:51Z

We still need to resolve #11797 (comment) before this can be merged.

ScottPJones · 2015-07-13T00:33:37Z

Ah, sorry, I missed that one (away disconnecting with family this weekend!).
I think that the ArgumentError is more general, I'll change it that way, and fix the tests to match.

ScottPJones · 2015-07-13T13:46:38Z

I've been digging this, and unfortunately, it's a bit of a rathole. The copy! functions in general can return either BoundsError or ArgumentError (and do so currently, that's not new with this PR).
BoundsError is unfortunately rather limited when it comes to dealing with returning a message for an iterable, which is why I preferred ArgumentError.
Any particular consideration for which ones I should change?
Thanks, Scott.

mbauman · 2015-07-13T14:48:20Z

It's not that bad, is it? I think it's just lines 242 and 276. All the other bounds errors are called on an AbstractArray (dest), which is actually indexable. I don't mind having the function throw two different types of errors, but those pseudo-out-of-bounds errors from the iterable should be the same type. The point is that in your tests you shouldn't need to change error types depending upon the index value (instead just test if the source is an abstract array).

ScottPJones · 2015-07-13T15:34:03Z

Ok, thanks, I'll take care of that when I get home

ScottPJones · 2015-07-14T05:21:45Z

All updated, rebased, tests simpler (and passed), anything else?

mbauman · 2015-07-14T14:09:43Z

LGTM. Squash it, and then this should be good to merge.

Updated messages based on review Update variable names per Matt Updated tests to handle ArgumentError instead of BoundsError when passed iterable More specific exceptions for test_throws Fix dangling right parenthesis Update non-abstract array error messages and tests

ScottPJones · 2015-07-14T14:16:21Z

Done!

Improve error messages in abstractarray.jl

ScottPJones changed the title ~~Address #11792 in abstractarray.jl~~ Improve error messages in abstractarray.jl Jun 21, 2015

ScottPJones force-pushed the spj/cover1 branch from e266eaa to 0775d2d Compare June 23, 2015 00:17

yuyichao reviewed Jul 7, 2015
View reviewed changes

ScottPJones force-pushed the spj/cover1 branch from 0775d2d to a999cc8 Compare July 8, 2015 11:44

ScottPJones force-pushed the spj/cover1 branch from a999cc8 to 1558c58 Compare July 8, 2015 12:29

ScottPJones mentioned this pull request Jul 8, 2015

The Push for 90% Test Coverage #11885

Closed

14 tasks

ScottPJones force-pushed the spj/cover1 branch from 1558c58 to abd7abe Compare July 8, 2015 23:19

mbauman reviewed Jul 9, 2015
View reviewed changes

ScottPJones force-pushed the spj/cover1 branch from abd7abe to 8b153a8 Compare July 9, 2015 16:30

mbauman reviewed Jul 9, 2015
View reviewed changes

ScottPJones force-pushed the spj/cover1 branch 2 times, most recently from 6442a56 to e833fb9 Compare July 10, 2015 13:18

mbauman reviewed Jul 10, 2015
View reviewed changes

ScottPJones force-pushed the spj/cover1 branch from e833fb9 to 2a1105c Compare July 10, 2015 19:23

ScottPJones force-pushed the spj/cover1 branch from 2a1105c to c9bceae Compare July 12, 2015 22:56

ScottPJones force-pushed the spj/cover1 branch from c9bceae to f333a1a Compare July 14, 2015 04:49

ScottPJones force-pushed the spj/cover1 branch from f333a1a to f47c9b7 Compare July 14, 2015 14:16

mbauman added a commit that referenced this pull request Jul 14, 2015

Merge pull request #11797 from ScottPJones/spj/cover1

cecee20

Improve error messages in abstractarray.jl

mbauman merged commit cecee20 into JuliaLang:master Jul 14, 2015

ScottPJones deleted the spj/cover1 branch July 14, 2015 20:43

Improve error messages in abstractarray.jl #11797

Improve error messages in abstractarray.jl #11797

Conversation

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

mbauman commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

mbauman commented Jun 21, 2015

ScottPJones commented Jun 22, 2015

ScottPJones commented Jun 23, 2015

ScottPJones commented Jul 7, 2015

mbauman commented Jul 7, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ScottPJones commented Jul 7, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nalimilan commented Jul 8, 2015

ScottPJones commented Jul 8, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mbauman commented Jul 9, 2015

ScottPJones commented Jul 10, 2015

mbauman commented Jul 10, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ScottPJones commented Jul 10, 2015

ScottPJones commented Jul 12, 2015

mbauman commented Jul 13, 2015

ScottPJones commented Jul 13, 2015

ScottPJones commented Jul 13, 2015

mbauman commented Jul 13, 2015

ScottPJones commented Jul 13, 2015

ScottPJones commented Jul 14, 2015

mbauman commented Jul 14, 2015

ScottPJones commented Jul 14, 2015