Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: changing code idiom for more accurate coverage information #11792

Closed
ScottPJones opened this issue Jun 20, 2015 · 22 comments
Closed

RFC: changing code idiom for more accurate coverage information #11792

ScottPJones opened this issue Jun 20, 2015 · 22 comments

Comments

@ScottPJones
Copy link
Contributor

I've run into issues with the way the coverage tools only work on lines, and not basic blocks,
so that the very common idioms in Julia code of condition && action or condition || action mean that you won't be able to tell if the action part is covered.
Inspired by the tireless work of @kshyatt, which will help make Julia more reliable, I was thinking of
going through the code in Base (with a simple file-query-regex-replace) and changing all occurrences of:

    condition && statement # esp. things like throw, return, etc.
or
    condition || statement

to either

    condition &&
        statement
and respectively
    condition ||
        statement
or
    !(condition) &&
        statement

or

    if condition
        statement
    end
and respectively
    if !(condition)
        statement
    end

I think we need to talk about what the best idiom for use in Julia code is, taking into account the issues of coverage accuracy, conciseness, and ease of understanding the code (esp. for newcomers).
See #11782, and #11735

@tkelman
Copy link
Contributor

tkelman commented Jun 21, 2015

This would likely end up being a large PR that would be highly likely to go conflict-stale quickly, but I do think now that we have regular coverage measurements it would be a good idea to make it easier to see which error paths are or are not currently tested throughout base. Would be nice to have better coverage tools that can be more sophisticated than line-based ones, but that's a harder problem than doing a mass aesthetics-only refactoring. My preference is for the if - end form, it's simpler to understand for people who aren't used to Julia idioms yet and the ends aren't a big deal.

@quinnj
Copy link
Member

quinnj commented Jun 21, 2015

Maybe we need another push for the if x then y idiom to be actually
implemented as a better shorthand to x && y.

On Sat, Jun 20, 2015 at 9:18 PM, Tony Kelman [email protected]
wrote:

This would likely end up being a large PR that would be highly likely to
go conflict-stale quickly, but I do think now that we have regular coverage
measurements it would be a good idea to make it easier to see which error
paths are or are not currently tested throughout base. Would be nice to
have better coverage tools that can be more sophisticated than line-based
ones, but that's a harder problem than doing a mass aesthetics-only
refactoring. My preference is for the if - end form, it's simpler to
understand for people who aren't used to Julia idioms yet and the ends
aren't a big deal.


Reply to this email directly or view it on GitHub
#11792 (comment).

@ScottPJones
Copy link
Contributor Author

So far, 3 votes for if - end form, @kshyatt, you, and myself, none for a 2-line && or || form.
Why would it go conflict-stale quickly? It's the sort of thing I'd do during a single TV show...
it goes into a PR, gets tested, and hopefully somebody merges it quickly.

@ScottPJones
Copy link
Contributor Author

@quinnj if x then y wouldn't help the problem I'm trying to address here... inaccurate coverage info.

@ScottPJones
Copy link
Contributor Author

(I don't have anything against it, just wouldn't use it until there were a basic block coverage tool)

@tkelman
Copy link
Contributor

tkelman commented Jun 21, 2015

Just because it would touch a large number of lines of code (in a trivial way sure, but git doesn't care about that), and because of the

hopefully

It should still be looked over manually just to be absolutely sure your automated refactoring worked correctly. Maybe better as separate small chunks instead of massively all at once. Definitely split it into a few separate commits at least, even if you put them in the same PR.

@ScottPJones
Copy link
Contributor Author

Yes, I was thinking of doing it as a bunch of separate commits, or even PRs... right now, I just want to know which idiom people would prefer.

@carnaval
Copy link
Contributor

Well, it seems a bit strange to me to butcher the codebase because coverage tools had this dumb idea that line numbers are a decent approximation of program control flow. I mean, I'm all for being practical and make this change where needed, but let's not make all the code ugly just because the tooling sucks.

It would seem better to me to put this effort into making a better coverage tool, e.g., based on expressions evaluated instead of lines.

Again, in the meantime, if it helps people make tests better I'm fine with it being applied sparsely, I just don't want for it to be a systematic change. It strikes me as strange to have constructs in the language that are "considered bad". Either you have it and it's great, or you don't.

ScottPJones added a commit to ScottPJones/julia that referenced this issue Jun 21, 2015
@IainNZ
Copy link
Member

IainNZ commented Jun 21, 2015

As an aside, I've learnt that not all coverage reporting tools have this line-equals-meaning problem. So Coveralls, the one we are using essentially just because it was relatively popular and I was the first one to bother to create the interface to link our coverage results to it, doesn't support it AFAIK. However https://codecov.io/ does support partial line coverage, which suggests its achievable with tooling on our end. Of course, we don't really need any of these services anyway - they just provide a nice way to automatically process the raw output in a more easy-to-read form. I'm in Camp @carnaval, don't think there should be a systematic change for this right now but doing it as other changes happen seem reasonable, and work on partial line coverage reporting would be even better.

@ScottPJones
Copy link
Contributor Author

Here's the problem: The current condition && xxx idiom (which I don't have a problem with), combined with whatever is producing the .cov files when you use --code-coverage, does not show the coverage for what are probably some of the most important parts of the code... error cases that should be exercised via the unit tests.
@carnaval - @kshyatt is already systematically changing things like that, at least in the linalg code, so that the coverage info is accurate (and also adding much better error messages).

@ScottPJones
Copy link
Contributor Author

@IainNZ I don't think Coveralls has anything to do with it... it's not producing the .cov files.

@timholy
Copy link
Member

timholy commented Jun 21, 2015

The point still stands. We currently use Coveralls to visualize (and quantify) the results, and it's limited to lines. But if there are services that are more flexible, then the main bottleneck is on our side and therefore something we could fix. It would be interesting to hear what @Keno and others think of the feasibility of doing something like this---does any of the newer LLVM goodness make this easier? Here's a chunk of relevant code:

julia/src/codegen.cpp

Lines 1138 to 1158 in cb77503

static void coverageVisitLine(std::string filename, int line)
{
if (filename == "" || filename == "none" || filename == "no file")
return;
logdata_t::iterator it = coverageData.find(filename);
if (it == coverageData.end()) {
coverageData[filename] = std::vector<GlobalVariable*>(0);
}
std::vector<GlobalVariable*> &vec = coverageData[filename];
if (vec.size() <= (size_t)line)
vec.resize(line+1, NULL);
if (vec[line] == NULL) {
vec[line] = addComdat(new GlobalVariable(*jl_Module, T_int64, false,
GlobalVariable::InternalLinkage,
ConstantInt::get(T_int64,0), "lcnt"));
}
GlobalVariable *v = prepare_global(vec[line]);
builder.CreateStore(builder.CreateAdd(builder.CreateLoad(v),
ConstantInt::get(T_int64,1)),
v);
}

@carnaval
Copy link
Contributor

As I said, when this change is practically useful (i.e., when you actually add the tests that go with it, and probably not tests just for this case unless the function is well covered already), I'm all for it. Hell, if @kshyatt said that a single character per line makes writing tests easier I'd be fine with it (although I hope it won't come to this :-)).

However, it does not mean we cannot admit that this is a failure of the tooling and not the language construct itself. Changing the whole base without even adding the tests that go with it seems like going overboard (and declaring defeat) to me. My nightmare scenario goes something likes this : we change the whole base, when random contributor 133 comes in with this construct in the PR, someone (rightfully) says "please don't use this to stay consistent with the rest of the code", and this happens for a few PR. Then, someone, (rightfully again), being bored of repeating this puts it in the documentation somewhere "the && style of control flow should not be used because random coverage mumbling". There you go, you now have a perfectly valid language construct that is second-class and "you should not use it because best practice" although it's only really a failure of implementation.

The problem is that those things have a tendency to be long lived, and I'd bet you a beer of your choice that a year later you'd still have people somewhere not using && for errors because they read somewhere it was "bad".

Ok, aside from my day dreaming, what is really needed here is to use coverage at the lowered level of the code : regular (= non exceptional) control flow is linearized so at least you have this level of precision because you are closer to the actual program semantic, not this text lines things we petty humans use to input code :-)

There would still be some issues where f(may_throw()) would be reported as tested even if it threw the exception but that's still better than what we have now since it handles && (or any tricks like putting everything on a single lines with semicolons).

I don't know how our coverage stuff works so maybe it's already operating at that level and there is a bug, or the output is just not taking it into account fully.

@ScottPJones
Copy link
Contributor Author

@timholy I had raised the issue earlier, and had been told that line coverage was all that could be done.
I'd much prefer having the tools changed, because it isn't just in these easy to change situations with throw and return, but rather any conditional code on a single line.

@carnaval
Copy link
Contributor

Ok so looking at the code (

coverageVisitLine(filename, lno);
), it is emitting a coverage slot only when encountering a line number ast node. It should probably emit a slot for every statement in the function, linking those back to the current line number (= last line node encountered). Then, if the coverage tool supports it, you say that a line was only visited X% if only X% of the slots referring to that line are visited by the program.

I think experimenting in that direction is a much better use of energy that going around changing all the base error checking at once :-)

@ScottPJones
Copy link
Contributor Author

@carnaval That was precisely my point on when I commented on @kshyatt's #11518.
Note, that PR was merged without any controversy, and it basically does the same thing as this PR,
and doesn't add any tests. I don't see why I (or Kate) should be responsible for adding all the tests, that really should be the responsibility of the author(s) of the code in question.
@StefanKarpinski made these comments:

+1 – even though I like the terse one-line idiom, having clear, accurate coverage metrics is more important. Also these error messages tend to be long, so having them on a separate line is helpful.

and

There's nothing we can do about coverage tools having that problem – as long as the external services are line-based, they're going to have that problem. I would very much like to have basic-block-based coverage analysis, however.

@ScottPJones
Copy link
Contributor Author

@carnaval

  • I'm concerned about how well covered the code in base is (just as Kate seems to be)
  • I don't have the knowledge (yet) to do anything about the lack of basic block support in the coverage output (it seems like you do though!) (although it shouldn't be a slot per statement, it needs to have a slot per basic block [and I thought with a few exceptions, such as return, using, and import julia was just expressions, not statements (or so Stefan IIRC said)
  • Multiple people have said that using an if was actually more understandable (see Stefan's comment above, a few others have also said the same)
  • Experimenting in the direction of improving coverage analysis would be a LOT more energy for me than simply using file-query-regex-replace while I'm watching the TV with my wife! 😀 If you (or somebody else with experience in the area) have the energy to do so, that would absolutely fantastic,
    and that is what I originally asked for in RFC: prettify single line throws/returns and add more descriptive errors for linalg #11518. However, that doesn't detract from the value of getting better coverage information NOW, especially in parts of the code that have been implicated in performance issues/regressions. The more information we have, the better, IMO.
  • Who said anything about changing all of the base error checking at once??? 😀 I don't watch that much TV!

@ScottPJones
Copy link
Contributor Author

Also, I agree totally with you that this shouldn't ever be put in the documentation as the recommended way to do things... if the question comes up (before the coverage tools are improved), I'd just say clearly,
for now only, you need to do this to get better coverage info, but that that situation will be changing hopefully soon.

@carnaval
Copy link
Contributor

  • Don't make me say something I didn't, I'm not saying anyone was responsible for writing tests
  • Changing all with a regexp is precisely what I don't want. I would want this to only be done where we (you, @kshyatt, me, whoever wants to) actually plan to improve testing coverage.
  • The if being more understandable is unrelated, wherever it is more understandable it should be used, if it happens to be more understandable everywhere then we simply remove short circuit (but I'm sure that's not the case).
  • We do need a slot per statement because of exceptions which are not modeled by llvm basic blocks. So a();b();c();d() is a single BB but can abort in the middle. return is a statement, everything have it's own line in code_lowered is a statement.
  • If it is what it takes to avoid a full blown polemic on whether short circuit is a good thing in itself I'll try to find time to make this work better, say at the juliacon hackathon or something

@ScottPJones
Copy link
Contributor Author

OK, but you had said this:

when this change is practically useful (i.e., when you actually add the tests that go with it, and probably not tests just for this case unless the function is well covered already), I'm all for it.

  • It sounded like, in contrast to RFC: prettify single line throws/returns and add more descriptive errors for linalg #11518, you felt this change to abstractarray.jl should only go in if I added tests for all of the cases... that does seem like you felt I should be responsible for it (at least, if I want to have this get merged). I did use this immediately to test coverage of abstractarray.jl, and found that it's almost not covered at all... I don't know just how to get those methods called to improve the testing myself. Do you want to see my .cov file?
  • The if being more understandable was related, in that I also could have used a 2-line short circuit just as well, which is why I created an RFC issue to sound out which one people liked better.
  • OK, in the COS compiler I worked on, do a(),b(),c(),d() would be considered 4 BBs, for the same reason, you can't tell what the side effects are, like exceptions, changing global variables, etc.
    I'm surprised that it is considered a single BB here, for the same reason.
  • Now you see my evil plan, I've managed to annoy a really smart person into fixing the problem I'd raised! 😀

@mbauman
Copy link
Member

mbauman commented Jun 21, 2015

+1 for not doing this systematically, and instead doing it on a case by case basis driven by active work in improving coverage. There's hope that coverage tooling may improve in the future so let's not churn the waters unnecessarily.

@JeffBezanson
Copy link
Member

Then, if the coverage tool supports it, you say that a line was only visited X% if only X% of the slots referring to that line are visited by the program.

Sounds pretty good to me. I'd much rather do this than start picking bits of language syntax to effectively ban.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants