Improve coverage of the test suite #314

Closed
fingolfin opened this issue Oct 28, 2015 · 18 comments
Labels: kind: enhancement, topic: tests

Comments

@fingolfin
Member

I think one of the major problems in GAP development is that we lack a comprehensive test suite. Hence, any given change, no matter how innocent it may appear, can break functionality in GAP, and it may take us months until we notice it, say after a user reports an issue.

Let's try to fix that. We have discussed this often enough, so I'll try to summarize what I think the conclusion of those discussions is. This issue could then be considered a "meta-issue", tracking the overall progress, with the actual work done elsewhere.

  1. The test setup should be improved, to make running tests easier. See issue Improve the test setup #260.
  2. Of course, adding new tests should also be easy. I believe that our current test system is very useful, but is also limited in some regards, so perhaps we need to add some new functionality there (not to replace the existing test system, but to augment it). My UnitTests package was an experiment in that direction, and I feel there are some other ideas we could explore in this regard. But that should not be discussed here, but rather on a separate issue and/or the mailing lists, and/or in live meetings.
  3. We need a way to measure test coverage. @ChrisJefferson wrote code that can generate a webpage showing, for each file in the GAP library, whether it is hit by any test or not. Of course line coverage is not as good as actually thinking in detail about each function and whether every situation in it is properly covered; but it is still tremendously better than having no way at all to find out which code is tested. Therefore, I think we should make heavy use of this feature (see the sketch after this list). In particular:
    1. Make sure it is as easy as possible to use, and well documented, so that devs can and will use it on their machines regularly (not just for measuring GAP's test coverage, but also that of their packages)
    2. Have a server somewhere which automatically runs this regularly and makes the results easily accessible to everybody. Ideally, we'd also track some aggregate coverage numbers over time. This way, people can watch progress on the code coverage. This can be a huge motivation (think "gamification").
  4. Get people to actually write meaningful tests. I know that the previous three points would already help motivate me to invest some time into that. But in the end, it's a huge project. On the other hand, it might be something that could be used to get people involved in the project. Of course there are some functions where you really need to know what's going on in order to properly (and correctly) test them, but right now, I believe there are tons of low-hanging fruit, e.g. writing tests for the ZmodnZ code in all its variants.
    Perhaps some people can also get funding for this and hire some students to help out. Let's be creative!
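
For concreteness, here is roughly how such a coverage run could look with the kernel's line-by-line coverage hooks plus @ChrisJefferson's profiling package (I am quoting the function names from memory, so treat the exact calls as an assumption rather than a recipe):

gap> # start recording line-by-line coverage into a file
gap> CoverageLineByLine("coverage-testinstall.gz");;
gap> Read("tst/testinstall.g");;
gap> UncoverageLineByLine();;
gap> # turn the raw data into per-file annotated HTML (profiling package)
gap> LoadPackage("profiling");;
gap> OutputAnnotatedCodeCoverageFiles("coverage-testinstall.gz", "coverage-html");;

A server job could then simply publish the resulting coverage-html directory somewhere.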
@fingolfin added the kind: enhancement label on Oct 28, 2015
@olexandr-konovalov
Member

@ChrisJefferson I think it's now high time to put your profiling package in the deposit - it would be useful for many reasons to distribute it with the GAP 4.8 beta rather than waiting for the first public GAP 4.8 release.

By the way, do we have a way to consolidate test coverage results from several test sessions? For example, the union of teststandard and testmanuals may exercise more code than either of them alone.
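
If I remember correctly, the profiling package can merge several coverage output files; something along these lines should do it (the function name and its signature are my assumption, please check the package documentation):

gap> LoadPackage("profiling");;
gap> # combine the data from a teststandard run and a testmanuals run
gap> MergeLineByLineProfiles(
>        ["coverage-teststandard.gz", "coverage-testmanuals.gz"],
>        "coverage-combined.gz");;
gap> OutputAnnotatedCodeCoverageFiles("coverage-combined.gz", "coverage-html");;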

@olexandr-konovalov
Member

@ChrisJefferson furthermore, teststandard runs teststandard.g in a single GAP session, but testmanuals runs each manual chapter in a separate GAP session.

@fingolfin certainly we could set up a Jenkins job to run the coverage tests, e.g. weekly, and then publish the results somewhere. I suggest that the first step towards this should be actually depositing @ChrisJefferson's package.

@frankluebeck
Member

Of course, I fully agree that the testing setup and coverage need to be improved. Important aspects for me would be:

  • easy to use (preferably via GAP level utilities)
  • easy to extend
  • easy to do the same for package code
  • usable on any local machine (setup should not depend on external tools, CI or jenkins configurations, and so on)

@olexandr-konovalov
Member

Of course, I fully agree that the testing setup and coverage need to be improved. Important aspects for me would be:

Couldn't agree more with the four items below. I have tried to collect some current and desirable points:

easy to use (preferably via GAP level utilities)

  • Works: for example, Read("tst/testinstall.g");
  • Desirable: clean up tests to be able to run the same test file several times in the same GAP session

easy to extend

  • Now easier: with the new TestDirectory, adding a new test file does not require any changes in tst/testinstall.g etc. (see the sketch after this list)
  • Desirable: clean up existing tests, and write new ones in such a way that the order of their execution does not matter. In particular, tests that require specific packages should be performed separately or moved to the packages.
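
To make this concrete, with the current Test and TestDirectory library functions a run looks roughly like this (the test file name is just for illustration):

gap> # run a single test file; returns true if all output matches
gap> Test("tst/zmodnz.tst");
true
gap> # run every .tst file found in a directory, in one go
gap> TestDirectory(DirectoriesLibrary("tst"));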

easy to do the same for package code

  • Good: packages may follow the same design for their test files, and also specify a default test (not recommended to take longer than 5-10 minutes) in their PackageInfo.g file, to be checked during GAP regression tests.
  • However: only 56 of 119 packages currently specify such a test file in their PackageInfo.g (though they may still have some test files, including some extended tests).

usable on any local machine (setup should not depend on external tools, CI or jenkins configurations, and so on)

This has been a strong design guideline ever since we started making GAP 4.5. Many steps in the testing and releasing workflow are VCS and CI agnostic; in particular:

  • Version control is used only in the initial step of wrapping the release candidate - checking out the repository. The rest can be performed in total isolation from the repository.
  • All tests that we run in Jenkins have corresponding make targets which may be run independently.

@stevelinton
Contributor

There are a lot more things it would be good to be able to do at GAP level than
Read("tst/testinstall.g");
allows. @alex-konovalov has already mentioned rerunning tests. Access to the results of tests from GAP level (success/fail and performance metrics) would also be nice.

It would also be nice to have a finer structure than a whole test file, so that a file could contain many tests which could pass and fail independently (and perhaps also have independent timings etc.). Ideally you want to see something like "97 out of 103 tests in this file passed" and then drill down to see (personally or from a programme) the details of the failures -- "sum of mutable vectors was not mutable", "break loop executing such and such a command", "no break loop executing some other command", "no result returned when one was expected", etc.
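
Purely to illustrate the kind of GAP-level result object I mean (nothing like this exists today; every name below is invented):

gap> # hypothetical: run a test file and get structured results back,
gap> # instead of just true/false
gap> res := TestWithResults("tst/zmodnz.tst");;
gap> res.passed; res.failed;
97
6
gap> res.failures[1];
rec( line := 42, input := "v + w;",
  reason := "sum of mutable vectors was not mutable" )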

@ChrisJefferson
Contributor

This kind of thing shouldn't be too hard to add; it's mainly a matter of file format!

There are in my mind two broad options:

  1. Invent a new testing format (or use @fingolfin's unittesting package)
  2. Extend the existing testing format.

Seeing as the existing testing format allows comments in files, it wouldn't be hard to have some simple, specially formatted lines to split files up. Perhaps something like "start a line with ## to denote the start of a new block of tests", so you would write something like:

## test intersection
gap> Intersection(G,H) = Intersection(H,G);
true
gap> Intersection(G,G) = G;
true

## test size
gap> Size(G);
6

@frankluebeck
Member

@ChrisJefferson: My approach would be 2) (Extend the existing testing format).

I would write utilities in GAP on top of Test. As already mentioned, one could use comment lines in test files for further metadata.

One idea: Using the utilities for code coverage one could determine (and cache) for each test file the list of code files containing code that is used during these tests. After editing some files one could then run exactly those tests which are relevant for the changed files.
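
As a rough sketch of that idea (I am quoting ReadLineByLineProfile from memory as the profiling package's reader, so treat the names as assumptions):

gap> LoadPackage("profiling");;
gap> # record which code a single test file exercises
gap> CoverageLineByLine("cov-zmodnz.gz");;
gap> Test("tst/zmodnz.tst");;
gap> UncoverageLineByLine();;
gap> data := ReadLineByLineProfile("cov-zmodnz.gz");;
gap> # from 'data' one can extract the list of source files that were hit;
gap> # cache that list per test file, and after editing lib/zmodnz.gi rerun
gap> # only those test files whose cached list contains it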

@fingolfin
Member Author

My approach actually would be both 1) and 2). There are some tests for which the existing test format is nice, and improving it with annotations would be helpful. But there are also some kinds of tests for which I find it very awkward and cumbersome to use the existing test format; e.g. for the tests in the recog package, rewriting them in the existing test format would IMHO be quite annoying. This is the main reason why I wrote the UnitTests package (however, let me stress once more: that package is just a proof of concept, and definitely not finished or meant for general consumption; nor is it meant to replace the existing tests).

On another note, I think the Homalg people have yet another system, where they put tests into comments in their source files and then extract them from there to be run, or something like that? Perhaps @sebasguts could elaborate?

@sebasguts
Member

I do not think our approach is new here. We test manual examples. Sometimes we put them in the source code, see for example
https://github.com/homalg-project/homalg_project/blob/master/Gauss/gap/Sparse.gi .
We also have a different syntax for those tests in AutoDoc, see for example
https://github.com/homalg-project/CAP_project/blob/master/CAP/examples/testfiles/SpectralSequencesTest.gi
Those are then added to the manual and tested by RunExamples, after extracting them with ExtractExamples.
The latter has the advantage that the files are still readable by GAP or can be copied into the GAP shell.
The former has the advantage that the tests are very close to the methods which they test. But all of this uses GAPDoc.
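
For reference, the GAPDoc side of this looks roughly like the following (package name and paths are made up for illustration; the exact argument conventions are in the GAPDoc manual):

gap> LoadPackage("GAPDoc");;
gap> # collect the examples from the manual sources, grouped by chapter
gap> exs := ExtractExamples(DirectoriesPackageLibrary("mypkg", "doc"),
>               "mypkg.xml", [], "Chapter");;
gap> # rerun them and compare the output with what the manual claims
gap> RunExamples(exs, rec(compareFunction := "uptowhitespace"));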

We also have a package ExamplesForHomalg, which we use for further testing. But those tests are far away from the source code and almost certainly not what you have in mind here, right?

@olexandr-konovalov
Member

For the record, so do I in my packages - see e.g. wedderga/makedocrel.g. Whenever I build the manuals with this script, it also extracts all examples and writes them into test files (one per chapter) in the tst directory. These test files are not kept under version control, but they are included in the release and run as part of the default test which is referenced in PackageInfo.g.

@frankluebeck
Member

Testing examples in the manual is certainly one sensible test. But manual examples are usually not extensive tests for the code in question, and vice versa an extensive test suite for some code should not be put in manual examples.

If people want test code close to the implementation, one could generate test files in the same way as the manual is collected from source (and other) files, e.g. using <#Test instead of <#GAPDoc pseudo-tags.
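
Purely as an illustration of what such a chunk could look like next to the implementation (no such extractor exists yet; the format simply mirrors GAPDoc's <#GAPDoc Label="..."> chunks):

## <#Test Label="OrderZmodnZ">
## gap> Order(ZmodnZObj(2, 9));
## 6
## gap> Order(ZmodnZObj(8, 9));
## 2
## <#/Test>
##
## (the InstallMethod for the tested operation would follow right here, and a
## collector script would concatenate all <#Test> chunks into a .tst file,
## just as GAPDoc collects <#GAPDoc> chunks when composing the manual)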

@olexandr-konovalov
Member

@frankluebeck of course that's true - it's an easy way to generate test files for the package and to ensure that the examples within a chapter run 100% when executed in the same order, but not much more. Using <#Test sounds like an excellent idea!

@ChrisJefferson
Contributor

You can see the current coverage of both the C and GAP code at:

https://caj.host.cs.st-andrews.ac.uk/gap-cover/

This output is generated using the scripts (and vagrant machine) at:

https://github.com/ChrisJefferson/GAPCoverage

These scripts build a custom version of GAP. This is needed for the coverage of the C code.

@olexandr-konovalov
Member

@ChrisJefferson Thanks! Is it documented somewhere how to read this, e.g. what the different colours in the GAP code coverage report mean?

@ChrisJefferson
Contributor

Some documentation would be good, more is coming :)

Green (blue in the C code) = line read and executed
Red = line read, not executed
Grey = lines which don't have code on them (in GAP some of them might look like they have code, but that's just how GAP stores code internally)

In short -- we want no red on either profile.

@markuspf
Member

markuspf commented Nov 9, 2015

We need to run this on a regular basis and put the results on a webpage though, or are you doing that already?

@olexandr-konovalov
Member

I could set up a Jenkins build, but hardly earlier than November 24th.

@fingolfin added the topic: tests label on Jan 12, 2017
@ChrisJefferson
Contributor

We've done most of this -- although it might still be nice to have a new test format.
