Improve coverage of the test suite #314

Closed
fingolfin opened this issue Oct 28, 2015 · 18 comments
Labels: kind: enhancement, topic: tests

Comments

@fingolfin
Member

I think one of the major problems in GAP development is that we lack a comprehensive test suite. Hence, any given change, no matter how innocent it may appear, can break functionality in GAP, and it may take us months until we notice it, say after a user reports an issue.

Let's try to fix that. We have discussed this often enough, so I'll try to summarize what I think the conclusion of those discussions is. This issue could then be considered a "meta-issue", tracking the overall progress, with the actual work done elsewhere.

  1. The test setup should be improved, to make running tests easier. See issue Improve the test setup #260.
  2. Of course, adding new tests should also be easy. I believe that our current test system is very useful, but is also limited in some regards, so perhaps we need to add some new functionality there (not to replace the existing test system, but to augment it). My UnitTests package was an experiment in that direction, and I feel there are some other ideas we could explore in this regard. But that should not be discussed here, but rather on a separate issue and/or the mailing lists, and/or in live meetings.
  3. We need a way to measure test coverage. @ChrisJefferson wrote code that can generate a webpage showing, for each file in the GAP library, whether it is hit by any test or not. Of course line coverage is not as good as actually thinking in detail about each function and whether every situation in it is properly covered; but it is still tremendously better than having no way at all to find out which code is tested. Therefore, I think we should make heavy use of this feature (see the sketch after this list). In particular:
    1. Make sure it is as easy as possible to use, and well documented, so that devs can and will use it on their machines regularly (not just for measuring GAP's test coverage, but also that of their packages)
    2. Have a server somewhere which automatically runs this regularly and makes the results easily accessible to everybody. Ideally, we'd also track some aggregate coverage numbers over time. This way, people can watch progress on the code coverage. This can be a huge motivation (think "gamification").
  4. Get people to actually write meaningful tests. I know that the previous three points would already help motivate me to invest some time into that. But in the end, it's a huge project. On the other hand, it might be something that could be used to get people involved in the project. Of course there are some functions where you really need to know what's going on in order to properly (and correctly) test them, but right now, I believe there are tons of low-hanging fruit, e.g. writing tests for the ZmodnZ code in all its variants.
    Perhaps some people can also get funding for this and hire some students to help out. Let's be creative!
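
For concreteness, here is roughly how such a coverage run could look with the kernel's line-by-line coverage hooks plus @ChrisJefferson's profiling package (I am quoting the function names from memory, so treat the exact calls as an assumption rather than a recipe):

gap> # start recording line-by-line coverage into a file
gap> CoverageLineByLine("coverage-testinstall.gz");;
gap> Read("tst/testinstall.g");;
gap> UncoverageLineByLine();;
gap> # turn the raw data into per-file annotated HTML (profiling package)
gap> LoadPackage("profiling");;
gap> OutputAnnotatedCodeCoverageFiles("coverage-testinstall.gz", "coverage-html");;

A server job could then simply publish the resulting coverage-html directory somewhere.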
@fingolfin added the kind: enhancement label on Oct 28, 2015
@olexandr-konovalov
Member

@ChrisJefferson I think it's now high time to put your profiling package in the deposit - it would be useful for many reasons to distribute it with the GAP 4.8 beta rather than waiting for the first public GAP 4.8 release.

By the way, do we have a way to consolidate test coverage results from several test sessions? For example, the union of teststandard and testmanuals may exercise more code than either of them alone.
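
If I remember correctly, the profiling package can merge several coverage output files; something along these lines should do it (the function name and its signature are my assumption, please check the package documentation):

gap> LoadPackage("profiling");;
gap> # combine the data from a teststandard run and a testmanuals run
gap> MergeLineByLineProfiles(
>        ["coverage-teststandard.gz", "coverage-testmanuals.gz"],
>        "coverage-combined.gz");;
gap> OutputAnnotatedCodeCoverageFiles("coverage-combined.gz", "coverage-html");;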

@olexandr-konovalov
Member

@ChrisJefferson furthermore, teststandard runs teststandard.g in a single GAP session, but testmanuals runs each manual chapter in a separate GAP session.

@fingolfin certainly we could set up a Jenkins job to run the coverage tests, e.g. weekly, and then publish the results somewhere. I suggest that the first step towards this should be actually depositing @ChrisJefferson's package.

@frankluebeck
Member

Of course, I fully agree that the testing setup and coverage need to be improved. Important aspects for me would be:

  • easy to use (preferably via GAP level utilities)
  • easy to extend
  • easy to do the same for package code
  • usable on any local machine (setup should not depend on external tools, CI or jenkins configurations, and so on)

@olexandr-konovalov
Member

Of course, I fully agree that the testing setup and coverage need to be improved. Important aspects for me would be:

Couldn't agree more with the four items below. I have tried to collect some current and desirable points:

easy to use (preferably via GAP level utilities)

  • Works: for example, Read("tst/testinstall.g");
  • Desirable: clean up tests to be able to run the same test file several times in the same GAP session

easy to extend

  • Now easier: with the new TestDirectory, adding a new test file does not require any changes in tst/testinstall.g etc. (see the sketch after this list)
  • Desirable: clean up existing tests, and write new ones in such a way that the order of their execution does not matter. In particular, tests that require specific packages should be performed separately or moved to the packages.
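
To make this concrete, with the current Test and TestDirectory library functions a run looks roughly like this (the test file name is just for illustration):

gap> # run a single test file; returns true if all output matches
gap> Test("tst/zmodnz.tst");
true
gap> # run every .tst file found in a directory, in one go
gap> TestDirectory(DirectoriesLibrary("tst"));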

easy to do the same for package code

  • Good: packages may follow the same design for their test files, and also specify a default test (not recommended to take longer than 5-10 minutes) in their PackageInfo.g file, to be checked during GAP regression tests.
  • However: only 56 of 119 packages currently specify such a test file in their PackageInfo.g (though they may still have some test files, including some extended tests).

usable on any local machine (setup should not depend on external tools, CI or jenkins configurations, and so on)

This has been a strong design guideline ever since we started making GAP 4.5. Many steps in the testing and releasing workflow are VCS and CI agnostic; in particular:

  • Version control is used only in the initial step of wrapping the release candidate - checking out the repository. The rest can be performed in total isolation from the repository.
  • All tests that we run in Jenkins have corresponding make targets which may be run independently.

@stevelinton
Contributor

There are a lot more things it would be good to be able to do at GAP level than
Read("tst/testinstall.g");
allows. @alex-konovalov has already mentioned rerunning tests. Access to the results of tests from GAP level (success/fail and performance metrics) would also be nice.

It would also be nice to have a finer structure than a whole test file, so that a file could contain many tests which could pass and fail independently (and perhaps also have independent timings etc.). Ideally you want to see something like "97 out of 103 tests in this file passed" and then drill down to see (personally or from a programme) the details of the failures -- "sum of mutable vectors was not mutable", "break loop executing such and such a command", "no break loop executing some other command", "no result returned when one was expected", etc.
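
Purely to illustrate the kind of GAP-level result object I mean (nothing like this exists today; every name below is invented):

gap> # hypothetical: run a test file and get structured results back,
gap> # instead of just true/false
gap> res := TestWithResults("tst/zmodnz.tst");;
gap> res.passed; res.failed;
97
6
gap> res.failures[1];
rec( line := 42, input := "v + w;",
  reason := "sum of mutable vectors was not mutable" )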

@ChrisJefferson
Contributor

This kind of thing shouldn't be too hard to add; it's mainly a matter of file format!

There are in my mind two broad options:

  1. Invent a new testing format (or use @fingolfin's unittesting package)
  2. Extend the existing testing format.

Seeing as the existing testing format allows comments in files, it wouldn't be hard to have some simple, specially formatted lines to split files up. Perhaps something like "start a line with ## to denote the start of a new block of tests", so you would write something like:

## test intersection
gap> Intersection(G,H) = Intersection(H,G);
true
gap> Intersection(G,G) = G;
true

## test size
gap> Size(G);
6

@frankluebeck
Member

@ChrisJefferson: My approach would be 2) (Extend the existing testing format).

I would write utilities in GAP on top of Test. As already mentioned, one could use comment lines in test files for further metadata.

One idea: Using the utilities for code coverage one could determine (and cache) for each test file the list of code files containing code that is used during these tests. After editing some files one could then run exactly those tests which are relevant for the changed files.
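
As a rough sketch of that idea (I am quoting ReadLineByLineProfile from memory as the profiling package's reader, so treat the names as assumptions):

gap> LoadPackage("profiling");;
gap> # record which code a single test file exercises
gap> CoverageLineByLine("cov-zmodnz.gz");;
gap> Test("tst/zmodnz.tst");;
gap> UncoverageLineByLine();;
gap> data := ReadLineByLineProfile("cov-zmodnz.gz");;
gap> # from 'data' one can extract the list of source files that were hit;
gap> # cache that list per test file, and after editing lib/zmodnz.gi rerun
gap> # only those test files whose cached list contains it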

@fingolfin
Member Author

My approach actually would be both 1) and 2). There are some tests for which the existing test format is nice, and improving it with annotations would be helpful. But there are also some kinds of tests for which I find it very awkward and cumbersome to use the existing test format; e.g. for the tests in the recog package, rewriting them in the existing test format would IMHO be quite annoying. This is the main reason why I wrote the UnitTests package (however, let me stress once more: that package is just a proof of concept, and definitely not finished or meant for general consumption; nor is it meant to replace the existing tests).

On another note, I think the Homalg people have yet another system, where they put tests into comments in their source files and then extract them from there to be run, or something like that? Perhaps @sebasguts could elaborate?

@sebasguts
Member

I do not think our approach is new here. We test manual examples. Sometimes we put them in the source code, see for example
https://github.com/homalg-project/homalg_project/blob/master/Gauss/gap/Sparse.gi .
We also have a different syntax for those tests in AutoDoc, see for example
https://github.com/homalg-project/CAP_project/blob/master/CAP/examples/testfiles/SpectralSequencesTest.gi
Those are then added to the manual and tested by RunExamples, after extracting them with ExtractExamples.
The latter has the advantage that the files are still readable by GAP or can be copied into the GAP shell.
The former has the advantage that the tests are very close to the methods which they test. But all of this uses GAPDoc.
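
For reference, the GAPDoc side of this looks roughly like the following (package name and paths are made up for illustration; the exact argument conventions are in the GAPDoc manual):

gap> LoadPackage("GAPDoc");;
gap> # collect the examples from the manual sources, grouped by chapter
gap> exs := ExtractExamples(DirectoriesPackageLibrary("mypkg", "doc"),
>               "mypkg.xml", [], "Chapter");;
gap> # rerun them and compare the output with what the manual claims
gap> RunExamples(exs, rec(compareFunction := "uptowhitespace"));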

We also have a package ExamplesForHomalg, which we use for further testing. But those tests are far away from the source code and almost certainly not what you have in mind here, right?

@olexandr-konovalov
Member

For the record, so do I in my packages - see e.g. wedderga/makedocrel.g. Whenever I build the manuals with this script, it also extracts all examples and writes them into test files (one per chapter) in the tst directory. These test files are not kept under version control, but they are included in the release and run as part of the default test which is referenced in PackageInfo.g.

@frankluebeck
Member

Testing examples in the manual is certainly one sensible test. But manual examples are usually not extensive tests for the code in question, and vice versa an extensive test suite for some code should not be put in manual examples.

If people want test code close to the implementation, one could generate test files in the same way as the manual is collected from source (and other) files, e.g. using <#Test instead of <#GAPDoc pseudo-tags.
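
Purely as an illustration of what such a chunk could look like next to the implementation (no such extractor exists yet; the format simply mirrors GAPDoc's <#GAPDoc Label="..."> chunks):

## <#Test Label="OrderZmodnZ">
## gap> Order(ZmodnZObj(2, 9));
## 6
## gap> Order(ZmodnZObj(8, 9));
## 2
## <#/Test>
##
## (the InstallMethod for the tested operation would follow right here, and a
## collector script would concatenate all <#Test> chunks into a .tst file,
## just as GAPDoc collects <#GAPDoc> chunks when composing the manual)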

@olexandr-konovalov
Member

@frankluebeck of course that's true - it's an easy way to generate test files for the package and to ensure that the examples within a chapter run 100% when executed in the same order, but not much more. Using <#Test sounds like an excellent idea!

@ChrisJefferson
Contributor

You can see the current coverage of both the C and GAP code at:

https://caj.host.cs.st-andrews.ac.uk/gap-cover/

This output is generated using the scripts (and vagrant machine) at:

https://github.com/ChrisJefferson/GAPCoverage

These scripts build a custom version of GAP. This is needed for the coverage of the C code.

@olexandr-konovalov
Member

@ChrisJefferson Thanks! Is it documented somewhere how to read this, e.g. what the different colours in the GAP code coverage report mean?

@ChrisJefferson
Contributor

Some documentation would be good, more is coming :)

Green (blue in the C code) = line read and executed
Red = line read, not executed
Grey = lines which don't have code on them (in GAP some of them might look like they have code, but that's just how GAP stores code internally)

In short -- we want no red on either profile.

@markuspf
Member

markuspf commented Nov 9, 2015

We need to run this on a regular basis and put the results on a webpage though, or are you doing that already?

@olexandr-konovalov
Member

I could set up a Jenkins build, but hardly earlier than November 24th.

@fingolfin added the topic: tests label on Jan 12, 2017
@ChrisJefferson
Contributor

We've done most of this -- although it might still be nice to have a new test format.
