Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test outcome analysis: check that all available test cases have been executed #3458

Conversation

gilles-peskine-arm
Copy link
Contributor

@gilles-peskine-arm gilles-peskine-arm commented Jun 25, 2020

Introduce a new script tests/scripts/analyze_outcomes.py which is meant to analyze the collected test outcome files from a whole CI run.

In this pull request, the analysis does one thing: check the list of test cases that have run at least once against the list of available test cases (from unit tests and from ssl-opt.sh), and report the available test cases that haven't run at all. This is reported as a warning for now; in follow-ups, we should analyze the reports, work out which configurations we should add to exercise all the test cases, and finally crank up the warning to an error.

This addresses #2691, but does not fix it, since a failure only produces a warning, not failed CI.

This pull request also required some preliminaries:

  • Rename *-*.py to *_*.py. You can't import a script with a dash in its name.
  • Refactor check_test_cases.py (formerly check-test-cases.py) so that the new script can reuse its test case gathering code.
  • A tweak to basic-in-docker.sh. This script was aligned with Travis but I accidentally broke that in Rationalize Travis builds #3218. We should probably update it but it's out of scope of this PR; here I only do a minor documentation update.

Backports:

Call all.sh for sanity checks, rather than maintain an explicit list.
This was done in .travis.yml in 3c7ffd7

Travis has diverged from basic-in-docker. This commit updates the
description of basic-in-docker to no longer refer to Travis. Alignment
with Travis may be desirable but that is beyond the scope of this commit.

Signed-off-by: Gilles Peskine <[email protected]>
You can't import a Python script whose name includes '-'.

Signed-off-by: Gilles Peskine <[email protected]>
Parametrize the code that iterates over test case descriptions by the
function to apply on each description.

No behavior change.

Signed-off-by: Gilles Peskine <[email protected]>
@gilles-peskine-arm gilles-peskine-arm added enhancement needs-review Every commit must be reviewed by at least two team members, needs-backports Backports are missing or are pending review and approval. component-platform Portability layer and build scripts labels Jun 25, 2020
@gilles-peskine-arm gilles-peskine-arm self-assigned this Jun 25, 2020
Copy link
Contributor

@mpg mpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this! Looks pretty good to me, just have a couple of questions and minor points.

Make the structure more Pythonic: use classes for abstraction and
refinement, rather than higher-order functions.

Convert walk(function, state, data) into instance.walk(data) where
instance has a method that implements function and state is a field of
instance.

No behavior change.

Signed-off-by: Gilles Peskine <[email protected]>
With previous refactorings, some functions are now solely meant to be
called from other functions in a particular class. Move them into this
class.

No behavior change.

Signed-off-by: Gilles Peskine <[email protected]>
This is a new script designed to analyze test outcomes collected
during a whole CI run.

This commit introduces the script, the code to read the outcome file,
and a very simple framework to report errors. It does not perform any
actual analysis yet.

Signed-off-by: Gilles Peskine <[email protected]>
Check that every available test case in the test suites and ssl-opt.sh
has been executed at least once.

For the time being, only report a warning, because our coverage is
incomplete. Once we've updated all.sh to have full coverage, this
warning should become an error.

Signed-off-by: Gilles Peskine <[email protected]>
@gilles-peskine-arm gilles-peskine-arm force-pushed the analyze_outcomes-count_test_cases-1 branch from b5b43b1 to 8d3c70a Compare June 26, 2020 16:30
@gilles-peskine-arm
Copy link
Contributor Author

I had accidentally erased the signoff line on two commits. Fixed in a force-push that doesn't change anything else.

@gilles-peskine-arm
Copy link
Contributor Author

I've analyzed the test cases reported as not executed. Some of these are false positives:

  • SSL tests whose description involves shell expansion (\", $1). I'm thinking of fixing this by adding an option ssl-opt.sh --list-available-test-cases and using its output instead of hand-parsing ssl-opt.sh.
  • Deliberately skipped SSL tests. This could also be resolved with ssl-opt.sh --list-available-test-cases.

I propose to treat all of these as known defects for now, to be corrected in follow-up PRs.

mpg
mpg previously approved these changes Jun 29, 2020
Copy link
Contributor

@mpg mpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I think we can either merge this as is and create a follow-up PR to handle known issues (and switch from warning to errors for the rest), or expand this PR, as you prefer.

ronald-cron-arm
ronald-cron-arm previously approved these changes Jul 2, 2020
Copy link
Contributor

@ronald-cron-arm ronald-cron-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good to me. I've been able to play with analyze_outcomes.py and the outcome of ./tests/scripts/all.sh test_full_cmake_gcc_asan: only 322 tests not run. Just this test covers already a lot. By targeting specific tests with other configurations, we may be able to reduce the duration of the CI to check the PRs. Maybe something to look at here. I've just two suggestions for comment improvements that you may want to address.

file_name, line_number, description):
"""Process a test case.

per_file_state: a new object returned by per_file_state() for each file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per_file_state is an object and a method, it was a bit confusing to me to start with. Would it be possible to improve the documentation, naming?

@@ -76,59 +76,98 @@ def check_description(results, seen, file_name, line_number, description):
len(description))
seen[description] = line_number

def walk_test_suite(function, results, descriptions, data_file_name):
"""Iterate over the test cases in the given unit test data file.
class TestDescriptionExplorer:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a comment on this line but on the short file purpose description.

With the addition of this abstract calls to iterate over test cases maybe the short file description should be updated. For the time being it is still just: "Sanity checks for test data.".

@gilles-peskine-arm
Copy link
Contributor Author

By targeting specific tests with other configurations, we may be able to reduce the duration of the CI to check the PRs.

I doubt that there's much to gain if anything. For most of the configurations, we aren't just interested in test cases that are specific to that particular configuration, but in all test cases, some of which trigger different behavior under the hood. There are even configurations where we aren't really interesting in running the tests, just in ensuring that the compilation succeeds, because the main risk is a missing #ifdef somewhere.

@mpg
Copy link
Contributor

mpg commented Jul 3, 2020

@gilles-peskine-arm According to github, you requested a review from Ronald and me, but we had both approved the PR and I can't see any change since our approval. Was it intentional? If so, could you clarify?

Signed-off-by: Gilles Peskine <[email protected]>
@gilles-peskine-arm gilles-peskine-arm dismissed stale reviews from ronald-cron-arm and mpg via bbb3664 July 3, 2020 07:33
@gilles-peskine-arm
Copy link
Contributor Author

@mpg Oops. I had made the improvements suggested by Ronald, but I only just now pushed them to the correct branch.

@mpg
Copy link
Contributor

mpg commented Jul 3, 2020

Ok, makes more sense now :)

Copy link
Contributor

@mpg mpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mpg mpg removed the needs-backports Backports are missing or are pending review and approval. label Jul 3, 2020
Copy link
Contributor

@ronald-cron-arm ronald-cron-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the last changes.

@mpg mpg added needs-ci Needs to pass CI tests and removed needs-review Every commit must be reviewed by at least two team members, labels Jul 3, 2020
@gilles-peskine-arm gilles-peskine-arm added approved Design and code approved - may be waiting for CI or backports and removed needs-ci Needs to pass CI tests labels Jul 3, 2020
@gilles-peskine-arm gilles-peskine-arm merged commit 2426506 into Mbed-TLS:development Jul 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Design and code approved - may be waiting for CI or backports component-platform Portability layer and build scripts enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants