coverage feature increases test runtime by factor 3 #879

MarkusH · 2017-09-20T08:05:07Z

Hi,

The 3.29.0 release added support for using coverage in its example optimization. According to the release notes, "Tests that are already running Hypothesis under coverage will likely get faster."

We were and are running this command from within a tox environment, testing a Django project:

{envbindir}/coverage run project/manage.py test --setting project.test_settings -v 2 {posargs:project tests}

When updating to hypothesis >=3.29.0 our test runtime increases from around 10 minutes to 35 minutes. We did not set the use_coverage flag and don't have any other hypothesis settings.

The culprit seems to be / could be that coverage all of a sudden starts collecting stats for files in the .tox environment as well as in /opt/ (we're running on CircleCI), not just in the application code. Here's the [run] section from the .coveragerc file:

[run]
source =
        project
        tests
branch = True
omit =
        project/*/migrations/*

I'm happy to give further details, just not sure what's helpful right now.

DRMacIver · 2017-09-20T10:04:56Z

Ugh. Sorry that it's been that big of a slowdown. I knew this could cause performance regressions on some workloads, but 3x was a lot more than I expected!

> The culprit seems to be / could be that coverage all of a sudden starts collecting stats for files in the .tox environment as well as in /opt/ (we're running on CircleCI), not just in the application code. Here's the [run] section from the .coveragerc file

Hypothesis currently doesn't take the coveragerc into account. It's something of a deliberate choice that it does this - coverage information from the code you call is actually pretty good for finding interesting behaviours - but good point that I hadn't taken that factor into account when writing the line about performance, sorry!
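To make the difference concrete, here is a rough sketch of which files the [run] section quoted above would limit measurement to, if it were honoured. This is a deliberately simplified model and not coverage's actual matching logic; the function name and the example paths are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Values copied from the [run] section of the .coveragerc in this issue.
SOURCE_DIRS = ("project", "tests")
OMIT_PATTERNS = ("project/*/migrations/*",)


def would_be_measured(relative_path):
    """Simplified model: a file counts only if it lives under a source
    directory and does not match an omit pattern."""
    parts = PurePosixPath(relative_path).parts
    if not parts or parts[0] not in SOURCE_DIRS:
        return False
    return not any(fnmatch(relative_path, pattern) for pattern in OMIT_PATTERNS)


if __name__ == "__main__":
    # Hypothetical paths, relative to the repository root.
    for path in (
        "project/app/views.py",
        "project/app/migrations/0001_initial.py",
        ".tox/hypothesis/lib/python3.5/site-packages/django/apps.py",
    ):
        print(path, would_be_measured(path))
```

Under this model, files in the .tox environment fall outside the configured sources, which is why tracing them anyway adds work that the reporter's configuration would otherwise avoid.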

One thing to confirm is that you are running with the coverage C extension. It seems unlikely that you wouldn't be, but worth checking. If you run python -c 'from coverage.tracer import CTracer' in the virtualenv you're testing in, do you get an ImportError?
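The same check can be written as a small script, which may be easier to drop into a tox command than a quoted one-liner. This is only a sketch: the function name is hypothetical, and it reports a missing coverage installation the same way as a missing C extension.

```python
def has_c_tracer():
    """Return True if coverage's C tracer can be imported, False otherwise."""
    try:
        from coverage.tracer import CTracer  # noqa: F401
    except ImportError:
        # Raised when coverage is not installed or was built without
        # its C extension.
        return False
    return True


if __name__ == "__main__":
    print("coverage C tracer available:", has_c_tracer())
```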

Anyway, performance of this mode is definitely something I'll be working on in the near to medium-term future. One of the likely changes is that I'm going to lower the default max_examples from 200 to 100 - with using coverage to improve the test case generation, running half as many examples should get you at least as much testing as you used to get (ideally more, but right now that's probably not true). You might find it useful to do the same for now as a workaround - it won't get you back to the old performance, but a 50% slow-down is at least more palatable than a 3x slowdown!
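The arithmetic behind that estimate can be sketched as follows. It assumes runtime scales linearly with the number of examples, which is a simplification; the function name is hypothetical.

```python
def relative_runtime(slowdown, old_examples, new_examples):
    """Estimate runtime relative to the old baseline (1.0 means unchanged),
    assuming cost grows linearly with the number of examples."""
    return slowdown * new_examples / old_examples


if __name__ == "__main__":
    # A 3x slowdown, with max_examples lowered from 200 to 100.
    ratio = relative_runtime(slowdown=3.0, old_examples=200, new_examples=100)
    print(f"{ratio:.1f}x baseline, i.e. {ratio - 1:.0%} slower")
```

In practice only part of a test run is spent generating and running examples, so the real saving is likely to differ from this estimate.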

DRMacIver · 2017-09-20T15:49:21Z

With any luck, once #880 has been merged you should see default performance go back to roughly on par with what you had before - it implements the change of the default max_examples from 200 to 100, and it also removes branch tracking from our internal use of coverage so that ideally should get you the rest of the way (but it might not quite).

MarkusH · 2017-09-21T20:51:05Z

Thanks for looking into this!

> Ugh. Sorry that it's been that big of a slowdown. I knew this could cause performance regressions on some workloads, but 3x was a lot more than I expected!

It's software. Something is always not going to go the way one expects. Don't worry or stress yourself out over this.

> One thing to confirm is that you are running with the coverage C extension. It seems unlikely that you wouldn't be, but worth checking. If you run python -c 'from coverage.tracer import CTracer' in the virtualenv you're testing in, do you get an ImportError?

I will give it a shot and report back.

We pinned hypothesis to <3.29.0 for now. That works for us.

MarkusH · 2017-09-22T10:19:02Z

I ran the command as requested:

tox.ini:

[tox]
skipsdist = True
envlist = hypothesis

[testenv]
basepython = python3.5
deps =
        -r{toxinidir}/requirements.txt
        -r{toxinidir}/requirements-test.txt
commands = python -c 'from coverage.tracer import CTracer ; print(CTracer)'

circle.yml:

machine:
    python:
        version: 3.5.3

test:
    override:
        - tox --recreate

Output for hypothesis 3.28.3:

tox --recreate
hypothesis create: /home/ubuntu/project/.tox/hypothesis
hypothesis installdeps: -r/home/ubuntu/project/requirements.txt, -r/home/ubuntu/project/requirements-test.txt
hypothesis installed: ...,coverage==4.4.1,...,hypothesis==3.28.3,hypothesis-django==2.0.0
hypothesis runtests: PYTHONHASHSEED='3858773694'
hypothesis runtests: commands[0] | python -c from coverage.tracer import CTracer ; print(CTracer)
<class 'coverage.CTracer'>
___________________________________ summary ____________________________________
  hypothesis: commands succeeded
  congratulations :)

Output for hypothesis 3.30.0:

tox --recreate
hypothesis create: /home/ubuntu/project/.tox/hypothesis
hypothesis installdeps: -r/home/ubuntu/project/requirements.txt, -r/home/ubuntu/project/requirements-test.txt
hypothesis installed: ...,coverage==4.4.1,...,hypothesis==3.30.0,hypothesis-django==2.0.0,...
hypothesis runtests: PYTHONHASHSEED='278030502'
hypothesis runtests: commands[0] | python -c from coverage.tracer import CTracer ; print(CTracer)
<class 'coverage.CTracer'>
___________________________________ summary ____________________________________
  hypothesis: commands succeeded
  congratulations :)

DRMacIver · 2017-09-27T10:00:10Z

OK. You should see at least a factor of two improvement by updating to Hypothesis 3.30.4 (because it halves the number of examples...), hopefully but not definitely more.

To get the rest of the way and then some, I'm going to be addressing performance problems on the coverage side (funding for which would be extremely welcome if you felt so inclined 😉). I'll leave this ticket open for now to track the impact on the Hypothesis side of things, but it probably won't see much change outside of coverage.

Zac-HD · 2017-10-09T04:59:01Z

@MarkusH, could you test with Hypothesis 3.31.5 or later? #922 and #916 have substantially improved performance with coverage=True. Anecdotally, our own tests under coverage used to take around 18 minutes, peaked at around 45, and are now down to around 12 minutes.

DRMacIver mentioned this issue Sep 20, 2017

Smart example selection based on coverage #880

Merged

lmshk mentioned this issue Sep 21, 2017

Coverage ignores .coveragerc #883

Closed

Zac-HD added the "process" and "performance (go faster! use less memory!)" labels and removed the "process" label Sep 29, 2017

dangitall mentioned this issue Oct 7, 2017

Significant performance degradation for v3.30.4 #919

Closed

jacg mentioned this issue Oct 11, 2017

Understand and fix hypothesis issues next-exp/IC#371

Open

Zac-HD mentioned this issue May 19, 2018

Coverage adds a lot of overhead when the base test is fast #914

Closed

DRMacIver mentioned this issue Sep 8, 2018

Remove and deprecate coverage-guided testing #1564

Merged

DRMacIver closed this as completed in #1564 Sep 8, 2018
