Inconsistent report when combining reports across Python versions #1572
Comments
Seeing the same thing. I tried and failed to get a repro: as soon as I reduced it to a single file using with statements plus a test file, the problem went away. I also threw xdist into the mix, since that's where we are seeing this problem manifest, and even generated 10,000 fake tests and ran them all in parallel to try to rule out xdist as a culprit. Failed reproducer in the details.
deps
Coverage config in the setup.cfg
File

import os  # needed for os.getenv below

def f():
    with open("hello", "w") as f:
        f.write("world")
    os.getenv("HELLO")
    with open("hello") as f:
        thing = f.read()
    return thing

def g():
    if 1 == 1:
        return f()

def what():
    if 1 == 0:
        return g()
    else:
        return f()

# tests (separate file)
import f

def test_f():
    assert f.f() == "world"

def test_g():
    assert f.g() == "world"

# and then 10,000 of the same thing
def test():
    assert 1 == 1

def another():
    assert 2 == 2
Attempting to triage this ticket at the PyCon 2023 sprints.
Working with @paxnovem and @marcgibbons at PyCon 2023, we were finally able to get this down to a consistently reproducible test case! 🎉

example.py:

from contextlib import nullcontext

def foo():
    with nullcontext():
        pass

foo()

Shell:

python3.11 -m coverage erase
python3.8 -m coverage run example.py
python3.11 -m coverage report

Output:
When coverage is run and the report is generated on the same version (Python 3.8 + Python 3.8 or Python 3.11 + Python 3.11), the issue does not occur and the coverage comes back as 100%. The issue appears to occur only when a version before Python 3.9 runs the tracer and version 3.10 or later generates the report.
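One way to see what the tracing side records on a given interpreter is a small sys.settrace tracer. This is a minimal sketch, not coverage.py's tracer: record_arcs is a hypothetical helper, and lines are reported as offsets from the def line of the traced function. Running it under each interpreter and comparing the printed arcs is one way to check whether the with statement is traced differently.

```python
import sys
from contextlib import nullcontext


def foo():
    with nullcontext():
        pass


def record_arcs(func):
    """Run func under a line tracer and return the (from, to) line-offset
    pairs executed inside func itself (hypothetical helper)."""
    code = func.__code__
    first = code.co_firstlineno
    arcs = set()
    prev = None

    def local_trace(frame, event, arg):
        nonlocal prev
        if event == "line":
            cur = frame.f_lineno - first
            if prev is not None:
                arcs.add((prev, cur))
            prev = cur
        return local_trace

    def global_trace(frame, event, arg):
        # Only trace the frame that runs func; ignore everything it calls.
        if event == "call" and frame.f_code is code:
            return local_trace
        return None

    old = sys.gettrace()
    sys.settrace(global_trace)
    try:
        func()
    finally:
        sys.settrace(old)
    return arcs


arcs = record_arcs(foo)
print(sys.version_info[:2], sorted(arcs))
```

Offset 1 is the with line and offset 2 is the pass line, so the arc (1, 2) should appear on any interpreter; any further arcs are the part worth comparing between versions.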
Tracing through the code and how this all works, we were able to determine the cause of this issue. When running coverage, the tracer stores only the lines and branches (arcs) that were executed during the run. When reporting on coverage, the reporter takes the captured lines and branches (arcs) and compares them to the parsed AST to determine which lines and branches existed to be executed. The bug occurs because of a difference in the number of branches detected from the AST in Python 3.11, which detects 2 additional branches.

example.py:

from contextlib import nullcontext
import sys

def foo():
    if sys.version_info < (3, 10):
        with nullcontext():
            pass

foo()

Shell:

$ python3.8 -m coverage run example.py
$ python3.8 -m coverage report
Name         Stmts   Miss Branch BrPart  Cover
----------------------------------------------
example.py       7      0      2      1    89%
----------------------------------------------
TOTAL            7      0      2      1    89%
$ python3.11 -m coverage run example.py
$ python3.11 -m coverage report
Name         Stmts   Miss Branch BrPart  Cover
----------------------------------------------
example.py       7      2      4      1    55%
----------------------------------------------
TOTAL            7      2      4      1    55%
$ python3.11 -m coverage combine .coverage-py311 .coverage-py38
Combined data file .coverage-py38
$ python3.8 -m coverage report
Name         Stmts   Miss Branch BrPart  Cover
----------------------------------------------
example.py       7      0      2      0   100%
----------------------------------------------
TOTAL            7      0      2      0   100%
$ python3.11 -m coverage report
Name         Stmts   Miss Branch BrPart  Cover
----------------------------------------------
example.py       7      0      4      1    91%
----------------------------------------------
TOTAL            7      0      4      1    91%

Looking at the "Branch" column, you can see the 2 new branches in Python 3.11 that are not present in Python 3.8. Additionally, when the reports are combined, you can see the 1 missing branch for Python 3.11 (the second one for the …

My recommendation would be to always run the coverage report on the lowest version instead of the highest version, in order to avoid missing phantom branches.
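A rough sketch of how that recommendation could be scripted: pick the lowest interpreter that is actually installed and use it for the report step. The helper name lowest_available and the candidate list are assumptions for illustration, and the pythonX.Y executable naming is assumed to match your platform (it typically does not on Windows).

```python
import shutil


def lowest_available(versions, which=shutil.which):
    """Return (version, path) for the lowest listed interpreter found on PATH,
    or None if none of them is installed (hypothetical helper)."""
    for version in sorted(versions):
        path = which(f"python{version[0]}.{version[1]}")
        if path:
            return version, path
    return None


# Hypothetical list of the versions that collected coverage data.
candidates = [(3, 11), (3, 8), (3, 10)]
found = lowest_available(candidates)
if found:
    version, path = found
    print(f"report with: {path} -m coverage report")
else:
    print("none of the candidate interpreters were found on PATH")
```

Versions are compared as tuples so that (3, 8) sorts before (3, 10), which a plain string comparison would get wrong.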
Describe the bug
Combining reports across different versions of Python yields inconsistent coverage results, depending on the version of Python that was used to produce the report.
To Reproduce
I've tried, but have been unsuccessful at reducing this to a minimal example; building an isolated version of the lines that are the source of the problem makes the problem go away. It manifests running the test suite of Briefcase on this PR, at commit 6ae1b86; it's a Pytest test suite of a pure Python codebase, using unittest.mock and monkeypatch. In CI we run coverage on Python 3.8-3.12, on macOS, Windows and Linux; however, the problem manifests with just Python3.9.13 and Python3.10.9 in the mix.
To reproduce on macOS (I'm seeing the same result on Ventura on M1, and Monterey on x86_64):
If you run this on Python3.9, it reports:
However, if you run on Python3.10, it reports:
Note the 3.10 version has an additional missing branch:
The problematic code is this (system.py: L187-206)
The test suite is hitting these lines; if you run a coverage report on just 3.9, you see L190 missing; if you run a report on just 3.10, you get L192-193 missing (as expected). It's only when the two reports are combined that the "L192->202" branch is apparently uncovered.
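The combined result described above can be modelled with plain sets. This is a toy sketch, not coverage.py's implementation: the arc pairs are hypothetical, loosely based on the line numbers quoted in this report, and it assumes line 192 is a with statement whose exit is analysed differently by the two interpreters.

```python
# Toy model only: arcs are (from_line, to_line) pairs; all values are hypothetical.


def combine(*runs):
    """Union of the arcs recorded by several runs, as a combined data file would hold."""
    combined = set()
    for arcs in runs:
        combined |= arcs
    return combined


def missing(possible, executed):
    """Arcs the reporting interpreter expects that no run recorded."""
    return sorted(possible - executed)


# Assumed: the older interpreter runs the version-specific block (192-193)
# and records leaving it from the body line.
executed_py39 = {(187, 192), (192, 193), (193, 202)}
# Assumed: the newer interpreter takes the other path and never enters that block.
executed_py310 = {(187, 190), (190, 202)}

# Assumed: each interpreter's static analysis predicts a different exit arc.
possible_py39 = {(187, 190), (190, 202), (187, 192), (192, 193), (193, 202)}
possible_py310 = {(187, 190), (190, 202), (187, 192), (192, 193), (192, 202)}

combined = combine(executed_py39, executed_py310)
print(missing(possible_py39, combined))   # []
print(missing(possible_py310, combined))  # [(192, 202)]
```

Under these assumptions every line is covered in the combined data, yet the newer analysis still expects an arc that only the older interpreter could have run and that it records in a different shape.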
Expected behavior
Coverage reports shouldn't be dependent on the Python version used to generate them.
Additional context
Interestingly, if you use Python3.10 to report on just 3.9 coverage (i.e., using the script above, use Py3.10 to generate the venv, but only run tox -e py39), you get a really weird report:

This report includes the "missing" 192->202 branch, along with many others. A quick survey of the missing branches on system.py shows that they are all context manager with clauses; however, they're not reported as missing branches if the report is generated on Python3.9.

My initial (mostly uneducated) guess is that the logic on Python3.10 for evaluating the list of branches is different to that on Python3.9 (possibly due to the change in context managers that allows for multiple context declarations in a single with statement?). Most of these "extra" missing branches are covered when the test suite is run on Python3.10; but the problem 192->202 branch is version specific, it won't ever run on Python3.10, and so the Python3.10 report sees missing coverage.