-
-
Notifications
You must be signed in to change notification settings - Fork 31.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coverage.py test suite failure since commit 0749244d134 #115832
Comments
A bit more explanation: coverage.py can run with three different "measurement cores": a sys.settrace function written in C, or in Python, or with sys.monitoring. The tests are fine with either sys.settrace, but freeze with sys.monitoring. There are four different multiprocessing tests, which can run with fork or spawn. Any of the four freeze with spawn. Excluding "spawn", the entire test suite runs to completion. |
I'm investigating and will report back with any findings later today. |
I've figured out what's happening and I have a fix in mind, so you can assign this to me. My commit added what's essentially a thread-local cache of the global instrumentation version. During interpreter shutdown, that can get out of sync with the global version (the root of the issue). If we run Python code later in the shutdown process (from a GC, in this case), that can cause an infinite loop because this check thinks the current code object is out of date, while this check deeper in |
Great, I would never have been able to figure that out! |
…ter shutdown In python/cpython@bc613e4ecb3c, I introduced a bug to `interpreter_clear()`: it sets `interp->ceval.instrumentation_version` to 0, without making the corresponding change to `tstate->eval_breaker` (which holds a thread-local copy of the version). After this happens, Python code can still run due to object finalizers during a GC, and [this version check in bytecodes.c](https://github.com/python/cpython/blob/4ee6bdfbaa792a3aa93c65c2022a89bd2d1e0894/Python/bytecodes.c#L147-L152) will see a different result than [this one in instrumentation.c](https://github.com/python/cpython/blob/4ee6bdfbaa792a3aa93c65c2022a89bd2d1e0894/Python/instrumentation.c#L894-L895), causing an infinite loop. The fix itself is straightforward, and is what I should've done in `interpreter_clear()` in the first place: also clear `tstate->eval_breaker` when clearing `interp->ceval.instrumentation_version`. I also restored a comment that I'm not sure why I deleted in the original commit. To make bugs of this type less likely in the future, I changed `instrumentation.c:global_version()` to read the version from a `PyThreadState*` rather than a `PyInterpreterState*`, so it's reading the version from the same location as the interpreter loop. This had some fan-out effects on callers, although a lot of them were passing `tstate->interp` already, so this made them (very) slighty shorter. - Issue: pythongh-115832
I have a fix ready, but it depends on #115837 and I'm not sure if there's a good way to do stacked PRs, so I'll wait for that to land before posting it. |
…ter shutdown In python/cpython@bc613e4ecb3c, I introduced a bug to `interpreter_clear()`: it sets `interp->ceval.instrumentation_version` to 0, without making the corresponding change to `tstate->eval_breaker` (which holds a thread-local copy of the version). After this happens, Python code can still run due to object finalizers during a GC, and [this version check in bytecodes.c](https://github.com/python/cpython/blob/4ee6bdfbaa792a3aa93c65c2022a89bd2d1e0894/Python/bytecodes.c#L147-L152) will see a different result than [this one in instrumentation.c](https://github.com/python/cpython/blob/4ee6bdfbaa792a3aa93c65c2022a89bd2d1e0894/Python/instrumentation.c#L894-L895), causing an infinite loop. The fix itself is straightforward, and is what I should've done in `interpreter_clear()` in the first place: also clear `tstate->eval_breaker` when clearing `interp->ceval.instrumentation_version`. I also restored a comment that I'm not sure why I deleted in the original commit. To make bugs of this type less likely in the future, I changed `instrumentation.c:global_version()` to read the version from a `PyThreadState*` rather than a `PyInterpreterState*`, so it's reading the version from the same location as the interpreter loop. This had some fan-out effects on callers, although a lot of them were passing `tstate->interp` already, so this made them (very) slighty shorter. - Issue: pythongh-115832
…ter shutdown In python/cpython@0749244d13412d, I introduced a bug to `interpreter_clear()`: it sets `interp->ceval.instrumentation_version` to 0, without making the corresponding change to `tstate->eval_breaker` (which holds a thread-local copy of the version). After this happens, Python code can still run due to object finalizers during a GC, and [this version check in bytecodes.c](https://github.com/python/cpython/blob/4ee6bdfbaa792a3aa93c65c2022a89bd2d1e0894/Python/bytecodes.c#L147-L152) will see a different result than [this one in instrumentation.c](https://github.com/python/cpython/blob/4ee6bdfbaa792a3aa93c65c2022a89bd2d1e0894/Python/instrumentation.c#L894-L895), causing an infinite loop. The fix itself is straightforward, and is what I should've done in `interpreter_clear()` in the first place: also clear `tstate->eval_breaker` when clearing `interp->ceval.instrumentation_version`. I also restored a comment that I'm not sure why I deleted in the original commit. To make bugs of this type less likely in the future, I changed `instrumentation.c:global_version()` to read the version from a `PyThreadState*` rather than a `PyInterpreterState*`, so it's reading the version from the same location as the interpreter loop. This had some fan-out effects on callers, although a lot of them were passing `tstate->interp` already, so this made them (very) slighty shorter. - Issue: pythongh-115832
…ter shutdown In python/cpython@0749244d13412d, I introduced a bug to `interpreter_clear()`: it sets `interp->ceval.instrumentation_version` to 0, without making the corresponding change to `tstate->eval_breaker` (which holds a thread-local copy of the version). After this happens, Python code can still run due to object finalizers during a GC, and [this version check in bytecodes.c](https://github.com/python/cpython/blob/4ee6bdfbaa792a3aa93c65c2022a89bd2d1e0894/Python/bytecodes.c#L147-L152) will see a different result than [this one in instrumentation.c](https://github.com/python/cpython/blob/4ee6bdfbaa792a3aa93c65c2022a89bd2d1e0894/Python/instrumentation.c#L894-L895), causing an infinite loop. The fix itself is straightforward, and is what I should've done in `interpreter_clear()` in the first place: also clear `tstate->eval_breaker` when clearing `interp->ceval.instrumentation_version`. I also restored a comment that I'm not sure why I deleted in the original commit. To make bugs of this type less likely in the future, I changed `instrumentation.c:global_version()` to read the version from a `PyThreadState*` rather than a `PyInterpreterState*`, so it's reading the version from the same location as the interpreter loop. This had some fan-out effects on its transitive callers, although most of them already had the current tstate availale. - Issue: pythongh-115832
…utdown (#115856) A previous commit introduced a bug to `interpreter_clear()`: it set `interp->ceval.instrumentation_version` to 0, without making the corresponding change to `tstate->eval_breaker` (which holds a thread-local copy of the version). After this happens, Python code can still run due to object finalizers during a GC, and the version check in bytecodes.c will see a different result than the one in instrumentation.c causing an infinite loop. The fix itself is straightforward: clear `tstate->eval_breaker` when clearing `interp->ceval.instrumentation_version`.
…ter shutdown (python#115856) A previous commit introduced a bug to `interpreter_clear()`: it set `interp->ceval.instrumentation_version` to 0, without making the corresponding change to `tstate->eval_breaker` (which holds a thread-local copy of the version). After this happens, Python code can still run due to object finalizers during a GC, and the version check in bytecodes.c will see a different result than the one in instrumentation.c causing an infinite loop. The fix itself is straightforward: clear `tstate->eval_breaker` when clearing `interp->ceval.instrumentation_version`.
…ter shutdown (python#115856) A previous commit introduced a bug to `interpreter_clear()`: it set `interp->ceval.instrumentation_version` to 0, without making the corresponding change to `tstate->eval_breaker` (which holds a thread-local copy of the version). After this happens, Python code can still run due to object finalizers during a GC, and the version check in bytecodes.c will see a different result than the one in instrumentation.c causing an infinite loop. The fix itself is straightforward: clear `tstate->eval_breaker` when clearing `interp->ceval.instrumentation_version`.
…ter shutdown (python#115856) A previous commit introduced a bug to `interpreter_clear()`: it set `interp->ceval.instrumentation_version` to 0, without making the corresponding change to `tstate->eval_breaker` (which holds a thread-local copy of the version). After this happens, Python code can still run due to object finalizers during a GC, and the version check in bytecodes.c will see a different result than the one in instrumentation.c causing an infinite loop. The fix itself is straightforward: clear `tstate->eval_breaker` when clearing `interp->ceval.instrumentation_version`.
Bug report
Bug description:
Coverage.py runs its test suite against nightly builds of Python. The tests stalled last night. Running them locally, there's one test that freezes the suite. The GitHub action is eventually cancelled by timing out after six hours: https://github.com/nedbat/coveragepy/actions/runs/8000886484
Bisecting CPython, the first bad commit is 0749244
I'm still investigating, but I'm hoping that someone familiar with the change (@swtaarrs?) will be able to help.
To reproduce:
and there it is stuck.
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux, macOS
Linked PRs
The text was updated successfully, but these errors were encountered: