-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[subinterpreters] crash when importing _sre in subinterpreters in parallel (Python 3.9 regression) #90228
Comments
Seems as though cpython is broken when working with subinterpreters. The problematic change could be (d0d2965) affecting import.c.[1] Reverting this commit and rebuilding python fixes the issues on my system with some scripts that import sqlite, for example, the Kodi plugin YouTube[2] and the IMDB Trailers plugin. Others have reported similar breakage with other python code[3]. Example output when the bug manifests: ERROR <general>: Traceback (most recent call last):
ERROR <general>: File "/usr/lib/python3.10/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
ERROR <general>:
ERROR <general>: register_converter("timestamp", convert_timestamp)
ERROR <general>:
ERROR <general>: KeyError
ERROR <general>: :
ERROR <general>: 'timepart_full'
ERROR <general>:
ERROR <general>: Exception ignored deletion of interned string failed
ERROR <general>: : References: |
@eric - I have not seen this on 3.8 or 3.9. No data before 3.8. |
While this is being evaluated, can someone give an opinion about the sanity of simply reverting https://hg.python.org/lookup/d0d29655ff for now in order to use 3.10.1? Thanks. |
We have been seeing intermittent crashes on jep that we tracked down to the same change (d0d2965). I have created a sample program using _testcapi that crashes about 50% of the time when run on Windows with Python 3.9.9. We have not been able to reproduce problems on any other OS. See also: ninia/jep#366 |
I can sometimes reproduce the crash on Windows with Python 3.9. Call stack (most recent to oldest frames):
The crash occurs in meth_dealloc(), when deallocating the _sre_compile() method object stored in _sre module dictionary as the attribute "compile". The PyGC_Head.prev pointer is a dangling pointer. On Python 3.9, the "re" module is not imported at startup, but it's imported indirectly by "import importlib.util" via "import typing". On Python 3.10, the re module is no longer imported by "import importlib.util". The crash is random. Sometimes, I need 3 or 4 tries. Sometimes, it crash using -X dev. Sometimes, it crash immediately. When debugging in Visual Stuido, the crash seems easier to reproduce. On Python 3.9, the _sre exetnsion uses the old API: PyModule_Create() with PyModuleDef.m_size = -1. On Python 3.10, the _sre extension has been converted to multiphase init API: PyModuleDef_Init() with PyModuleDef.m_size = sizeof(_sremodulestate). Moreover, "import importlib.util" no longer imports indirectly the "re" module. |
Using attached bug.py, it's possible to trigger the crash on the main branch. I modified the reproducer to use the "_asyncio" extension which still uses the old API PyModule_Create() with PyModuleDef.m_size = -1. |
Hum, maybe bug.py exposes a different kind of bug. The _asyncio extension uses a non-trivial initialize code which doesn't seem to handle well concurrent "import _asyncio". |
FWIW, I've managed to reproduce once with win_py399_crash_reproducer.py on macOS 12.1 (with very high load average). With bug.py, I can reproduce almost always (more than 90% of the time). |
I can reproduce the crash on my macOS with main branch version. Fatal Python error: Segmentation fault Thread 0x0000700010389000 (most recent call first): Current thread 0x000070000f386000 (most recent call first): Thread 0x000070000e383000 (most recent call first): Thread 0x000070000d380000 (most recent call first): Thread 0x000000010a590e00 (most recent call first): Extension modules: _testcapi (total: 1) |
I have this happening on Linux with a Flask app after upgrading from Fedora 34 to 35. libpython keeps crashing httpd. I see this from journalctl:
I see this in /var/log/httpd/ssl_error_log: [Sat Jan 01 05:17:21.248640 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
[Sat Jan 01 05:17:21.249193 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
[Sat Jan 01 05:17:21.249222 2022] [wsgi:error] [pid 257749:tid 257758] File "/usr/lib64/python3.10/sqlite3/dbapi2.py", line
83, in register_adapters_and_converters
[Sat Jan 01 05:17:21.249453 2022] [wsgi:error] [pid 257749:tid 257758] register_converter("timestamp", convert_timestamp)
[Sat Jan 01 05:17:21.249469 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'timepart_full'
[Sat Jan 01 05:17:21.249484 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
[Sat Jan 01 05:17:21.249488 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
[Sat Jan 01 05:17:21.249493 2022] [wsgi:error] [pid 257749:tid 257758] File "/usr/lib64/python3.10/sqlite3/dbapi2.py", line
83, in register_adapters_and_converters
[Sat Jan 01 05:17:21.249572 2022] [wsgi:error] [pid 257749:tid 257758] register_converter("timestamp", convert_timestamp)
[Sat Jan 01 05:17:21.249582 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'timepart'
[Sat Jan 01 05:17:21.249590 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
[Sat Jan 01 05:17:21.249594 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
[Sat Jan 01 05:17:21.249598 2022] [wsgi:error] [pid 257749:tid 257758] File "/usr/lib64/python3.10/sqlite3/dbapi2.py", line
83, in register_adapters_and_converters
[Sat Jan 01 05:17:21.249667 2022] [wsgi:error] [pid 257749:tid 257758] register_converter("timestamp", convert_timestamp)
[Sat Jan 01 05:17:21.249676 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'datepart'
[Sat Jan 01 05:17:21.249697 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
[Sat Jan 01 05:17:21.249701 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
[Sat Jan 01 05:17:21.249706 2022] [wsgi:error] [pid 257749:tid 257758] File "<frozen importlib._bootstrap>", line 688, in _
load_unlocked
[Sat Jan 01 05:17:21.249804 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'convert_timestamp'
[Sat Jan 01 05:17:21.249813 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
[Sat Jan 01 05:17:21.249817 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
[Sat Jan 01 05:17:21.249822 2022] [wsgi:error] [pid 257749:tid 257758] File "<frozen importlib._bootstrap>", line 688, in _
load_unlocked
[Sat Jan 01 05:17:21.249889 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'convert_date'
[Sat Jan 01 05:17:21.249898 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
[Sat Jan 01 05:17:21.249901 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
[Sat Jan 01 05:17:21.249906 2022] [wsgi:error] [pid 257749:tid 257758] File "<frozen importlib._bootstrap>", line 688, in _
load_unlocked
[Sat Jan 01 05:17:21.249946 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'adapt_datetime'
[Sat Jan 01 05:17:21.249971 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
[Sat Jan 01 05:17:21.249977 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
[Sat Jan 01 05:17:21.249981 2022] [wsgi:error] [pid 257749:tid 257758] File "<frozen importlib._bootstrap>", line 688, in _
load_unlocked
[Sat Jan 01 05:17:21.250021 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'adapt_date'
[Sat Jan 01 05:17:22.058701 2022] [wsgi:error] [pid 249217:tid 249327] [client 31.13.127.13:53220] Truncated or oversized response headers received from daemon process 'https_site': /home/site/dmnes-site/viewer.wsgi libpython is crashing httpd a few times a minute for me, and I definitely was not seeing this with Fedora 34. I have Python 3.10.1 on F35, and had 3.9.9 on F34. If there's any further information I can provide, I'd be happy to help. |
In reply to the first comment here https://bugs.python.org/issue46070#msg408520 which affects several Kodi plugins, it seems that commenting out lines 80-84 in /usr/lib/python3.10/sqlite3/dbapi2.py "fixes" the bug in python 3.10.1. I do not know if that helps diagnose this future or not. |
Fedora issue: https://bugzilla.redhat.com/show_bug.cgi?id=2034962 |
It would be nice to add some tests. |
Victor:
Eric Snow:
Well, I recall that your change had to be reverted 2 or 3 times because there were many crashes on FreeBSD, and no one understood why it crashed. The root cause was bugs related to the GIL and daemon threads. It took me a while (and multiple commits) to identify and fix all of them: I decided to split your work into smaller changes to better debug these crashes. bpo-36854 contains a few changes, but these changes are based on work that I pushed earlier. For example, there was a tricky bug related to clearing a Python thread state: Also, once the GC was made per interpreter, we started to discover more and more tricky reference leaks: I spent a significant time to reorder code of Py_Finalize() and Py_EndInterpreter() to clear objects earlier or in a different order. Recently, I made sure that the free lists can no longer be used after they are cleared. It took some notes at: One of the hardest fix was the commit 9ad58ac of bpo-19466. To make this change, first I had to fix a very old bug of PyThreadState_Clear() with commit 5804f87 (bpo-20526). Well, it was a long journey and it's not done yet :-) |
pyobject_ob_interp.patch: Quick & dirty patch that I wrote to add PyObject.ob_interp, store in which interpreter an object has been created. |
I created bpo-46368: "faulthandler: add the ability to dump all interpreters, not only the current interpreter". |
sqlite3_crash.py does not crashes on python 3.9 below and equal 3.9.9 and python main branch 12c0012. I confirm it crashes on python 3.9.10, 3.9.11, 3.10 branch commit 9006b44 . Currently bisecting both 3.9.9 to 3.9.10 and 3.10 to 3.11 main branch for bad to good. The patches in this bug report are already merged in the 3.10 branch which crash. I cannot reproduce win_py399_crash_reproducer.py which I used as a basis for this test case. I can reproduce on aarch64 (Odroid C2) LibreElec and builds of cpython on Debian stable x86_64 (the extensive testing of the broken interpreters is done on x86_64 Debian stable bullseye with a cpython clone and running from builddir). |
By "It is fixed in main branch commit 12c0012 ." I mean that this commit is good not that this commit fixes the issue. |
I did 3.9 branch bisect and commit 52937c2 bpo-46070: _PyGC_Fini() untracks objects (GH-30577) (GH-30580) is the one that broke 3.9 . While with my main branch builds this commit is fine, it is not for the 3.9 branches. Proceeding with 3.10 branch bisect of first bad commit and redoing main branch first good commit. |
3.10 branch is fixed if I revert e6bb17f bpo-46070: _PyGC_Fini() untracks objects (GH-30577). Be it if I revert it atop current head 9006b44 or if I test the commit before e6bb17f was merged. As this made no sense with regards to this bug report history that this fix broke the branch, I tried v3.10.1 which lacks this fix. Vanilla is broken too. Also applying the "_PyGC_Fini untracks objects" upon 3.10.1 does not fix the test case crash. I am puzzled. Will try to bisect the commit that fixed the testcase in the 3.10 branch before "_PyGC_Fini untracks objects" was merged and after 3.10.1. |
While bisecting main branch I did not only get segfaults but also exceptions, namely: $ ./python ../python-crash-kodi/sqlite3_crash.py
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 81, in register_adapters_and_converters
register_adapter(datetime.datetime, adapt_datetime)
KeyError: 'isoformat'
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
register_converter("timestamp", convert_timestamp)
KeyError: 'timepart_full'
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
register_converter("timestamp", convert_timestamp)
KeyError: 'day'
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
register_converter("timestamp", convert_timestamp)
KeyError: 'month'
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
register_converter("timestamp", convert_timestamp)
KeyError: 'year'
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
register_converter("timestamp", convert_timestamp)
KeyError: 'timepart'
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
register_converter("timestamp", convert_timestamp)
KeyError: 'datepart'
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
KeyError: 'convert_timestamp'
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
KeyError: 'convert_date'
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
KeyError: 'adapt_datetime'
Exception ignored deletion of interned string failed:
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
KeyError: 'adapt_date' The 3.10 branch bisect pointed to one commit that fix the crash after 3.10.1 which is 72c260c [3.10] bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422) (GH-30425) which makes sense regarding main branch logs. It is commit What remains to be seen is why "bpo-46070: _PyGC_Fini() untracks objects (GH-30577)" looks fine in the main branch (though it has no effect on the import crash) and not in 3.9 and 3.10 branch. |
bisect of main for bug.py with each local python builds get me to commit b127e70 "bpo-46070: Fix asyncio initialization guard (GH-30423)" as the one that fixed bug.py most of the time. At times I can make bug.py segfault be it on python 3.9, 3.10 or main branch. It is pretty hard (I can have a batch of 200 runs without an issue) but seems easier to reproduce with a CPU stressed, then I can have two segfaults in a batch of 50 runs. Bash: with: |
I managed to reproduce the bug.py crash with main branch up to commit 12c0012 . |
So far, only released Python v3.11.0a4 include my "_PyGC_Fini() untracks objects" fix. The 3.9 and 3.10 branch include my fix, but there is no release including it yet. Scheduled releases:
Oh really? This version include my fix. I cannot reproduce it on the current main branch (commit ea2f5bc) nor at commit 12c0012. Which commands did you type to build Python and reproduce the crash? What is your OS and C compiler? |
The original issue is fixed. Please open a new issue for sqlite3_crash.py. |
I created #92036 for sqlite3_crash.py bug. |
Should this be closed then? |
I kept this issue open for this reason:
|
My notes about this bug: https://twitter.com/VictorStinner/status/1481783100191428611 |
release: - https://www.python.org/downloads/release/python-3913/ changelog: - https://docs.python.org/release/3.9.13/whatsnew/changelog.html upstream fix for dropped patch - python/cpython#80254 - python/cpython#11984 - python/cpython#90228 - python/cpython#92036 - python/cpython#92037 - python/cpython#92297 - bpo-36073 - was - python/cpython#30579 - python/cpython#30580
nproc --all
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: