Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[subinterpreters] crash when importing _sre in subinterpreters in parallel (Python 3.9 regression) #90228

Open
graysky mannequin opened this issue Dec 14, 2021 · 60 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-regex topic-subinterpreters type-bug An unexpected behavior, bug, or error

Comments

@graysky
Copy link
Mannequin

graysky mannequin commented Dec 14, 2021

BPO 46070
Nosy @vstinner, @encukou, @ericsnowcurrently, @ndjensen, @hroncok, @uckelman, @corona10, @miss-islington, @shihai1991, @erlend-aasland, @bsteffensmeier, @M-Reimer, @prahal
PRs
  • bpo-46070: Fix asyncio initialisation guard #30423
  • [3.10] bpo-46070: Fix asyncio initialisation guard (GH-30423) #30453
  • [3.9] bpo-46070: Fix asyncio initialisation guard (GH-30423) #30454
  • bpo-46070: Revert "bpo-36854: Move _PyRuntimeState.gc to PyInterpreterState (GH-17287)" #30564
  • [3.10] bpo-46070: Revert "bpo-36854: Move _PyRuntimeState.gc to PyInterpreterState (GH-17287)" #30565
  • [3.9] bpo-46070: Revert "bpo-36854: Move _PyRuntimeState.gc to PyInterpreterState (GH-17287)" #30566
  • bpo-46070: _PyGC_Fini() untracks objects #30577
  • [3.10] bpo-46070: _PyGC_Fini() untracks objects (GH-30577) #30578
  • [3.9] bpo-46070: _PyGC_Fini() untracks objects (GH-30577) #30579
  • [3.9] bpo-46070: _PyGC_Fini() untracks objects (GH-30577) #30580
  • Files
  • win_py399_crash_reproducer.py: Reproducer that intermittently crashes on windows
  • bug.py
  • pyobject_ob_interp.patch
  • sqlite3_crash.py: Reproducer that crashes 90% of the time
  • bug.py_asyncio_cpustressed-crash.log: asyncio bug.py crash while running stress -c nproc --all
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-12-14.10:57:50.149>
    labels = ['interpreter-core', 'type-bug', '3.9', '3.10', '3.11', 'expert-subinterpreters']
    title = '[subinterpreters] crash when importing _sre in subinterpreters in parallel (Python 3.9 regression)'
    updated_at = <Date 2022-03-24.21:16:20.771>
    user = 'https://bugs.python.org/graysky'

    bugs.python.org fields:

    activity = <Date 2022-03-24.21:16:20.771>
    actor = 'prahal'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core', 'Subinterpreters']
    creation = <Date 2021-12-14.10:57:50.149>
    creator = 'graysky'
    dependencies = []
    files = ['50494', '50497', '50560', '50699', '50700']
    hgrepos = []
    issue_num = 46070
    keywords = ['patch']
    message_count = 54.0
    messages = ['408520', '408539', '408540', '408541', '408558', '408568', '408633', '408662', '408664', '408665', '409245', '409255', '409461', '409573', '409686', '409772', '409778', '409780', '409795', '409798', '409802', '409962', '409965', '409966', '409970', '409980', '409983', '409984', '409991', '410010', '410014', '410442', '410444', '410446', '410447', '410493', '410497', '410498', '410500', '410505', '410507', '410509', '410510', '410513', '410517', '410518', '410520', '415954', '415955', '415963', '415964', '415970', '415975', '415977']
    nosy_count = 15.0
    nosy_names = ['vstinner', 'petr.viktorin', 'eric.snow', 'ndjensen', 'hroncok', 'uckelman', 'corona10', 'miss-islington', 'shihai1991', 'erlendaasland', 'graysky', 'bsteffensmeier', 'M-Reimer', 'jokot3', 'prahal']
    pr_nums = ['30423', '30453', '30454', '30564', '30565', '30566', '30577', '30578', '30579', '30580']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue46070'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @graysky
    Copy link
    Mannequin Author

    graysky mannequin commented Dec 14, 2021

    Seems as though cpython is broken when working with subinterpreters. The problematic change could be (d0d2965) affecting import.c.[1] Reverting this commit and rebuilding python fixes the issues on my system with some scripts that import sqlite, for example, the Kodi plugin YouTube[2] and the IMDB Trailers plugin. Others have reported similar breakage with other python code[3].

    Example output when the bug manifests:

    ERROR <general>: Traceback (most recent call last):
    ERROR <general>:   File "/usr/lib/python3.10/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
    ERROR <general>:     
    ERROR <general>: register_converter("timestamp", convert_timestamp)
    ERROR <general>: 
    ERROR <general>: KeyError
    ERROR <general>: : 
    ERROR <general>: 'timepart_full'
    ERROR <general>:                                              
    ERROR <general>: Exception ignored deletion of interned string failed
    ERROR <general>: :

    References:

    1. d0d2965
    2. Plugin incompatible with python 3.10.1 anxdpanic/plugin.video.youtube#255
    3. https://bbs.archlinux.org/viewtopic.php?id=272121

    @graysky graysky mannequin added 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Dec 14, 2021
    @ericsnowcurrently
    Copy link
    Member

    (related: bpo-44059)

    Presumably the problem relates to global state used in different interpreters leading to an inconsistent state in the crashing extension (or its dependencies).

    @graysky, do you know if this was a problem before Python 3.8?

    @ericsnowcurrently ericsnowcurrently added topic-subinterpreters 3.9 only security fixes 3.11 only security fixes type-bug An unexpected behavior, bug, or error labels Dec 14, 2021
    @encukou
    Copy link
    Member

    encukou commented Dec 14, 2021

    Interned strings were broken in #64257, see bpo-46006. Maybe that's also the issue here?

    @graysky
    Copy link
    Mannequin Author

    graysky mannequin commented Dec 14, 2021

    @eric - I have not seen this on 3.8 or 3.9. No data before 3.8.

    @graysky
    Copy link
    Mannequin Author

    graysky mannequin commented Dec 14, 2021

    While this is being evaluated, can someone give an opinion about the sanity of simply reverting https://hg.python.org/lookup/d0d29655ff for now in order to use 3.10.1? Thanks.

    @encukou
    Copy link
    Member

    encukou commented Dec 14, 2021

    That was a fix for #61552, which might need to be reverted as well.

    Victor, could you take another look at #61552? I must admit I (still) don't understand this change; what would break if it was reverted (along with the fixup from bpo-44050)?

    @bsteffensmeier
    Copy link
    Mannequin

    bsteffensmeier mannequin commented Dec 15, 2021

    We have been seeing intermittent crashes on jep that we tracked down to the same change (d0d2965).

    I have created a sample program using _testcapi that crashes about 50% of the time when run on Windows with Python 3.9.9. We have not been able to reproduce problems on any other OS.

    See also: ninia/jep#366

    @vstinner
    Copy link
    Member

    I can sometimes reproduce the crash on Windows with Python 3.9. Call stack (most recent to oldest frames):

    • PyObject_GC_UnTrack() - crash on _PyGCHead_SET_NEXT(prev, next) because prev is dangling pointer (0x1fe64dd5250), Visual Studio is unable to read memory
    • meth_dealloc() -- delete _sre_compile() method object
    • (...)
    • PyDict_SetItem() -- set "compile" to None
    • _PyModule_ClearDict() -- clear the "_sre" module dict
    • _PyModule_Clear()
    • _PyImport_Clenaup()
    • Py_EndInterpreter()
    • (...)
    • run_in_subinterp()
    • (...)
    • t_bootstrap()

    The crash occurs in meth_dealloc(), when deallocating the _sre_compile() method object stored in _sre module dictionary as the attribute "compile".

    The PyGC_Head.prev pointer is a dangling pointer.

    On Python 3.9, the "re" module is not imported at startup, but it's imported indirectly by "import importlib.util" via "import typing". On Python 3.10, the re module is no longer imported by "import importlib.util".

    The crash is random. Sometimes, I need 3 or 4 tries. Sometimes, it crash using -X dev. Sometimes, it crash immediately. When debugging in Visual Stuido, the crash seems easier to reproduce.

    On Python 3.9, the _sre exetnsion uses the old API: PyModule_Create() with PyModuleDef.m_size = -1.

    On Python 3.10, the _sre extension has been converted to multiphase init API: PyModuleDef_Init() with PyModuleDef.m_size = sizeof(_sremodulestate). Moreover, "import importlib.util" no longer imports indirectly the "re" module.

    @vstinner
    Copy link
    Member

    Using attached bug.py, it's possible to trigger the crash on the main branch. I modified the reproducer to use the "_asyncio" extension which still uses the old API PyModule_Create() with PyModuleDef.m_size = -1.

    @vstinner vstinner changed the title broken subinterpreters _PyImport_FixupExtensionObject() regression causing a crash in subintepreters Dec 16, 2021
    @vstinner vstinner changed the title broken subinterpreters _PyImport_FixupExtensionObject() regression causing a crash in subintepreters Dec 16, 2021
    @vstinner
    Copy link
    Member

    Hum, maybe bug.py exposes a different kind of bug. The _asyncio extension uses a non-trivial initialize code which doesn't seem to handle well concurrent "import _asyncio".

    @erlend-aasland
    Copy link
    Contributor

    FWIW, I've managed to reproduce once with win_py399_crash_reproducer.py on macOS 12.1 (with very high load average). With bug.py, I can reproduce almost always (more than 90% of the time).

    @corona10
    Copy link
    Member

    I can reproduce the crash on my macOS with main branch version.

    Fatal Python error: Segmentation fault

    Thread 0x0000700010389000 (most recent call first):
    File "/Users/user/oss/cpython/bug.py", line 16 in doIt
    File "/Users/user/oss/cpython/Lib/threading.py", line 968 in run
    File "/Users/user/oss/cpython/Lib/threading.py", line 1031 in _bootstrap_inner
    File "/Users/user/oss/cpython/Lib/threading.py", line 988 in _bootstrap

    Current thread 0x000070000f386000 (most recent call first):
    File "/Users/user/oss/cpython/bug.py", line 16 in doIt
    File "/Users/user/oss/cpython/Lib/threading.py", line 968 in run
    File "/Users/user/oss/cpython/Lib/threading.py", line 1031 in _bootstrap_inner
    File "/Users/user/oss/cpython/Lib/threading.py", line 988 in _bootstrap

    Thread 0x000070000e383000 (most recent call first):
    File "/Users/user/oss/cpython/bug.py", line 16 in doIt
    File "/Users/user/oss/cpython/Lib/threading.py", line 968 in run
    File "/Users/user/oss/cpython/Lib/threading.py", line 1031 in _bootstrap_inner
    File "/Users/user/oss/cpython/Lib/threading.py", line 988 in _bootstrap

    Thread 0x000070000d380000 (most recent call first):
    File "/Users/user/oss/cpython/bug.py", line 16 in doIt
    File "/Users/user/oss/cpython/Lib/threading.py", line 968 in run
    File "/Users/user/oss/cpython/Lib/threading.py", line 1031 in _bootstrap_inner
    File "/Users/user/oss/cpython/Lib/threading.py", line 988 in _bootstrap

    Thread 0x000000010a590e00 (most recent call first):
    File "/Users/user/oss/cpython/Lib/threading.py", line 1125 in _wait_for_tstate_lock
    File "/Users/user/oss/cpython/Lib/threading.py", line 1105 in join
    File "/Users/user/oss/cpython/bug.py", line 23 in func
    File "/Users/user/oss/cpython/bug.py", line 25 in <module>

    Extension modules: _testcapi (total: 1)
    [1] 9098 segmentation fault ./python.exe bug.py

    @uckelman
    Copy link
    Mannequin

    uckelman mannequin commented Jan 1, 2022

    I have this happening on Linux with a Flask app after upgrading from Fedora 34 to 35. libpython keeps crashing httpd.

    I see this from journalctl:

     #0  0x00007fd899baa801 PyObject_Malloc (libpython3.10.so.1.0 + 0xf7801)
      #1  0x00007fd899baab47 PyUnicode_New (libpython3.10.so.1.0 + 0xf7b47)
      #2  0x00007fd899bb9aae _PyUnicode_FromUCS1 (libpython3.10.so.1.0 + 0x106aae)
      #3  0x00007fd899bb9323 r_object (libpython3.10.so.1.0 + 0x106323)
      #4  0x00007fd899bb8d46 r_object (libpython3.10.so.1.0 + 0x105d46)
      #5  0x00007fd899bb90b4 r_object (libpython3.10.so.1.0 + 0x1060b4)
      #6  0x00007fd899bb8d65 r_object (libpython3.10.so.1.0 + 0x105d65)
      #7  0x00007fd899bb9088 r_object (libpython3.10.so.1.0 + 0x106088)
      #8  0x00007fd899bb8e33 r_object (libpython3.10.so.1.0 + 0x105e33)
      #9  0x00007fd899bb9088 r_object (libpython3.10.so.1.0 + 0x106088)
      #10 0x00007fd899c35c28 read_object (libpython3.10.so.1.0 + 0x182c28)
      #11 0x00007fd899c48f56 marshal_loads (libpython3.10.so.1.0 + 0x195f56)
      #12 0x00007fd899bc88d7 cfunction_vectorcall_O (libpython3.10.so.1.0 + 0x1158d7)
      #13 0x00007fd899bc0c80 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x10dc80)
      #14 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #15 0x00007fd899bbccba _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x109cba)
      #16 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #17 0x00007fd899bbbd6d _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x108d6d)
      #18 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #19 0x00007fd899bbbd6d _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x108d6d)
      #20 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #21 0x00007fd899bbbac2 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x108ac2)
      #22 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #23 0x00007fd899bbbac2 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x108ac2)
      #24 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #25 0x00007fd899bc8a9e object_vacall (libpython3.10.so.1.0 + 0x115a9e)
      #26 0x00007fd899bd247c _PyObject_CallMethodIdObjArgs (libpython3.10.so.1.0 + 0x11f47c)
      #27 0x00007fd899bd21d7 PyImport_ImportModuleLevelObject (libpython3.10.so.1.0 + 0x11f1d7)
      #28 0x00007fd899bbfc8e _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x10cc8e)
      #29 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #30 0x00007fd899c360d4 PyEval_EvalCode (libpython3.10.so.1.0 + 0x1830d4)
      #31 0x00007fd899c3d091 builtin_exec (libpython3.10.so.1.0 + 0x18a091)
      #32 0x00007fd899bc94b0 cfunction_vectorcall_FASTCALL (libpython3.10.so.1.0 + 0x1164b0)
      #33 0x00007fd899bc2209 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x10f209)
      #34 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #35 0x00007fd899bc0c80 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x10dc80)
      #36 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #37 0x00007fd899bbbd6d _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x108d6d)
      #38 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #39 0x00007fd899bbbac2 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x108ac2)
      #40 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #41 0x00007fd899bbbac2 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x108ac2)
      #42 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #43 0x00007fd899bc8a9e object_vacall (libpython3.10.so.1.0 + 0x115a9e)
      #44 0x00007fd899bd247c _PyObject_CallMethodIdObjArgs (libpython3.10.so.1.0 + 0x11f47c)
      #45 0x00007fd899bd21d7 PyImport_ImportModuleLevelObject (libpython3.10.so.1.0 + 0x11f1d7)
      #46 0x00007fd899bbfc8e _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x10cc8e)
     #47 0x00007fd899bba984 _PyEval_Vector (libpython3.10.so.1.0 + 0x107984)
      #48 0x00007fd899c360d4 PyEval_EvalCode (libpython3.10.so.1.0 + 0x1830d4)
      #49 0x00007fd899c36006 exec_code_in_module (libpython3.10.so.1.0 + 0x183006)
      #50 0x00007fd899ba33e7 PyImport_ExecCodeModuleObject (libpython3.10.so.1.0 + 0xf03e7)
      #51 0x00007fd899ba3482 PyImport_ExecCodeModuleWithPathnames (libpython3.10.so.1.0 + 0xf0482)
      #52 0x00007fd899e0f542 wsgi_load_source.lto_priv.0 (mod_wsgi_python3.so + 0x17542)
      #53 0x00007fd899e107ed wsgi_execute_script.lto_priv.0 (mod_wsgi_python3.so + 0x187ed)
      #54 0x00007fd899e1b0f6 wsgi_daemon_thread (mod_wsgi_python3.so + 0x230f6)
      #55 0x00007fd89ab52a87 start_thread (libc.so.6 + 0x8da87)
      #56 0x00007fd89abd7640 __clone3 (libc.so.6 + 0x112640)
    

    I see this in /var/log/httpd/ssl_error_log:

    [Sat Jan 01 05:17:21.248640 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
    [Sat Jan 01 05:17:21.249193 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
    [Sat Jan 01 05:17:21.249222 2022] [wsgi:error] [pid 257749:tid 257758]   File "/usr/lib64/python3.10/sqlite3/dbapi2.py", line
     83, in register_adapters_and_converters
    [Sat Jan 01 05:17:21.249453 2022] [wsgi:error] [pid 257749:tid 257758]     register_converter("timestamp", convert_timestamp)
    [Sat Jan 01 05:17:21.249469 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'timepart_full'
    [Sat Jan 01 05:17:21.249484 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
    [Sat Jan 01 05:17:21.249488 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
    [Sat Jan 01 05:17:21.249493 2022] [wsgi:error] [pid 257749:tid 257758]   File "/usr/lib64/python3.10/sqlite3/dbapi2.py", line
     83, in register_adapters_and_converters
    [Sat Jan 01 05:17:21.249572 2022] [wsgi:error] [pid 257749:tid 257758]     register_converter("timestamp", convert_timestamp)
    [Sat Jan 01 05:17:21.249582 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'timepart'
    [Sat Jan 01 05:17:21.249590 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
    [Sat Jan 01 05:17:21.249594 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
    [Sat Jan 01 05:17:21.249598 2022] [wsgi:error] [pid 257749:tid 257758]   File "/usr/lib64/python3.10/sqlite3/dbapi2.py", line
     83, in register_adapters_and_converters
    [Sat Jan 01 05:17:21.249667 2022] [wsgi:error] [pid 257749:tid 257758]     register_converter("timestamp", convert_timestamp)
    [Sat Jan 01 05:17:21.249676 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'datepart'
    [Sat Jan 01 05:17:21.249697 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
    [Sat Jan 01 05:17:21.249701 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
    [Sat Jan 01 05:17:21.249706 2022] [wsgi:error] [pid 257749:tid 257758]   File "<frozen importlib._bootstrap>", line 688, in _
    load_unlocked
    [Sat Jan 01 05:17:21.249804 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'convert_timestamp'
    [Sat Jan 01 05:17:21.249813 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
    [Sat Jan 01 05:17:21.249817 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
    [Sat Jan 01 05:17:21.249822 2022] [wsgi:error] [pid 257749:tid 257758]   File "<frozen importlib._bootstrap>", line 688, in _
    load_unlocked
    [Sat Jan 01 05:17:21.249889 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'convert_date'
    [Sat Jan 01 05:17:21.249898 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
    [Sat Jan 01 05:17:21.249901 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
    [Sat Jan 01 05:17:21.249906 2022] [wsgi:error] [pid 257749:tid 257758]   File "<frozen importlib._bootstrap>", line 688, in _
    load_unlocked
    [Sat Jan 01 05:17:21.249946 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'adapt_datetime'
    [Sat Jan 01 05:17:21.249971 2022] [wsgi:error] [pid 257749:tid 257758] Exception ignored deletion of interned string failed:
    [Sat Jan 01 05:17:21.249977 2022] [wsgi:error] [pid 257749:tid 257758] Traceback (most recent call last):
    [Sat Jan 01 05:17:21.249981 2022] [wsgi:error] [pid 257749:tid 257758]   File "<frozen importlib._bootstrap>", line 688, in _
    load_unlocked
    [Sat Jan 01 05:17:21.250021 2022] [wsgi:error] [pid 257749:tid 257758] KeyError: 'adapt_date'
    [Sat Jan 01 05:17:22.058701 2022] [wsgi:error] [pid 249217:tid 249327] [client 31.13.127.13:53220] Truncated or oversized response headers received from daemon process 'https_site': /home/site/dmnes-site/viewer.wsgi

    libpython is crashing httpd a few times a minute for me, and I definitely was not seeing this with Fedora 34. I have Python 3.10.1 on F35, and had 3.9.9 on F34.

    If there's any further information I can provide, I'd be happy to help.

    @graysky
    Copy link
    Mannequin Author

    graysky mannequin commented Jan 3, 2022

    In reply to the first comment here https://bugs.python.org/issue46070#msg408520 which affects several Kodi plugins, it seems that commenting out lines 80-84 in /usr/lib/python3.10/sqlite3/dbapi2.py "fixes" the bug in python 3.10.1. I do not know if that helps diagnose this future or not.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 4, 2022

    @vstinner
    Copy link
    Member

    vstinner commented Jan 5, 2022

    I applied PR 30123 of bpo-46006: "./python bug.py" does still crash. So bpo-46006 is unrelated to this issue.

    @vstinner
    Copy link
    Member

    New changeset 52937c2 by Victor Stinner in branch '3.9':
    bpo-46070: _PyGC_Fini() untracks objects (GH-30577) (GH-30580)
    52937c2

    @vstinner
    Copy link
    Member

    It would be nice to add some tests.

    @vstinner
    Copy link
    Member

    Victor:

    (*) I made the GC state per-interpreter: commit 7247407 (Nov 20, 2019)

    Eric Snow:

    FYI, this was done by me in an earlier comment which we ended up
    reverting. Later you basically un.reverted that.

    Well, I recall that your change had to be reverted 2 or 3 times because there were many crashes on FreeBSD, and no one understood why it crashed. The root cause was bugs related to the GIL and daemon threads. It took me a while (and multiple commits) to identify and fix all of them:
    https://vstinner.github.io/gil-bugfixes-daemon-threads-python39.html

    I decided to split your work into smaller changes to better debug these crashes. bpo-36854 contains a few changes, but these changes are based on work that I pushed earlier.

    For example, there was a tricky bug related to clearing a Python thread state:
    9da7430

    Also, once the GC was made per interpreter, we started to discover more and more tricky reference leaks:
    https://vstinner.github.io/subinterpreter-leaks.html

    I spent a significant time to reorder code of Py_Finalize() and Py_EndInterpreter() to clear objects earlier or in a different order. Recently, I made sure that the free lists can no longer be used after they are cleared. It took some notes at:
    https://pythondev.readthedocs.io/finalization.html

    One of the hardest fix was the commit 9ad58ac of bpo-19466. To make this change, first I had to fix a very old bug of PyThreadState_Clear() with commit 5804f87 (bpo-20526).

    Well, it was a long journey and it's not done yet :-)

    @vstinner
    Copy link
    Member

    pyobject_ob_interp.patch: Quick & dirty patch that I wrote to add PyObject.ob_interp, store in which interpreter an object has been created.

    @vstinner
    Copy link
    Member

    I created bpo-46368: "faulthandler: add the ability to dump all interpreters, not only the current interpreter".

    @prahal
    Copy link
    Mannequin

    prahal mannequin commented Mar 24, 2022

    sqlite3_crash.py does not crashes on python 3.9 below and equal 3.9.9 and python main branch 12c0012. I confirm it crashes on python 3.9.10, 3.9.11, 3.10 branch commit 9006b44 .
    It is fixed in main branch commit 12c0012 .

    Currently bisecting both 3.9.9 to 3.9.10 and 3.10 to 3.11 main branch for bad to good.

    The patches in this bug report are already merged in the 3.10 branch which crash.

    I cannot reproduce win_py399_crash_reproducer.py which I used as a basis for this test case.
    The backtrace is the same as the ones from the crashes of the kodi addons (me Jellyfin Kodi addon), which is the initial report .
    This looks like importing sqlite3 in threads plays badly.

    I can reproduce on aarch64 (Odroid C2) LibreElec and builds of cpython on Debian stable x86_64 (the extensive testing of the broken interpreters is done on x86_64 Debian stable bullseye with a cpython clone and running from builddir).

    @prahal
    Copy link
    Mannequin

    prahal mannequin commented Mar 24, 2022

    By "It is fixed in main branch commit 12c0012 ." I mean that this commit is good not that this commit fixes the issue.

    @prahal
    Copy link
    Mannequin

    prahal mannequin commented Mar 24, 2022

    I did 3.9 branch bisect and commit 52937c2 bpo-46070: _PyGC_Fini() untracks objects (GH-30577) (GH-30580) is the one that broke 3.9 . While with my main branch builds this commit is fine, it is not for the 3.9 branches.

    Proceeding with 3.10 branch bisect of first bad commit and redoing main branch first good commit.

    @prahal
    Copy link
    Mannequin

    prahal mannequin commented Mar 24, 2022

    3.10 branch is fixed if I revert e6bb17f bpo-46070: _PyGC_Fini() untracks objects (GH-30577). Be it if I revert it atop current head 9006b44 or if I test the commit before e6bb17f was merged.

    As this made no sense with regards to this bug report history that this fix broke the branch, I tried v3.10.1 which lacks this fix. Vanilla is broken too. Also applying the "_PyGC_Fini untracks objects" upon 3.10.1 does not fix the test case crash.

    I am puzzled. Will try to bisect the commit that fixed the testcase in the 3.10 branch before "_PyGC_Fini untracks objects" was merged and after 3.10.1.

    @prahal
    Copy link
    Mannequin

    prahal mannequin commented Mar 24, 2022

    While bisecting main branch I did not only get segfaults but also exceptions, namely:

    $ ./python  ../python-crash-kodi/sqlite3_crash.py 
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 81, in register_adapters_and_converters
        register_adapter(datetime.datetime, adapt_datetime)
    KeyError: 'isoformat'
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
        register_converter("timestamp", convert_timestamp)
    KeyError: 'timepart_full'
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
        register_converter("timestamp", convert_timestamp)
    KeyError: 'day'
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
        register_converter("timestamp", convert_timestamp)
    KeyError: 'month'
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
        register_converter("timestamp", convert_timestamp)
    KeyError: 'year'
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
        register_converter("timestamp", convert_timestamp)
    KeyError: 'timepart'
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "/home/prahal/Projects/WIP/libreelec/cpython_bisect/Lib/sqlite3/dbapi2.py", line 83, in register_adapters_and_converters
        register_converter("timestamp", convert_timestamp)
    KeyError: 'datepart'
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
    KeyError: 'convert_timestamp'
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
    KeyError: 'convert_date'
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
    KeyError: 'adapt_datetime'
    Exception ignored deletion of interned string failed:
    Traceback (most recent call last):
      File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
    KeyError: 'adapt_date'

    The 3.10 branch bisect pointed to one commit that fix the crash after 3.10.1 which is 72c260c [3.10] bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422) (GH-30425) which makes sense regarding main branch logs. It is commit
    35d6540 in the main branch.

    What remains to be seen is why "bpo-46070: _PyGC_Fini() untracks objects (GH-30577)" looks fine in the main branch (though it has no effect on the import crash) and not in 3.9 and 3.10 branch.
    Mind in the main branch "bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422)" was already applied when "bpo-46070: _PyGC_Fini() untracks objects (GH-30577)" went in so it was also unrelated to the fix of the initial report.

    @prahal
    Copy link
    Mannequin

    prahal mannequin commented Mar 24, 2022

    bisect of main for bug.py with each local python builds get me to commit b127e70 "bpo-46070: Fix asyncio initialization guard (GH-30423)" as the one that fixed bug.py most of the time.

    At times I can make bug.py segfault be it on python 3.9, 3.10 or main branch. It is pretty hard (I can have a batch of 200 runs without an issue) but seems easier to reproduce with a CPU stressed, then I can have two segfaults in a batch of 50 runs.

    Bash:
    for i in {1..50}; do ./python ../python-crash-kodi/bug.py ; done
    or sh:
    for i in seq 1 50; do ./python ../python-crash-kodi/bug.py ; done

    with:
    stress -c nproc --all at the same time.

    @prahal
    Copy link
    Mannequin

    prahal mannequin commented Mar 24, 2022

    I managed to reproduce the bug.py crash with main branch up to commit 12c0012 .

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @ezio-melotti ezio-melotti moved this to Todo in Subinterpreters Apr 15, 2022
    @vstinner
    Copy link
    Member

    So far, only released Python v3.11.0a4 include my "_PyGC_Fini() untracks objects" fix. The 3.9 and 3.10 branch include my fix, but there is no release including it yet.

    Scheduled releases:

    I managed to reproduce the bug.py crash with main branch up to commit 12c0012 .

    Oh really? This version include my fix. I cannot reproduce it on the current main branch (commit ea2f5bc) nor at commit 12c0012. Which commands did you type to build Python and reproduce the crash? What is your OS and C compiler?

    @vstinner
    Copy link
    Member

    The original issue is fixed. Please open a new issue for sqlite3_crash.py.

    @vstinner
    Copy link
    Member

    I created #92036 for sqlite3_crash.py bug.

    @erlend-aasland
    Copy link
    Contributor

    Should this be closed then?

    @vstinner
    Copy link
    Member

    I kept this issue open for this reason:

    It would be nice to add some tests.

    @vstinner
    Copy link
    Member

    My notes about this bug: https://twitter.com/VictorStinner/status/1481783100191428611

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-regex topic-subinterpreters type-bug An unexpected behavior, bug, or error
    Projects
    Status: Todo
    Development

    No branches or pull requests

    7 participants