Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Refleak in test_import #102251

Closed
ericsnowcurrently opened this issue Feb 25, 2023 · 9 comments
Closed

Fix Refleak in test_import #102251

ericsnowcurrently opened this issue Feb 25, 2023 · 9 comments
Labels
3.12 bugs and security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@ericsnowcurrently
Copy link
Member

ericsnowcurrently commented Feb 25, 2023

Commit 096d009 (for gh-101758) resulted in refleak failures. It's unclear if the leak is new or if the test simply exposed it. In 984f8ab, I skipped the leaking tests until we can fix the leak.

UPDATE: I've updates the tests to only skip when checking for refleaks. I've also added a few more tests.


Context

(expand)
(Also see gh-101758.)

Loading an extension module involves meaningfully different code paths depending on the content of the PyModuleDef. There's the difference between legacy (single-phase-init, PEP 3121) and modern (multi-phase-init, PEP 489) modules. For single-phase init, there are those that support being loaded more than once (m_size >= 0) and those that can't (m_size == -1). I've added several long comments in import.c explaining about single-phase init modules in detail. I also added some tests to verify the behavior of the single-phase-init cases more thoroughly. Those are the leaking tests.

Relevant state:

  • PyModuleDef.m_size - the size of the module's per-interpreter state (PyModuleObject.md_state)
    • single-phase init and multi-phase init modules can have a value of 0 or greater
    • such modules must not have any process-global state
    • only single-phase init modules can have a value of -1, which means the module does not support being loaded more than once
    • such modules may have process-global state
  • PyModuleDef.m_base.m_index - the index into PyInterpreterState.imports.modules_by_index (same index used for every interpreter)
    • set by PyModuleDef_Init() (e.g. when the module is created)
  • PyModuleDef.m_base.m_copy - a copy of the __dict__ of the last time a module was loaded using this def (only single-phase init where m_size is -1)
    • set exclusively by fix_up_extension()
    • used exclusively by import_find_extension()
    • cleared by (replaced in) fix_up_extension() if m_copy was already set (e.g. multiple interpreters, multiple modules in same file using same def)
    • cleared by _PyImport_ClearModulesByIndex() during interpreter finalization
  • _PyRuntime.imports.extensions - a dict mapping (filename, name) to PyModuleDef
    • entry set exclusively by fix_up_extension() (only for single-phase init modules)
    • entry used exclusively by import_find_extension()
    • entry cleared by _PyImport_Fini() at runtime finalization
  • interp->imports.modules_by_index - a list of single-phase-init modules loaded in this interpreter; lookups (e.g. PyState_FindModule()) use PyModuleDef.m_base.m_index
    • entry set normally only by fix_up_extension() and import_find_extension() (only single-phase init modules)
    • (entry also set by PyState_AddModule())
    • used exclusively by PyState_FindModule()
    • entry cleared normally by _PyImport_ClearModulesByIndex() during interpreter finalization
    • (entry also cleared by PyState_RemoveModule())

Code path when loading a single-phase-init module for the first time (in import.c, unless otherwise noted):

  • imp.load_dynamic()
    • importlib._boostrap._load() (using a spec with ExtensionFileLoader)
      • ExtensionFileLoader.create_module() (in _boostrap_external.py)
        • _imp.create_dynamic() (_imp_create_dynamic_impl())
          • import_find_extension() (not found in _PyRuntime.imports.extensions)
          • _PyImport_LoadDynamicModuleWithSpec() (importdl.c)
            • _PyImport_FindSharedFuncptr()
            • <module init func from loaded binary>()
              • sets process-global state (only where m_size == -1)
              • PyModule_Create()
              • populates per-interpreter module state (only where m_size > 0)
              • sets module attrs (on __dict__)
            • sets def->m_base.m_init (only needed for single-phase-init where m_size >=0)
            • _PyImport_FixupExtensionObject()
              • sets sys.modules[spec.name]
              • fix_up_extension()
                • set interp->imports.modules_by_index[def->m_base.m_index]
                • clear def->m_base.m_copy (only if set and only if m_size == -1)
                • set def->m_base.m_copy to a copy of the module's __dict__ (only if m_size == -1)
                • set _PyRuntime.imports.extensions[(filename, name)]
      • sets missing module attrs (e.g. __file__)

During testing we use a helper to erase (nearly) any evidence of the module having been imported before. That means clearing the state described above.

Here's the code path:

  • _testinternalcapi.clear_extension() (Modules/_testinternalcapi.c)
    • _PyImport_ClearExtension()
      • clear_singlephase_extension()
        • (only if there's a _PyRuntime.imports.extensions entry)
          • clear the module def's m_copy
          • replace the interp->imports.modules_by_index entry with Py_None
          • delete the _PyRuntime.imports.extensions entry

Linked PRs

@ericsnowcurrently ericsnowcurrently added type-feature A feature request or enhancement extension-modules C modules in the Modules dir 3.12 bugs and security fixes labels Feb 25, 2023
@ericsnowcurrently
Copy link
Member Author

ericsnowcurrently commented Feb 25, 2023

FWIW, I've spent quite a bit of time over the last week trying to find the leak, without success.

The following test method simplifies things to make it easier to chase the leak:

    def test_singlephase_cleared_globals(self):
        name = '_testsinglephase'
        fileobj, filename, _ = imp.find_module(name)
        fileobj.close()

        # Load the module.
        mod = imp.load_dynamic(name, filename)

        # Clear the module's process-global state.
        mod._clear_globals()
        self.assertEqual(mod.initialized(), 0)
        self.assertEqual(mod.initialized_count(), -1)

        # Clean up all traces of the module in the import state.
        del sys.modules[name]
        _testinternalcapi.clear_extension(name, filename)

Run:

  • ./python -m test test_imp -m test_singlephase_cleared_globals
  • ./python -m test test_imp -m test_singlephase_cleared_globals -R 3:3

(UPDATE: That's basically what I later added in test_imp.SinglephaseInitTests.test_cleared_globals,
so ./python -m test test_imp -m test_cleared_globals -R 3:3.)

Observations:

  • that test passes but leaks 1 object
  • if you comment out the last line (_testinternalcapi.clear_extension()) then there is no leak (the module is actually loaded the first refleak iteration and then uses the module def's m_copy for the other 5 iterations)
  • you get that same outcome if you make clear_singlephase_extension() (in import.c) a noop or even return right before Py_CLEAR(def->m_base.m_copy);
  • that outcome implies that the leak is mostly likely something that isn't getting cleaned up in clear_singlephase_extension(), either directly causing the leak or leading to a different code path

The following demonstrates that the leak does not exist between _testinternalcapi.clear_extension() and clear_singlephase_extension() (in import.c):

    def test_singlephase_not_loaded(self):
        name = '_testsinglephase'
        fileobj, filename, _ = imp.find_module(name)
        fileobj.close()
        _testinternalcapi.clear_extension(name, filename)

@ericsnowcurrently ericsnowcurrently self-assigned this Feb 25, 2023
ericsnowcurrently added a commit that referenced this issue Feb 27, 2023
This is related to fixing the refleaks introduced by commit 096d009.  I haven't been able to find the leak yet, but these changes are a consequence of that effort.  This includes some cleanup, some tweaks to the existing tests, and a bunch of new test cases.  The only change here that might have impact outside the tests in question is in imp.py, where I update imp.load_dynamic() to use spec_from_file_location() instead of creating a ModuleSpec directly.

Also note that I've updated the tests to only skip if we're checking for refleaks (regrtest's --huntrleaks), whereas in gh-101969 I had skipped the tests entirely.  The tests will be useful for some upcoming work and I'd rather the refleaks not hold that up.  (It isn't clear how quickly we'll be able to fix the leaking code, though it will certainly be done in the short term.)

#102251
@ericsnowcurrently
Copy link
Member Author

I'm unassigning myself, but we'll make sure this gets addressed within the next few weeks.

@ericsnowcurrently ericsnowcurrently removed their assignment Feb 27, 2023
@ericsnowcurrently ericsnowcurrently moved this to In Progress in Fancy CPython Board Feb 27, 2023
@ericsnowcurrently ericsnowcurrently moved this from In Progress to Todo in Fancy CPython Board Feb 27, 2023
@warsaw
Copy link
Member

warsaw commented Feb 27, 2023

Just be aware that I am working on removing test_imp.py as part of the removal of import related deprecations ticket.

carljm added a commit to carljm/cpython that referenced this issue Feb 28, 2023
* main: (67 commits)
  pythongh-99108: Add missing md5/sha1 defines to Modules/Setup (python#102308)
  pythongh-100227: Move _str_replace_inf to PyInterpreterState (pythongh-102333)
  pythongh-100227: Move the dtoa State to PyInterpreterState (pythongh-102331)
  pythonGH-102305: Expand some macros in generated_cases.c.h (python#102309)
  Migrate to new PSF mailgun account (python#102284)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (in Python/) (python#102193)
  pythonGH-90744: Fix erroneous doc links in the sys module (python#101319)
  pythongh-87092: Make jump target label equal to the offset of the target in the instructions sequence (python#102093)
  pythongh-101101: Unstable C API tier (PEP 689) (pythonGH-101102)
  IDLE: Simplify DynOptionsMenu __init__code (python#101371)
  pythongh-101561: Add typing.override decorator (python#101564)
  pythongh-101825: Clarify that as_integer_ratio() output is always normalized (python#101843)
  pythongh-101773: Optimize creation of Fractions in private methods (python#101780)
  pythongh-102251: Updates to test_imp Toward Fixing Some Refleaks (pythongh-102254)
  pythongh-102296 Document that inspect.Parameter kinds support ordering (pythonGH-102297)
  pythongh-102250: Fix double-decref in COMPARE_AND_BRANCH error case (pythonGH-102287)
  pythongh-101100: Fix sphinx warnings in `types` module (python#102274)
  pythongh-91038: Change default argument value to `False` instead of `0` (python#31621)
  pythongh-101765: unicodeobject: use Py_XDECREF correctly (python#102283)
  [doc] Improve grammar/fix missing word (pythonGH-102060)
  ...
git-x-modules bot pushed a commit to Lamparter/Fluid.Runtime.Library that referenced this issue Mar 4, 2023
This is related to fixing the refleaks introduced by commit 096d009.  I haven't been able to find the leak yet, but these changes are a consequence of that effort.  This includes some cleanup, some tweaks to the existing tests, and a bunch of new test cases.  The only change here that might have impact outside the tests in question is in imp.py, where I update imp.load_dynamic() to use spec_from_file_location() instead of creating a ModuleSpec directly.

Also note that I've updated the tests to only skip if we're checking for refleaks (regrtest's --huntrleaks), whereas in gh-101969 I had skipped the tests entirely.  The tests will be useful for some upcoming work and I'd rather the refleaks not hold that up.  (It isn't clear how quickly we'll be able to fix the leaking code, though it will certainly be done in the short term.)

python/cpython#102251
@ericsnowcurrently ericsnowcurrently changed the title Fix Refleak in test_imp Fix Refleak in test_import May 1, 2023
@JelleZijlstra
Copy link
Member

@sunmy2019 has some analysis of these refleaks in #103879 (comment) and nearby comments.

@sunmy2019
Copy link
Member

I opened a PR for the ref leaks.

sunmy2019 added a commit to sunmy2019/cpython that referenced this issue May 24, 2023
erlend-aasland added a commit that referenced this issue May 30, 2023
…5082)

Correctly decref 'initialized' in init_module()

Co-authored-by: Erlend E. Aasland <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 30, 2023
pythonGH-105082)

Correctly decref 'initialized' in init_module()

(cherry picked from commit d14eb34)

Co-authored-by: sunmy2019 <[email protected]>
Co-authored-by: Erlend E. Aasland <[email protected]>
erlend-aasland added a commit that referenced this issue May 30, 2023
…on (GH-105082) (#105083)

Correctly decref 'initialized' in init_module()

(cherry picked from commit d14eb34)

Co-authored-by: sunmy2019 <[email protected]>
Co-authored-by: Erlend E. Aasland <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 31, 2023
… test_import (pythonGH-105085)

(cherry picked from commit a99b9d9)

Co-authored-by: sunmy2019 <[email protected]>
Co-authored-by: Erlend E. Aasland <[email protected]>
erlend-aasland added a commit that referenced this issue May 31, 2023
erlend-aasland added a commit that referenced this issue May 31, 2023
…n test_import (GH-105085) (#105170)

(cherry picked from commit a99b9d9)

Co-authored-by: sunmy2019 <[email protected]>
Co-authored-by: Erlend E. Aasland <[email protected]>
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Jun 23, 2023
test_basic_multiple_interpreters_main_no_reset() does not support rerunning
@ericsnowcurrently
Copy link
Member Author

Cc @vstinner

@vstinner
Copy link
Member

On Python 3.12, test_import.test_concurrency() leaks on Windows:

vstinner@WIN C:\victor\python\3.12>python -m test test_import -m test_concurrency -R 3:3
(...)
test_import leaked [22, 36, 28] references, sum=86

@sunmy2019
Copy link
Member

On Python 3.12, test_import.test_concurrency() leaks on Windows:

vstinner@WIN C:\victor\python\3.12>python -m test test_import -m test_concurrency -R 3:3
(...)
test_import leaked [22, 36, 28] references, sum=86

I think this is an essentially new issue. See #104796 (comment)

I would suggest opening a new one and closing this.

@vstinner
Copy link
Member

@sunmy2019 created issue #106176.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 18, 2023
Yhg1s pushed a commit that referenced this issue Sep 18, 2023
…) (#109540)

gh-102251: Disable non-rerunnable test in test_import (GH-106013)
(cherry picked from commit 4849a80)

Co-authored-by: Erlend E. Aasland <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants