-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Isolate PyModuleDef to Each Interpreter for Extension/Builtin Modules #101758
Comments
Well, this is the main reason why single-phase-init modules don't support multiple interpreters well. |
This change is almost entirely moving code around and hiding import state behind internal API. We introduce no changes to behavior, nor to non-internal API. (Since there was already going to be a lot of churn, I took this as an opportunity to re-organize import.c into topically-grouped sections of code.) The motivation is to simplify a number of upcoming changes. Specific changes: * move existing import-related code to import.c, wherever possible * add internal API for interacting with import state (both global and per-interpreter) * use only API outside of import.c (to limit churn there when changing the location, etc.) * consolidate the import-related state of PyInterpreterState into a single struct field (this changes layout slightly) * add macros for import state in import.c (to simplify changing the location) * group code in import.c into sections *remove _PyState_AddModule() #101758
#101919 removed |
Or can we be absolutely sure no stable ABI extension since Python 3.2 could use it? |
tl;dr I'm fine with adding it back (and will do so today). Thanks for bringing this up! In my mind, it's exceptionally unlikely that anyone is using the function, but obviously not impossible. (more explanation below) Also, I left this footnote on the PR:
("6 years ago" was in time for the 3.6 release, meaning the function was exposed as technically public API from 3.2 to 3.5.) Let me also explain why it's unlikely anyone is using the function. It's an undocumented "private" function corresponding to a similarly named documented public function, where the only differences are that I'll also point out that PEP 384 says "All functions starting with _Py are not available to applications.", which I'm guessing is why @serhiy-storchaka moved All that said, while there's a limit to how conservative we can be with the stable ABI, from a practicality perspective, in this case there really is no motivation to remove this function from the stable ABI other than to clean up. So I don't mind adding it back. I'll do that today. |
We're adding the function back, only for the stable ABI symbol and not as any form of API. I had removed it yesterday. This undocumented "private" function was added with the implementation for PEP 3121 (3.0, 2007) for internal use and later moved out of the limited API (3.6, 2016) and then into the internal API (3.9, 2019). I removed it completely yesterday, including from the stable ABI manifest (where it was added because the symbol happened to be exported). It's unlikely that anyone is using _PyState_AddModule(), especially any stable ABI extensions built against 3.2-3.5, but we're playing it safe. #101758
Are the |
Yeah. The failures should be fixed now though. |
To be clear, I've skipped the leaky/broken tests in test_imp. If there are other refleak failures, they should not be related to the work I've been doing. |
Thank you!
I do think it's possible that nothing uses this, and maybe we can even prove nothing does, but proving it is harder than keeping the function. Anyway, here's a plausible breaking scenario: something like a language binding gets the list of exposed ABI symbols, and re-exports all of them, regardless of whether they're usable. Removing any symbol will break them.
Yeah, but the trouble is that if they were called e.g. by a public macro, they need to stay in the ABI. Who knows how it's been used over the years -- it's possible to find out, but the archaeology is usually more trouble than keeping the function in. For
FWIW, in cases like these, turning the function into a stup that always raises is acceptable (as a last resort of course), since that doesn't technically break the ABI. (It would break specific code paths that the new CPython presumably can't support, rather than the entire extension being not importable due to a missing symbol.) |
Did you meant Note that the current In 3.2:
Current:
I am sure |
There really isn't much we can do to solve this, so I'm closing the issue. |
Typically each
PyModuleDef
for a builtin/extension module is a static global variable. Currently it's shared between all interpreters, whereas we are working toward interpreter isolation (for a variety of reasons). Isolating eachPyModuleDef
is worth doing, especially if you consider we've already run into problems1 because ofm_copy
.The main focus here is on
PyModuleDef.m_base.m_copy
2 specifically. It's the state that facilitates importing legacy (single-phase init) extension/builtin modules that do not support repeated initialization3 (likely the vast majority).(expand for more context)
PyModuleDef
for an extension/builtin module is usually stored in a static variable and (with immortal objects, see gh-101755) is mostly immutable. The exception ism_copy
, which is problematic in some cases for modules imported in multiple interpreters.Note that
m_copy
is only relevant for legacy (single-phase init) modules, whether builtin and an extension, and only if the module does not support repeated initialization3. It is never relevant for multi-phase init (PEP 489) modules.m_copy
is only set by_PyImport_FixupExtensionObject()
(and thus indirectly_PyImport_FixupBuiltin()
and_imp.create_builtin()
)_PyImport_FixupExtensionObject() is called by
_PyImport_LoadDynamicModuleWithSpec()` when a legacy (single-phase init) extension module is loadedm_copy
is only used inimport_find_extension()
, which is only called by_imp.create_builtin()
and_imp.create_dynamic()
(via the respective importers)When such a legacy module is imported for the first time,
m_copy
is set to a new copy of the just-imported module's__dict__
, which is "owned" by the current interpreter (the one importing the module). Whenever the module is loaded again (e.g. reloaded or deleted fromsys.modules
and then imported), a new empty module is created andm_copy
is [shallow] copied into that object's__dict__
.When
m_copy
is originally initialized, normally that will be the first time the module is imported. However, that code can be triggered multiple times for that module if it is imported under a different name (an unlikely case but apparently a real one). In that case them_copy
from the previous import is replaced with the new one right after it is released (decref'ed). This isn't the ideal approach but it's also been the behavior for quite a while.The tricky problem here is that the same code is triggered for each interpreter that imports the legacy module. Things are fine when a module is imported for the first time in any interpreter. However, currently, any subsequent import of that module in another interpreter will trigger that replacing code. The second interpreter decref's the old
m_copy
, but that object is "owned" by the first interpreter. This is a problem1.Furthermore, even if the decref-in-the-wrong-interpreter problem was gone. When
m_copy
is copied into the new module's__dict__
on subsequent imports, it's only a shallow copy. Thus such a legacy module, imported in other interpreters than the first one, would end up with its__dict__
filled with objects not owned by the correct interpreter.Here are some possible approaches to isolating each module's
PyModuleDef
to the interpreter that imports it:PyModuleDef
for each interpreter (would_PyRuntimeState.imports.extensions
need to move to the interpreter?)m_copy
for/on each interpreter_PyImport_FixupExtensionObject()
some other way...Linked PRs
Footnotes
see https://github.com/python/cpython/pull/101660#issuecomment-1424507393 ↩ ↩2
We should probably consider isolating
PyModuleDef.m_base.m_index
, but for now we simply sync themodules_by_index
list of each interpreter. (Also,modules_by_index
andm_index
are only used for single-phase init modules.) ↩specifically
def->m_size == -1
; multi-phase init modules always havedef->m_size >= 0
; single-phase init modules can also have a non-negativem_size
↩ ↩2The text was updated successfully, but these errors were encountered: