PEP: 547
Title: Running extension modules using the -m option
Version:
Author: Petr Viktorin <[email protected]>
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 25-May-2017
Python-Version: 3.7
Post-History:
Cython -- the most important use case for this PEP and the only explicit one -- is not ready for multi-phase initialization yet. It keeps global state in C-level static variables. See discussion at Cython issue 1923.
The PEP is deferred until the situation changes.
This PEP proposes an implementation that allows built-in and extension
modules to be executed in the __main__ namespace using
PEP 489 multi-phase initialization.
With this, a module that supports multi-phase initialization can be run using the following command:

    $ python3 -m _testmultiphase
    This is a test module named __main__.
Currently, extension modules do not support all functionality of
Python source modules.
Specifically, it is not possible to run extension modules as scripts using
Python's -m option.
The technical groundwork to make this possible has been done for PEP 489,
and enabling the -m option is listed in that PEP's
“Possible Future Extensions” section.
Technically, the additional changes proposed here are relatively small.
Extension modules' lack of support for the -m option has traditionally
been worked around by providing a Python wrapper.
For example, the _pickle module's command line interface is in the
pure-Python pickle module (along with a pure-Python reimplementation).
This works well for standard library modules, as building command line interfaces using the C API is cumbersome. However, other users may want to create executable extension modules directly.
An important use case is Cython, a Python-like language that compiles to
C extension modules.
Cython is a (near) superset of Python, meaning that compiling a Python module
with Cython will typically not change the module's functionality, allowing
Cython-specific features to be added gradually.
This PEP will allow Cython extension modules to behave the same as their Python
counterparts when run using the -m option.
Cython developers consider the feature worth implementing (see
Cython issue 1715).
Python's -m option is handled by the function runpy._run_module_as_main.
The module specified by -m is not imported normally.
Instead, it is executed in the namespace of the __main__ module,
which is created quite early in interpreter initialization.
For Python source modules, running in another module's namespace is not
a problem: the code is executed with locals and globals set to the
existing module's __dict__.
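As a rough illustration of why source modules pose no problem, the following sketch executes compiled source with an existing module's __dict__ as its namespace. The helper name run_source_in_namespace and the stand-in module object are assumptions made for this example; the real runpy code does more, such as setting import-related attributes before running the code.

```python
import types


def run_source_in_namespace(source, module):
    """Run source text using an existing module's __dict__ as the namespace.

    Hypothetical helper for illustration only; it is not part of runpy.
    """
    code = compile(source, "<sketch>", "exec")
    # A single dict passed to exec() serves as both globals and locals.
    exec(code, module.__dict__)
    return module


# A stand-in for the real __main__ module; it is not placed in sys.modules.
main_like = types.ModuleType("__main__")
run_source_in_namespace("answer = 6 * 7\nname_seen = __name__", main_like)
```

The executed code sees the existing module's attributes (here, __name__) and leaves its own definitions behind in that same module.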
This is not the case for extension modules, whose PyInit_* entry point
traditionally both created a new module object (using PyModule_Create)
and initialized it.
Since Python 3.5, extension modules can use PEP 489 multi-phase initialization.
In this scenario, the PyInit_* entry point returns a PyModuleDef
structure: a description of how the module should be created and initialized.
The extension can choose to customize creation of the module object using
the Py_mod_create callback, or opt to use a normal module object by not
specifying Py_mod_create.
Another callback, Py_mod_exec, is then called to initialize the module
object, e.g. by populating it with methods and classes.
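The same split between creation and execution exists at the Python level in the importlib loader protocol, which may help readers who are less familiar with the C API. The sketch below is only an analogy: TwoPhaseLoader and sketch_mod are made-up names, and create_module and exec_module are the Python-level loader methods, not the C-level Py_mod_create and Py_mod_exec slots.

```python
import importlib.abc
import importlib.util


class TwoPhaseLoader(importlib.abc.Loader):
    """Hypothetical loader that separates module creation from execution."""

    def create_module(self, spec):
        # Returning None asks the import machinery for a normal module
        # object, loosely comparable to not specifying Py_mod_create.
        return None

    def exec_module(self, module):
        # Populate the already created module, loosely comparable to
        # what a Py_mod_exec callback does.
        module.answer = 42
        module.initialized_as = module.__name__


spec = importlib.util.spec_from_loader("sketch_mod", TwoPhaseLoader())
module = importlib.util.module_from_spec(spec)  # creation step
spec.loader.exec_module(module)                 # execution step
```

Because the execution step only receives a module object, it does not need to know who created that object.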
Multi-phase initialization makes it possible to execute an extension module in
another module's namespace: if a Py_mod_create callback is not specified,
the __main__ module can be passed to the Py_mod_exec callback to be
initialized, as if __main__ was a freshly constructed module object.
One complication in this scheme is C-level module state.
Each module has an md_state pointer that points to a region of memory
allocated when an extension module is created.
The PyModuleDef specifies how much memory is to be allocated.
The implementation must take care that md_state memory is allocated at most
once.
Also, the Py_mod_exec callback should only be called once per module.
The implications of multiply-initialized modules are too subtle to expect
extension authors to reason about them.
The md_state pointer itself will serve as a guard: allocating the memory
and calling Py_mod_exec will always be done together, and initializing an
extension module will fail if md_state is already non-NULL.
Since the __main__ module is not created as an extension module,
its md_state is normally NULL.
Before initializing an extension module in __main__'s context, its module
state will be allocated according to the PyModuleDef of that module.
While PEP 489 was designed to make these changes generally possible,
the module discovery, creation, and initialization steps for extension modules
still need to be decoupled, so that another module can be used instead of
a newly initialized one. The functionality also needs to be added to
runpy and importlib.
A new optional method for importlib loaders will be added.
This method will be called exec_in_module and will take two
positional arguments: the module spec and an already existing module.
Any import-related attributes, such as __spec__ or __name__,
already set on the module will be ignored.
The runpy._run_module_as_main function will look for this new
loader method.
If it is present, runpy will execute it instead of trying to load and
run the module's Python code.
Otherwise, runpy will act as before.
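The dispatch described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the proposed runpy code: run_module_in_main, SketchExtensionLoader, SketchSourceLoader, and run_python_code are hypothetical names, and only exec_in_module comes from this PEP.

```python
import importlib.machinery
import types


def run_module_in_main(spec, main_module, run_python_code):
    """Prefer the loader's exec_in_module; otherwise fall back (sketch)."""
    exec_in_module = getattr(spec.loader, "exec_in_module", None)
    if exec_in_module is None:
        # Loader lacks the optional method: behave as before.
        run_python_code(spec, main_module)
        return "python-code"
    exec_in_module(spec, main_module)
    return "exec_in_module"


class SketchExtensionLoader:
    """Hypothetical loader that implements the proposed optional method."""

    def exec_in_module(self, spec, module):
        module.message = f"This is a test module named {module.__name__}."


class SketchSourceLoader:
    """Hypothetical loader without exec_in_module."""


def run_python_code(spec, module):
    # Stand-in for the existing code path that runs the module's Python code.
    module.message = "ran Python code"


main_a = types.ModuleType("__main__")
spec_a = importlib.machinery.ModuleSpec("_sketch_ext", SketchExtensionLoader())
path_a = run_module_in_main(spec_a, main_a, run_python_code)

main_b = types.ModuleType("__main__")
spec_b = importlib.machinery.ModuleSpec("sketch_src", SketchSourceLoader())
path_b = run_module_in_main(spec_b, main_b, run_python_code)
```

Checking for the method with getattr keeps the new hook optional, so loaders that do not define it are unaffected.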
importlib's ExtensionFileLoader will get an implementation of
exec_in_module that will call a new function, _imp.exec_in_module.
_imp.exec_in_module will use existing machinery to find and call an
extension module's PyInit_* function.
The PyInit_* function can return either a fully initialized module
(single-phase initialization) or a PyModuleDef (for PEP 489 multi-phase
initialization).
In the single-phase initialization case, _imp.exec_in_module will raise
ImportError.
In the multi-phase initialization case, the PyModuleDef and the module to
be initialized will be passed to a new function, PyModule_ExecInModule.
This function raises ImportError if the PyModuleDef specifies
a Py_mod_create slot, or if the module has already been initialized
(i.e. its md_state pointer is not NULL).
Otherwise, the function will initialize the module according to the
PyModuleDef.
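The checks described above can be modeled in pure Python to make the control flow easier to follow. Everything in this sketch is a stand-in: ModuleDefModel, ModuleModel, and exec_in_module_model are hypothetical names, md_state is modeled as an optional bytearray rather than a C pointer, and the real logic would live in C.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ModuleDefModel:
    """Hypothetical stand-in for a PyModuleDef."""
    name: str
    state_size: int
    exec_slot: Callable
    create_slot: Optional[Callable] = None


@dataclass
class ModuleModel:
    """Hypothetical stand-in for a module object with C-level state."""
    name: str
    md_state: Optional[bytearray] = None  # None models a NULL pointer
    namespace: dict = field(default_factory=dict)


def exec_in_module_model(init_result, module):
    """Model of the described checks; raises ImportError when they fail."""
    if not isinstance(init_result, ModuleDefModel):
        # The init function returned a finished module: single-phase init.
        raise ImportError("single-phase initialization is not supported")
    if init_result.create_slot is not None:
        raise ImportError(f"{init_result.name} customizes module creation")
    if module.md_state is not None:
        raise ImportError(f"{module.name} has already been initialized")
    # State allocation and execution always happen together.
    module.md_state = bytearray(init_result.state_size)
    init_result.exec_slot(module)


def populate(module):
    module.namespace["greeting"] = f"This is a test module named {module.name}."


main = ModuleModel("__main__")
exec_in_module_model(ModuleDefModel("_testmultiphase", 8, populate), main)
```

Running the same call a second time on main raises ImportError in this model, because md_state is no longer None.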
This PEP maintains backwards compatibility.
It only adds new functions, and a new loader method that is added for
a loader that previously did not support running modules as __main__.
The reference implementation of this PEP is available at GitHub.
This document has been placed in the public domain.