PEP about interpreter isolation #77

Closed
ericsnowcurrently opened this issue Nov 30, 2021 · 6 comments

@ericsnowcurrently
Owner

ericsnowcurrently commented Nov 30, 2021

At this point it makes sense to have a PEP about interpreter isolation. This will be a companion to PEP 554. The PEP will cover the following:

  • objective
    • enable multi-core parallelism for Python code (incl. per-interpreter GIL)
    • concurrency model (see PEP 554)
  • strategy: make static globals per-interpreter
    • move globals to PyInterpreterState (or module state)
      • incl. PyInterpreterState.global_objects (Include/internal/pycore_global_objects.h)
      • share pointers in main interpreter with subinterpreters
    • tool to identify globals (& test to verify)
  • compatibility (fully compatible)
  • maintenance burden impact (more maintainable?)
  • performance impact
  • C-API obstacles
    • N objects exposed in public C-API
      • N exception types (PyObject * variable)
      • N other types (PyTypeObject variable, not pointer)
      • 5 singletons (macro to address of PyObject variable, not pointer)
    • most are in limited API (N exceptions, M other types, 5 singletons)
    • solution A:
      • add per-interpreter lookup functions to C-API
      • replace objects with calls to the per-interpreter lookup functions
      • stop exposing the objects directly in Include/*.h; keep using them for the main interpreter
      • non-pointer objects require trickiness
      • limited API (< 3.11)
        • keep exporting all existing symbols from object files (used for main interpreter too)
        • disallow such extensions in subinterpreters
    • solution B:
      • make all the C-API objects "immortal"
    • solution C:
      • (too much work and too fragile)
      • for the main interpreter use the existing objects as-is
      • for subinterpreters, do a lookup using the existing objects as keys into a per-interpreter mapping
      • requires that every C-API function possibly taking one of the objects be updated to do that lookup on its args
  • extension modules
    • concerns & impact
    • mitigation strategy
    • assistance
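To make solution A concrete, here is a toy Python model of a per-interpreter lookup function. It is not CPython code: InterpreterState stands in for PyInterpreterState, STATIC_OBJECTS for the statically allocated objects currently exposed in Include/*.h, and lookup_global_object() for the proposed per-interpreter lookup functions. All three names are hypothetical. The main interpreter keeps getting the existing static objects, while each subinterpreter lazily gets its own copy.

```python
from dataclasses import dataclass, field


class StaticObject:
    """Stand-in for a statically allocated C-API object (e.g. an exception type)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"<StaticObject {self.name}>"


# Hypothetical stand-ins for objects currently exposed directly in Include/*.h.
STATIC_OBJECTS = {
    "PyExc_ValueError": StaticObject("PyExc_ValueError"),
    "PyExc_TypeError": StaticObject("PyExc_TypeError"),
}


@dataclass
class InterpreterState:
    """Toy stand-in for PyInterpreterState."""

    is_main: bool
    global_objects: dict = field(default_factory=dict)


def lookup_global_object(interp: InterpreterState, name: str) -> StaticObject:
    """Hypothetical per-interpreter lookup function (solution A)."""
    static = STATIC_OBJECTS[name]  # KeyError for names that are not C-API objects
    if interp.is_main:
        # The main interpreter keeps using the existing static objects.
        return static
    # Each subinterpreter gets its own object, created on first lookup.
    obj = interp.global_objects.get(name)
    if obj is None:
        obj = StaticObject(static.name)
        interp.global_objects[name] = obj
    return obj
```

Extension code would call the lookup function instead of referring to the global directly. The model deliberately ignores the C-specific "non-pointer objects" complication.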

draft
PEP: NNN
Title: Isolating Multiple Interpreters in a Process, including the GIL
Author: Eric Snow <[email protected]>
BDFL-Delegate: ...
Status: Draft
Type: Standards Track ???
Content-Type: text/x-rst
Created: DD-MMM-2021
Python-Version: 3.11
Post-History: DD-MMM-2021


Abstract
========

CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997).  The feature has been
available via the C-API. [c-api]_  PEP 554 discusses some of the value of
subinterpreters and the merits of exposing them to Python code.
However, that PEP purposefully avoids discussion about isolation,
especially related to the GIL.  This PEP fills that role.

The more isolation there is between interpreters, the more value they
can offer.  Currently subinterpreters operate in
`relative isolation from one another <Interpreter Isolation_>`_.  If they
were fully isolated then they could operate in parallel on multi-core
hosts.

This proposal identifies a path forward to reach full isolation between
interpreters.  This includes making the GIL per-interpreter.  


Proposal
========

TBD

Rationale
=========

TBD

Concerns
--------

TBD


About Subinterpreters
=====================

(copied from PEP 554, needs editing)

Concurrency
-----------

Concurrency is a challenging area of software development.  Decades of
research and practice have led to a wide variety of concurrency models,
each with different goals.  Most center on correctness and usability.

One class of concurrency models focuses on isolated threads of
execution that interoperate through some message passing scheme.  A
notable example is `Communicating Sequential Processes`_ (CSP) (upon
which Go's concurrency is roughly based).  The isolation inherent to
subinterpreters makes them well-suited to this approach.

Shared data
-----------

Subinterpreters are inherently isolated (with caveats explained below),
in contrast to threads.  So the communicate-via-shared-memory approach
used with threads doesn't work.  Without an alternative, effective use of
concurrency via subinterpreters is significantly limited.

The key challenge here is that sharing objects between interpreters
faces complexity due to various constraints on object ownership,
visibility, and mutability.  At a conceptual level it's easier to
reason about concurrency when objects only exist in one interpreter
at a time.  At a technical level, CPython's current memory model
limits how Python *objects* may be shared safely between interpreters;
effectively objects are bound to the interpreter in which they were
created.  Furthermore, the complexity of *object* sharing increases as
subinterpreters become more isolated, e.g. after GIL removal.

Consequently, the mechanism for sharing needs to be carefully considered.
There are a number of valid solutions, several of which may be
appropriate to support in Python.  This proposal provides a single basic
solution: "channels".  Ultimately, any other solution will look similar
to the proposed one, which will set the precedent.  Note that the
implementation of ``Interpreter.run()`` will be done in a way that
allows for multiple solutions to coexist, but doing so is not
technically a part of the proposal here.

Regarding the proposed solution, "channels", it is a basic, opt-in data
sharing mechanism that draws inspiration from pipes, queues, and CSP's
channels. [fifo]_

As described in the API summary of PEP 554, channels have two
operations: send and receive.  A key characteristic
of those operations is that channels transmit data derived from Python
objects rather than the objects themselves.  When objects are sent,
their data is extracted.  When the "object" is received in the other
interpreter, the data is converted back into an object owned by that
interpreter.

To make this work, the mutable shared state will be managed by the
Python runtime, not by any of the interpreters.  Initially we will
support only one type of object for shared state: the channels provided
by ``create_channel()``.  Channels, in turn, will carefully manage
passing objects between interpreters.

This approach, including keeping the API minimal, helps us avoid further
exposing any underlying complexity to Python users.  Along those same
lines, we will initially restrict the types that may be passed through
channels to the following:

* None
* bytes
* str
* int
* channels

Limiting the initial shareable types is a practical matter, reducing
the potential complexity of the initial implementation.  There are a
number of strategies we may pursue in the future to expand supported
objects and object sharing strategies.
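The channel semantics described above can be illustrated with a toy,
single-process Python model.  It is not PEP 554's API: ``Runtime``,
``ChannelID``, ``_extract()``, and ``_rebuild()`` are hypothetical
names, and a thread stands in for each interpreter.  The sketch shows
that the queues live in a runtime-owned object rather than in either
side, that ``send()`` enqueues only data extracted from the object, and
that ``recv()`` reconstructs an object from that data.  Only the initial
shareable types listed above are accepted; as a simplifying assumption,
subclasses such as ``bool`` are rejected.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelID:
    """Handle to a channel; the queue behind it is owned by the runtime."""

    id: int


def _int_to_bytes(n: int) -> bytes:
    # Signed big-endian encoding with room for the sign bit.
    return n.to_bytes((n.bit_length() + 8) // 8, "big", signed=True)


def _int_from_bytes(data: bytes) -> int:
    return int.from_bytes(data, "big", signed=True)


def _extract(obj) -> tuple:
    """Reduce a shareable object to plain data: (type tag, payload bytes)."""
    if obj is None:
        return ("none", b"")
    if type(obj) is bytes:
        return ("bytes", obj)
    if type(obj) is str:
        return ("str", obj.encode("utf-8", "surrogatepass"))
    if type(obj) is int:
        return ("int", _int_to_bytes(obj))
    if type(obj) is ChannelID:
        return ("channel", _int_to_bytes(obj.id))
    raise TypeError(f"{type(obj).__name__!r} objects are not shareable")


def _rebuild(data: tuple):
    """Build an object on the receiving side from the extracted data."""
    tag, payload = data
    if tag == "none":
        return None
    if tag == "bytes":
        return bytes(payload)
    if tag == "str":
        return payload.decode("utf-8", "surrogatepass")
    if tag == "int":
        return _int_from_bytes(payload)
    if tag == "channel":
        return ChannelID(_int_from_bytes(payload))
    raise ValueError(f"unknown data tag {tag!r}")


class Runtime:
    """Process-wide owner of channel state; no interpreter owns it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queues = {}  # channel id -> queue.Queue of extracted data
        self._next_id = 0

    def create_channel(self) -> ChannelID:
        with self._lock:
            cid = ChannelID(self._next_id)
            self._next_id += 1
            self._queues[cid.id] = queue.Queue()
        return cid

    def _queue(self, cid: ChannelID) -> queue.Queue:
        with self._lock:
            return self._queues[cid.id]

    def send(self, cid: ChannelID, obj) -> None:
        # Only the extracted data is queued, never the object itself.
        self._queue(cid).put(_extract(obj))

    def recv(self, cid: ChannelID, timeout: Optional[float] = None):
        # Blocks until data is available (raises queue.Empty on timeout).
        return _rebuild(self._queue(cid).get(timeout=timeout))
```

Because the type checks run in ``send()``, an unsupported object raises
``TypeError`` before anything is queued.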

Interpreter Isolation
---------------------

CPython's interpreters are intended to be strictly isolated from each
other.  Each interpreter has its own copy of all modules, classes,
functions, and variables.  The same applies to state in C, including in
extension modules.  The CPython C-API docs explain more. [caveats]_

However, there are ways in which interpreters share some state.  First
of all, some process-global state remains shared:

* file descriptors
* builtin types (e.g. dict, bytes)
* singletons (e.g. None)
* underlying static module data (e.g. functions) for
  builtin/extension/frozen modules

There are no plans to change this.

Second, some isolation is faulty due to bugs or implementations that did
not take subinterpreters into account.  This includes things like
extension modules that rely on C globals. [cryptography]_  In these
cases bugs should be opened (some are already):

* readline module hook functions (http://bugs.python.org/issue4202)
* memory leaks on re-init (http://bugs.python.org/issue21387)

Finally, some potential isolation is missing due to the current design
of CPython.  Work is currently underway to address gaps in this
area:

* GC is not run per-interpreter [global-gc]_
* at-exit handlers are not run per-interpreter [global-atexit]_
* extensions using the ``PyGILState_*`` API are incompatible [gilstate]_
* interpreters share memory management (e.g. allocators, gc)
* interpreters share the GIL

Existing Usage
--------------

Subinterpreters are not a widely used feature.  In fact, the only
documented cases of widespread usage are
`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_,
`OpenStack Ceph <https://github.com/ceph/ceph/pull/14971>`_, and
`JEP <https://github.com/ninia/jep>`_.  On the one hand, these cases
provide confidence that existing subinterpreter support is relatively
stable.  On the other hand, there isn't much of a sample size from which
to judge the utility of the feature.


Alternate Python Implementations
================================

(not affected?  this is CPython-only)


Interpreter "Isolated" Mode
===========================

(copied from PEP 554, needs editing)

By default, every new interpreter created by ``interpreters.create()``
has specific restrictions on any code it runs.  This includes the
following:

* importing an extension module fails if it does not implement the
  PEP 489 API
* new threads of any kind are not allowed
* ``os.fork()`` is not allowed (so no ``multiprocessing``)
* ``os.exec*()`` and "fork+exec" are not allowed (so no ``subprocess``)

This represents the full "isolated" mode of subinterpreters.  It is
applied when ``interpreters.create()`` is called with the "isolated"
keyword-only argument set to ``True`` (the default).  If
``interpreters.create(isolated=False)`` is called then none of those
restrictions is applied.
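The mapping from the "isolated" flag to these restrictions can be
sketched in Python.  ``InterpreterConfig``, ``from_isolated()``,
``ensure_allowed()``, and ``check_extension_import()`` are hypothetical
names for illustration only, and the choice of ``ImportError`` and
``RuntimeError`` is an assumption; the text above specifies which
operations are restricted, not how violations are reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterpreterConfig:
    """Hypothetical per-interpreter restrictions; True means "allowed"."""

    allow_non_pep489_extensions: bool
    allow_threads: bool
    allow_fork: bool
    allow_exec: bool

    @classmethod
    def from_isolated(cls, isolated: bool = True) -> "InterpreterConfig":
        # isolated=True (the default) applies every restriction;
        # isolated=False applies none of them (the historical behavior).
        allowed = not isolated
        return cls(
            allow_non_pep489_extensions=allowed,
            allow_threads=allowed,
            allow_fork=allowed,
            allow_exec=allowed,
        )


def ensure_allowed(allowed: bool, what: str) -> None:
    """Raise if the operation is restricted in this interpreter."""
    if not allowed:
        raise RuntimeError(f"{what} is not allowed in an isolated interpreter")


def check_extension_import(
    config: InterpreterConfig, name: str, implements_pep489: bool
) -> None:
    """Reject single-phase-init extensions unless the config allows them."""
    if implements_pep489:
        return
    if not config.allow_non_pep489_extensions:
        raise ImportError(
            f"extension module {name!r} does not implement the PEP 489 API"
        )
```

Keeping each restriction as a separate field would make it
straightforward to enable or disable restrictions individually later, as
the text below contemplates.  A restricted operation would be guarded
with, e.g., ``ensure_allowed(config.allow_fork, "os.fork()")``.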

One advantage of this approach is that it allows extension maintainers
to check subinterpreter compatibility before they implement the PEP 489
API.  Also note that ``isolated=False`` represents the historical
behavior when using the existing subinterpreters C-API, thus providing
backward compatibility.  For the existing C-API itself, the default
remains ``isolated=False``.  The same is true for the "main" module, so
existing use of Python will not change.

We may choose to later loosen some of the above restrictions or provide
a way to enable/disable granular restrictions individually.  Regardless,
requiring PEP 489 support from extension modules will always be a
default restriction.


Documentation
=============

TBD


Deferred Functionality
======================

TBD


Rejected Ideas
==============

TBD


Implementation
==============

TBD

References
==========

.. [c-api]
   https://docs.python.org/3/c-api/init.html#sub-interpreter-support

.. [caveats]
   https://docs.python.org/3/c-api/init.html#bugs-and-caveats

.. [petr-c-ext]
   https://mail.python.org/pipermail/import-sig/2016-June/001062.html
   https://mail.python.org/pipermail/python-ideas/2016-April/039748.html

.. [cryptography]
   https://github.com/pyca/cryptography/issues/2299

.. [global-gc]
   http://bugs.python.org/issue24554

.. [gilstate]
   https://bugs.python.org/issue10915
   http://bugs.python.org/issue15751

.. [global-atexit]
   https://bugs.python.org/issue6531

.. [bug-rate]
   https://mail.python.org/pipermail/python-ideas/2017-September/047094.html

.. [benefits]
   https://mail.python.org/pipermail/python-ideas/2017-September/047122.html

.. [main-thread]
   https://mail.python.org/pipermail/python-ideas/2017-September/047144.html
   https://mail.python.org/pipermail/python-dev/2017-September/149566.html

.. [reset_globals]
   https://mail.python.org/pipermail/python-dev/2017-September/149545.html

.. [multi-core-project]
   https://github.com/ericsnowcurrently/multi-core-python

.. [cache-line-ping-pong]
   https://mail.python.org/archives/list/[email protected]/message/3HVRFWHDMWPNR367GXBILZ4JJAUQ2STZ/

.. [extension-docs]
   https://docs.python.org/3/extending/index.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
@ericsnowcurrently ericsnowcurrently self-assigned this Nov 30, 2021
@ericsnowcurrently
Owner Author

Topics for discussion before the PEP can really take shape:

  • is "immortal objects" a viable solution?
    • fixed refcount vs. not
    • otherwise, what are the alternatives for C-API objects?
  • how to help extension authors?
  • subinterpreters vs. no-gil

I plan on starting threads on python-dev. I'm also going to reach out to the numpy folks about what it will take for them to support subinterpreters and how I (we) can help.

@ericsnowcurrently
Owner Author

See python/peps#2212. That's the draft I'm working on, in case anyone is interested. I should have it filled out (and pared down) and posted by the end of the week.

@jakirkham

> I plan on starting threads on python-dev. I'm also going to reach out to the numpy folks about what it will take for them to support subinterpreters and how I (we) can help.

cc @rgommers @seberg (from NumPy for awareness)

Also cc-ing @ogrisel (who may have thoughts as well and know others who would be interested)

@seberg

seberg commented Jan 5, 2022

Right now, nobody in NumPy is working on it, and I was hoping that the main effort for getting closer could be the HPy effort. NumPy uses PyEval_RestoreThread and PyEval_SaveThread in its public API, and I am not sure whether we need a replacement or what it would look like. If those calls work, fine. If not, we need to extend the API and move all users to the new API (or crash on subinterpreters?).

The other recurring issue is that not all methods have access to module state, so it is unclear where global state can be stored efficiently (which is also fairly important). I do not know whether these things are solved yet on the Python side (e.g. access to global singleton objects or interned strings, which could perhaps be "immortalized", but I am not sure).

Last time I looked into this, both of these seemed unclear to me. Maybe there are clear solutions now, but I am not sure I want to spend serious effort on this in NumPy at this time. As I said, my hope was that the HPy effort would at least move us closer.

@jakirkham

jakirkham commented Jan 5, 2022

In that case I wonder if it would be useful to engage with HPy devs regarding this proposal. Do you know who would be best to talk to from HPy, Sebastian?

Edit: Maybe @rlamy or @antocuni?

Edit 2: These issues in particular look relevant: hpyproject/hpy#34, hpyproject/hpy#268

@ericsnowcurrently
Owner Author

I'm going to take a more focused approach in #79.

Repository owner moved this from In Progress to Done in Fancy CPython Board Feb 16, 2022