Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid changing the PYMEM_DOMAIN_RAW allocator during initialization and finalization #111924

Open
Tracked by #108219
colesbury opened this issue Nov 9, 2023 · 5 comments
Labels
3.13 bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Copy link
Contributor

colesbury commented Nov 9, 2023

Bug report

CPython current temporarily changes PYMEM_DOMAIN_RAW to the default allocator during initialization and shutdown. The motivation is to ensure that core runtime structures are allocated and freed using the same allocator. However, modifying the current allocator changes global state and is not thread-safe even with the GIL. Other threads may be allocating or freeing objects use PYMEM_DOMAIN_RAW; they are not required to hold the GIL to call PyMem_RawMalloc/PyMem_RawFree.

We can avoid changing global state while still ensuring that we use a consistent allocator during initialization and shutdown.

PyThread_type_lock

Many of the runtime structures are PyThread_type_lock objects. We can avoid allocation/freeing entirely for these locks by using PyMutex or PyRawMutex instead.

cpython/Python/pystate.c

Lines 396 to 418 in b9f814c

static int
alloc_for_runtime(PyThread_type_lock locks[NUMLOCKS])
{
/* Force default allocator, since _PyRuntimeState_Fini() must
use the same allocator than this function. */
PyMemAllocatorEx old_alloc;
_PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
for (int i = 0; i < NUMLOCKS; i++) {
PyThread_type_lock lock = PyThread_allocate_lock();
if (lock == NULL) {
for (int j = 0; j < i; j++) {
PyThread_free_lock(locks[j]);
locks[j] = NULL;
}
break;
}
locks[i] = lock;
}
PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
return 0;
}

Calls to PyMem_RawMalloc, PyMem_RawCalloc, PyMem_RawFree, etc.

For the other calls to PyMem_RawMalloc, etc. where we know we want to use the default allocator, we should directly call a new internal-only function that always uses the default allocator. This will avoid unnecessarily modifying global state.

For example, we can add a new function _PyMem_DefaultRawMalloc that behaves like PyMem_RawMalloc, except that it is not modifiable by _PyMem_SetDefaultAllocator.

For an example implementation in the nogil-3.12 fork, see colesbury/nogil-3.12@d13c63dee9.

Linked PRs

@colesbury
Copy link
Contributor Author

cc @vstinner and @ericsnowcurrently in case you are interested in this.

@ericsnowcurrently
Copy link
Member

I'm a little unclear on this. Nearly all of the C-API is unavailable until global runtime initialization finishes and reverts to that state almost as soon as finalization begins. At both of those points only the main thread should be running. 12 Consequently, switching the raw allocator during init/fini shouldn't cause any problems.

Thus I'm not sure where the problem is.

Also, just to be clear, we do the raw allocator dance during init/fini for more than just locks. 3 All of that happens during global init/fini so the same single-threaded situation applies.

Of course, the actual operation of changing the allocator is thread-safe, protected by one of the global locks. Hence, what we do in Python/tracemalloc.c (e.g. _PyTraceMalloc_Start()) is fine.

Footnotes

  1. Daemon threads add a wrinkle to this during finalization, but we handle them in a way that shouldn't be affected by the raw allocator.

  2. Even in embedded scenarios, where there might be other non-Python threads running already, only the "main" thread (where Py_Initialize() was called) should be able to make use of the runtime until initialization finishes or once finalization starts.

  3. I did a quick grep for "_PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW". There's the "inittab" and builtin modules table stuff in Python/import.c. Then there's part of the config data in Python/initconfig.c and the path data in Python/pathconfig.c. We also do the raw allocator dance for PySys_AddWarnOption() (the "preinit_entry" stuff), but that function was removed from the API and only exists for stable ABI compatibility.

@ericsnowcurrently
Copy link
Member

FWIW, I'm on board with switching the various global locks to PyMutex or PyRawMutex, regardless of the raw allocator concerns.

@colesbury
Copy link
Contributor Author

colesbury commented Nov 10, 2023

Nearly all of the C-API is unavailable until global runtime initialization finishes and reverts to that state almost as soon as finalization begins.

"Nearly all of the C-API" here notably does not include PyMem_RawMalloc/Free. They are listed in https://docs.python.org/3/c-api/init.html#before-python-initialization.

At both of those points only the main thread should be running.

This is not the case during finalization. As you mention, daemon threads may still be running. So can any thread created by C (i.e., outside the threading module). We don't even have a hard guarantee that non-daemon threading module threads are finished: if you press ctrl-c during shutdown, we stop waiting on non-daemon threads and continue with finalization.

Any of these threads and (random C) code may call PyMem_RawMalloc and PyMem_RawFree because these APIs do not require holding the GIL.

Also, just to be clear, we do the raw allocator dance during init/fini for more than just locks.

Yeah, those are what I was referring to in the "Calls to PyMem_RawMalloc, PyMem_RawCalloc, PyMem_RawFree, etc." section.

@colesbury
Copy link
Contributor Author

FWIW, I'm on board with switching the various global locks to PyMutex or PyRawMutex, regardless of the raw allocator concerns.

Great! I'll probably tackle that part first, both because we seem to be on the same page there and it'll keep the PRs smaller.

colesbury added a commit to colesbury/cpython that referenced this issue Nov 16, 2023
This replaces some usages of PyThread_type_lock with PyMutex, which
does not require memory allocation to initialize.
ericsnowcurrently pushed a commit that referenced this issue Dec 7, 2023
This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize.

This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize.

This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize.

This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants