-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add delayed reclamation mechanism for free-threaded build (QSBR) #115103
Labels
Comments
colesbury
added
type-feature
A feature request or enhancement
topic-free-threading
labels
Feb 6, 2024
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 7, 2024
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 8, 2024
This adds a safe memory reclamation scheme based on FreeBSD's "GUS" and quiescent state based reclamation (QSBR). The API provides a mechanism for callers to detect when it is safe to free memory that may be concurrently accessed by readers.
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 8, 2024
This adds a safe memory reclamation scheme based on FreeBSD's "GUS" and quiescent state based reclamation (QSBR). The API provides a mechanism for callers to detect when it is safe to free memory that may be concurrently accessed by readers.
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 12, 2024
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 12, 2024
…uilds This adds `_PyMem_FreeDelayed()` and supporting functions. The `_PyMem_FreeDelayed()` function frees memory with the same allocator as `PyMem_Free()`, but after some delay to ensure that concurrent lock-free readers have finished.
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 13, 2024
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 13, 2024
…uilds This adds `_PyMem_FreeDelayed()` and supporting functions. The `_PyMem_FreeDelayed()` function frees memory with the same allocator as `PyMem_Free()`, but after some delay to ensure that concurrent lock-free readers have finished.
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 13, 2024
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 14, 2024
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 14, 2024
This was referenced Feb 14, 2024
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 16, 2024
colesbury
added a commit
that referenced
this issue
Feb 16, 2024
This adds a safe memory reclamation scheme based on FreeBSD's "GUS" and quiescent state based reclamation (QSBR). The API provides a mechanism for callers to detect when it is safe to free memory that may be concurrently accessed by readers.
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Feb 16, 2024
…uilds This adds `_PyMem_FreeDelayed()` and supporting functions. The `_PyMem_FreeDelayed()` function frees memory with the same allocator as `PyMem_Free()`, but after some delay to ensure that concurrent lock-free readers have finished.
colesbury
added a commit
that referenced
this issue
Feb 20, 2024
…115367) This adds `_PyMem_FreeDelayed()` and supporting functions. The `_PyMem_FreeDelayed()` function frees memory with the same allocator as `PyMem_Free()`, but after some delay to ensure that concurrent lock-free readers have finished.
corona10
added a commit
to corona10/cpython
that referenced
this issue
Mar 2, 2024
corona10
added a commit
to corona10/cpython
that referenced
this issue
Mar 2, 2024
corona10
added a commit
that referenced
this issue
Mar 2, 2024
For the record: 2e91578 |
corona10
added a commit
to corona10/cpython
that referenced
this issue
Mar 2, 2024
…cessDelayed (pythongh-116238)" This reverts commit 2e91578.
corona10
added a commit
to corona10/cpython
that referenced
this issue
Mar 2, 2024
woodruffw
pushed a commit
to woodruffw-forks/cpython
that referenced
this issue
Mar 4, 2024
…uilds (python#115367) This adds `_PyMem_FreeDelayed()` and supporting functions. The `_PyMem_FreeDelayed()` function frees memory with the same allocator as `PyMem_Free()`, but after some delay to ensure that concurrent lock-free readers have finished.
woodruffw
pushed a commit
to woodruffw-forks/cpython
that referenced
this issue
Mar 4, 2024
woodruffw
pushed a commit
to woodruffw-forks/cpython
that referenced
this issue
Mar 4, 2024
…gh-116251) * Revert "pythongh-115103: Update refleak checker to trigger _PyMem_ProcessDelayed (pythongh-116238)" This reverts commit 2e91578. * pythongh-115103: Update gc.collect to process delayed objects
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Mar 4, 2024
This sets `MI_DEBUG` to `2` in debug builds to enable `mi_assert_internal()` calls. Expensive internal assertions are not neabled.
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Mar 4, 2024
This sets `MI_DEBUG` to `2` in debug builds to enable `mi_assert_internal()` calls. Expensive internal assertions are not enabled. This also disables an assertion in free-threaded builds that would be triggered by the free-threaded GC because we traverse heaps that are not owned by the current thread.
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Mar 4, 2024
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Mar 5, 2024
colesbury
added a commit
that referenced
this issue
Mar 5, 2024
This sets `MI_DEBUG` to `2` in debug builds to enable `mi_assert_internal()` calls. Expensive internal assertions are not enabled. This also disables an assertion in free-threaded builds that would be triggered by the free-threaded GC because we traverse heaps that are not owned by the current thread.
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Mar 5, 2024
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Mar 6, 2024
colesbury
added a commit
that referenced
this issue
Mar 6, 2024
This implements the delayed reuse of mimalloc pages that contain Python objects in the free-threaded build. Allocations of the same size class are grouped in data structures called pages. These are different from operating system pages. For thread-safety, we want to ensure that memory used to store PyObjects remains valid as long as there may be concurrent lock-free readers; we want to delay using it for other size classes, in other heaps, or returning it to the operating system. When a mimalloc page becomes empty, instead of immediately freeing it, we tag it with a QSBR goal and insert it into a per-thread state linked list of pages to be freed. When mimalloc needs a fresh page, we process the queue and free any still empty pages that are now deemed safe to be freed. Pages waiting to be freed are still available for allocations of the same size class and allocating from a page prevent it from being freed. There is additional logic to handle abandoned pages when threads exit.
I think this is complete now |
colesbury
added a commit
to colesbury/cpython
that referenced
this issue
Mar 7, 2024
If a thread blocks while waiting on the `shared->mutex` lock, the array of QSBR states may be reallocated. The `tstate->qsbr` values before the lock is acquired may not be the same as the value after the lock is acquired.
colesbury
added a commit
that referenced
this issue
Mar 8, 2024
If a thread blocks while waiting on the `shared->mutex` lock, the array of QSBR states may be reallocated. The `tstate->qsbr` values before the lock is acquired may not be the same as the value after the lock is acquired.
adorilson
pushed a commit
to adorilson/cpython
that referenced
this issue
Mar 25, 2024
adorilson
pushed a commit
to adorilson/cpython
that referenced
this issue
Mar 25, 2024
…gh-116251) * Revert "pythongh-115103: Update refleak checker to trigger _PyMem_ProcessDelayed (pythongh-116238)" This reverts commit 2e91578. * pythongh-115103: Update gc.collect to process delayed objects
adorilson
pushed a commit
to adorilson/cpython
that referenced
this issue
Mar 25, 2024
…ython#116343) This sets `MI_DEBUG` to `2` in debug builds to enable `mi_assert_internal()` calls. Expensive internal assertions are not enabled. This also disables an assertion in free-threaded builds that would be triggered by the free-threaded GC because we traverse heaps that are not owned by the current thread.
adorilson
pushed a commit
to adorilson/cpython
that referenced
this issue
Mar 25, 2024
…ython#115435) This implements the delayed reuse of mimalloc pages that contain Python objects in the free-threaded build. Allocations of the same size class are grouped in data structures called pages. These are different from operating system pages. For thread-safety, we want to ensure that memory used to store PyObjects remains valid as long as there may be concurrent lock-free readers; we want to delay using it for other size classes, in other heaps, or returning it to the operating system. When a mimalloc page becomes empty, instead of immediately freeing it, we tag it with a QSBR goal and insert it into a per-thread state linked list of pages to be freed. When mimalloc needs a fresh page, we process the queue and free any still empty pages that are now deemed safe to be freed. Pages waiting to be freed are still available for allocations of the same size class and allocating from a page prevent it from being freed. There is additional logic to handle abandoned pages when threads exit.
adorilson
pushed a commit
to adorilson/cpython
that referenced
this issue
Mar 25, 2024
If a thread blocks while waiting on the `shared->mutex` lock, the array of QSBR states may be reallocated. The `tstate->qsbr` values before the lock is acquired may not be the same as the value after the lock is acquired.
diegorusso
pushed a commit
to diegorusso/cpython
that referenced
this issue
Apr 17, 2024
…115180) This adds a safe memory reclamation scheme based on FreeBSD's "GUS" and quiescent state based reclamation (QSBR). The API provides a mechanism for callers to detect when it is safe to free memory that may be concurrently accessed by readers.
diegorusso
pushed a commit
to diegorusso/cpython
that referenced
this issue
Apr 17, 2024
…uilds (python#115367) This adds `_PyMem_FreeDelayed()` and supporting functions. The `_PyMem_FreeDelayed()` function frees memory with the same allocator as `PyMem_Free()`, but after some delay to ensure that concurrent lock-free readers have finished.
diegorusso
pushed a commit
to diegorusso/cpython
that referenced
this issue
Apr 17, 2024
diegorusso
pushed a commit
to diegorusso/cpython
that referenced
this issue
Apr 17, 2024
…gh-116251) * Revert "pythongh-115103: Update refleak checker to trigger _PyMem_ProcessDelayed (pythongh-116238)" This reverts commit 2e91578. * pythongh-115103: Update gc.collect to process delayed objects
diegorusso
pushed a commit
to diegorusso/cpython
that referenced
this issue
Apr 17, 2024
…ython#116343) This sets `MI_DEBUG` to `2` in debug builds to enable `mi_assert_internal()` calls. Expensive internal assertions are not enabled. This also disables an assertion in free-threaded builds that would be triggered by the free-threaded GC because we traverse heaps that are not owned by the current thread.
diegorusso
pushed a commit
to diegorusso/cpython
that referenced
this issue
Apr 17, 2024
…ython#115435) This implements the delayed reuse of mimalloc pages that contain Python objects in the free-threaded build. Allocations of the same size class are grouped in data structures called pages. These are different from operating system pages. For thread-safety, we want to ensure that memory used to store PyObjects remains valid as long as there may be concurrent lock-free readers; we want to delay using it for other size classes, in other heaps, or returning it to the operating system. When a mimalloc page becomes empty, instead of immediately freeing it, we tag it with a QSBR goal and insert it into a per-thread state linked list of pages to be freed. When mimalloc needs a fresh page, we process the queue and free any still empty pages that are now deemed safe to be freed. Pages waiting to be freed are still available for allocations of the same size class and allocating from a page prevent it from being freed. There is additional logic to handle abandoned pages when threads exit.
diegorusso
pushed a commit
to diegorusso/cpython
that referenced
this issue
Apr 17, 2024
If a thread blocks while waiting on the `shared->mutex` lock, the array of QSBR states may be reallocated. The `tstate->qsbr` values before the lock is acquired may not be the same as the value after the lock is acquired.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Feature or enhancement
Many operations in the free-threaded build are protected by locks. However, in some cases, we want to allow reads to happen concurrently with updates 1. For example, we want to avoid locking during most list read accesses. If there is a concurrent list resize operation, when is it safe to free the list's previous array?
"Safe memory reclamation" (SMR) schemes address this usually by delaying the "free" operation until all concurrent read accesses are guaranteed to have completed. Quiescent state-based reclamation (QSBR) is a safe memory reclamation scheme that will be useful in the free-threaded build. QSBR relies on marked "quiescent states", when a thread reports that it is definitely not in the middle of a non-locking read operation. The eval breaker checks serve as a good place to report a quiescent state.
Background
SMR schemes are most common in systems that don't use garbage collection or reference counting. CPython uses reference counting and has a tracing garbage collector, so it may seem like an odd fit. However, some data structures are not reference counted. For example, a list object's array is not separately reference counted, but may have a shorter lifetime than the containing
PyListObject
. We could delay reclamation until the GC runs, but we want reclamation to happen quickly, and we generally want to run the GC less frequently in the free-threaded build, because it requires pausing all threads and, at least for now, scanning the entire heap.Use cases
PyDictKeysObject
) and list arrays (ob_item
): If we resize a dict or list that may be shared between threads, we use QSBR to delay freeing the old keys/array until it's safe to do so. Note that most dicts and lists are not accessed by multiple threads; their keys/arrays can be immediately freed on resize.mi_page_t
: The non-locking dict and list accesses require cooperation from mimalloc. We want to ensure that even if an item is freed during access and the memory reused for a new object, the new object’s reference count field is placed at the same location in memory. In practice, this means that when anmi_page_t
is empty, we don't immediately allow it to be re-used for allocations of a different size class. We use QSBR to determine when it's safe to use themi_page_t
for a different size class (or return the memory to the OS).Implementation Overview
The implementation will be partially based on FreeBSD's "Global Unbounded Sequences". The FreeBSD implementation provides a good starting point because it's relatively simple, self contained, and the license (2-Clause BSD) is compatible with CPython. The implementation relies on a few sequence counters:
It's safe to free data once all thread states' read sequences are greater than or equal to the write sequence used to tag the data.
To check that, we scan over all the thread states read sequences to compute the minimum value (excluding detached threads). For efficiency, we also cache that computed value globally (per-interpreter).
Limitations
Determining the current minimum read sequence requires scanning over all thread states. This will likely become a bottleneck if we have a large number of threads (>1,000?). We will likely eventually want to address this, possibly with some combination of a tree-based mechanism 2 and incremental scanning.
For now, I think keeping the implementation simple important until we are able to run and benchmark multithreaded programs with the GIL disabled.
Linked PRs
Footnotes
See "Optimistically Avoiding Locking" in PEP 703. ↩
https://people.kernel.org/joelfernandes/gus-vs-rcu ↩
The text was updated successfully, but these errors were encountered: