gh-112529: Implement GC for free-threaded builds #114262
Conversation
Force-pushed from 7568efc to 0a5d5bf
This implements a mark and sweep GC for the free-threaded builds of CPython. The implementation relies on mimalloc to find GC tracked objects (i.e., "containers").
Force-pushed from 0a5d5bf to 1eb8a0c
@DinoV, @pablogsal, @nascheme - I think this is ready for review now. There is duplicate code between Python/gc.c and Python/gc_free_threading.c. I'll put up a PR for the devguide as well.
The design, structure and code style look good to me. A couple of minor suggestions.

Next to the definition of ob_tid, I think it should also mention that this field is (ab)used for the linked-list and for the GC refs count.

I think using the marking stack is okay, despite the extra allocations needed. If it turns out to be a problem, e.g. for a program with a very deeply linked object structure, a fairly simple fix is as follows. Limit the size of the stack. If the size is exceeded, re-start the marking process. An additional refinement is to have an "already marked" bit on each mimalloc page, so that if you re-start, those pages can be quickly skipped.
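Neil's bounded-stack suggestion can be sketched in isolation. The following toy C code (not CPython code; the node layout, names, and the deliberately tiny stack capacity are all illustrative) marks a graph with a fixed-capacity stack and restarts the heap scan whenever the stack overflows, using the mark bits so that restarts make forward progress:

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAP 2   /* deliberately tiny, to force overflow and restarts */

typedef struct Node {
    struct Node *children[2];
    bool marked;
} Node;

/* Mark everything reachable from the roots. On stack overflow, the child
 * is still marked but not scanned; the outer loop restarts the heap scan
 * and rediscovers marked-but-unscanned nodes. */
static void mark_all(Node *heap, size_t nheap, Node **roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++) {
        roots[i]->marked = true;
    }
    bool overflowed = true;
    while (overflowed) {
        overflowed = false;
        Node *stack[STACK_CAP];
        size_t top = 0;
        for (size_t i = 0; i < nheap; i++) {
            if (!heap[i].marked) {
                continue;
            }
            stack[top++] = &heap[i];   /* top == 0 here, always fits */
            while (top > 0) {
                Node *n = stack[--top];
                for (int c = 0; c < 2; c++) {
                    Node *ch = n->children[c];
                    if (ch != NULL && !ch->marked) {
                        ch->marked = true;
                        if (top < STACK_CAP) {
                            stack[top++] = ch;
                        }
                        else {
                            overflowed = true;  /* dropped; next pass rescans */
                        }
                    }
                }
            }
        }
    }
}
```

The restart loop terminates because any pass that overflows has marked at least one new node, and there are finitely many nodes. The "already marked" bit per mimalloc page that Neil mentions would let a restart skip whole pages rather than individual objects.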
Thanks for the review, Neil. I've renamed it.

Yeah, that makes sense.

In a subsequent change (not yet a PR), I'm using it.
Python/gc_free_threading.c (Outdated)

```c
}

static int
gc_visit_heaps_locked(PyInterpreterState *interp, mi_block_visit_fun *visitor,
```
Do we want to call this gc_visit_heaps_lock_held, per our discussion on naming this style of functions?
Python/gc_free_threading.c (Outdated)

```c
if (_PyObject_GC_IS_TRACKED(op) && !_Py_IsImmortal(op)) {
    // If update_refs hasn't reached this object yet, mark it
    // as (tentatively) unreachable and initialize ob_tid to zero.
    if (!gc_is_unreachable(op)) {
```
It'd be nice to factor this into a helper like gc_init_refs or gc_maybe_init_refs that is used here and in update_refs.
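The suggested helper might look roughly like this. Everything below is a hypothetical sketch: ToyObject, its fields, and gc_is_unreachable are simplified stand-ins for the real CPython internals, not the actual types or bit flags:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uintptr_t ob_tid;    /* abused to hold gc_refs during collection */
    bool unreachable;    /* stand-in for the real "unreachable" GC bit */
} ToyObject;

static bool gc_is_unreachable(ToyObject *op)
{
    return op->unreachable;
}

/* If update_refs hasn't reached this object yet, mark it as (tentatively)
 * unreachable and initialize ob_tid (i.e., gc_refs) to zero. */
static void gc_maybe_init_refs(ToyObject *op)
{
    if (!gc_is_unreachable(op)) {
        op->unreachable = true;
        op->ob_tid = 0;
    }
}
```

The "maybe" naming reflects that the helper is idempotent: calling it on an object that update_refs already visited leaves the object untouched.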
Python/gc_free_threading.c (Outdated)

```c
if (PyTuple_CheckExact(op)) {
    _PyTuple_MaybeUntrack(op);
    if (!_PyObject_GC_IS_TRACKED(op)) {
        if (gc_is_unreachable(op)) {
```
This could similarly be a gc_restore_refs or gc_maybe_restore_refs and used below.
```c
return NULL;
}

// Append all objects to a worklist. This abuses ob_tid. We will restore
```
Is it worth stopping the world here? It seems like we're fine to race with ob_tid from a correctness standpoint, but it seems like a lot of objects are going to end up with merged refcounts in a multithreaded environment!
We need to stop the world for the heap traversal (gc_visit_heaps) to be thread-safe... which I forgot to add here. I'll also add an assert to gc_visit_heaps() that the world is stopped.

This doesn't merge most refcounts. They get restored from the mimalloc data structure. They'll only get merged if the owning thread has already exited.
```c
// it later. NOTE: We can't append to the list during gc_visit_heaps()
// because PyList_Append() may reclaim an abandoned mimalloc segment
// while we are traversing it.
struct get_objects_args args = { 0 };
```
Ditto to get referrers
```c
op->ob_tid = 0;
op->ob_ref_local = 0;
op->ob_ref_shared = _Py_REF_SHARED(refcount, _Py_REF_MERGED);
```
It seems like this needs to be an atomic operation? Can this just be a call to _Py_ExplicitMergeRefcount?
The other threads in the interpreter must be paused, so no atomics are necessary. I'll add an assert here.

We could probably use _Py_ExplicitMergeRefcount(), but it seems convenient to avoid the atomic operations.
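For intuition, here is a toy model of why plain stores suffice in the snippet above. The bit layout, shift, and names are invented for illustration (the real _Py_REF_SHARED macro and field widths differ): with every other thread paused, the owning thread's biased local count and the shared count can be folded into a single merged shared field without atomics:

```c
#include <stdint.h>

#define TOY_REF_SHARED_SHIFT 2    /* low bits reserved for state flags */
#define TOY_REF_MERGED       0x1  /* stand-in for _Py_REF_MERGED */

typedef struct {
    uintptr_t ob_tid;
    uint32_t  ob_ref_local;   /* owning thread's count */
    uintptr_t ob_ref_shared;  /* other threads' count, shifted, plus flags */
} ToyObject;

static uintptr_t toy_ref_shared(intptr_t refcount, uintptr_t flags)
{
    return ((uintptr_t)refcount << TOY_REF_SHARED_SHIFT) | flags;
}

/* Non-atomic merge: only valid while the world is stopped. */
static void toy_merge_refcount(ToyObject *op)
{
    intptr_t refcount = (intptr_t)op->ob_ref_local
        + ((intptr_t)op->ob_ref_shared >> TOY_REF_SHARED_SHIFT);
    op->ob_tid = 0;
    op->ob_ref_local = 0;
    op->ob_ref_shared = toy_ref_shared(refcount, TOY_REF_MERGED);
}
```

After the merge, all future increfs and decrefs go through the shared field, which is why a merged object no longer needs its owning thread to be alive.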
```c
// Clear weakrefs and enqueue callbacks (but do not call them).
clear_weakrefs(state);
_PyEval_StartTheWorld(interp);
```
So, if I understand this correctly, thread ID writes and reference count changes may happen at this point, but the assumption is that the thread ID is never going to actually clash with an object reference in the worklist. And then anything which does have a reference count operation will just become merged when we restore it?
Yeah, the key assumption is that _PyThread_Id() is itself a pointer to a distinct object, so it never conflicts with an object in the worklists. The implementation of _PyThread_Id() uses the address of the platform's thread control block or equivalent.

On the GC side, we abuse ob_tid for two purposes:

- gc_refs, which can have arbitrary values (might conflict!), but this only happens while other threads are paused, so nobody sees it.
- worklist pointers, which must not conflict with _PyThread_Id().

Some worklists are only used while other threads are paused (e.g., in _PyGC_GetObjects()). Other worklists are created when threads are paused, but still used while other threads may be running (e.g., unreachable, wrcb_to_call).
If the worklist continues to be used while other threads are running, then there are two other important considerations:

- The worklist holds a strong reference to the object so that it can't be freed while in the worklist.
- The object's reference count fields are merged before adding it to the worklist. This avoids some other thread trying to merge the refcount fields, which would be messy. (It also makes the deallocation happen more quickly when we get around to calling tp_clear.)
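The ob_tid dual use described above can be illustrated with a toy intrusive worklist. The types and function names here are simplified stand-ins, not the real CPython structs or worklist API: while the world is stopped, the pointer-sized tid field doubles as the list's next link:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct ToyObject {
    uintptr_t ob_tid;   /* normally a thread id; here, a worklist link */
} ToyObject;

typedef struct {
    ToyObject *head;
} Worklist;

static void worklist_push(Worklist *wl, ToyObject *op)
{
    op->ob_tid = (uintptr_t)wl->head;   /* link threaded through ob_tid */
    wl->head = op;
}

static ToyObject *worklist_pop(Worklist *wl)
{
    ToyObject *op = wl->head;
    if (op != NULL) {
        wl->head = (ToyObject *)op->ob_tid;
        op->ob_tid = 0;   /* caller restores the real tid (or merges) */
    }
    return op;
}
```

Because the link lives inside the object, the worklist needs no per-entry allocation; the trade-off is exactly the one discussed above: the field's normal contents must be restorable (or mergeable) afterwards.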
Here's the PR so far for the devguide: python/devguide#1263
This implements the GC for free-threaded builds. The free-threading GC follows the same basic algorithms as the existing GC, but operates on different data types. Specifically:

- There is no PyGC_Head (GC linked list); mimalloc is used to find GC-tracked objects.
- The ob_tid field (thread id) is abused for computing gc_refs as well as for "worklists" that keep track of the "unreachable" set of objects, objects with legacy finalizers, and weakref callbacks.
- Marking uses _PyObjectStack, a linked-list of fixed sized chunks. This requires memory allocation during GC for marking. It would be possible to avoid this, but at the cost of extra scans over the whole heap, which I don't think is worthwhile. For context, GC implementations in other languages, like OpenJDK and Go, use essentially the same data structure for marking.
- ob_ref_local is used for computing gc_refs because ob_tid is in use for the "unreachable" worklist.

There is a bunch of clean-up and improvements that I'd like to defer to later PRs to keep the size of this manageable:

- Set NUM_GENERATIONS to 1 in free-threaded builds. As written, every collection in the free-threaded build is a full collection, but we still behave as if we have three generations in a few places.
- Remove the PyGC_Head pre-header in free-threaded builds.
- Merge Python/gc.c and Python/gc_free_threading.c once Mark's incremental GC is landed.

Linked issue: --disable-gil builds #112529
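The _PyObjectStack described above is a linked list of fixed-size chunks. A rough, self-contained sketch of that shape (chunk size, names, and error handling are illustrative, not the real CPython implementation):

```c
#include <stdlib.h>
#include <stddef.h>

#define CHUNK_ITEMS 4   /* tiny, to exercise chaining; the real size is larger */

typedef struct Chunk {
    struct Chunk *prev;        /* previous (older) chunk in the list */
    size_t n;                  /* number of items used in this chunk */
    void *items[CHUNK_ITEMS];
} Chunk;

typedef struct {
    Chunk *top;                /* newest chunk; NULL when the stack is empty */
} ObjectStack;

/* Push an object, allocating a new chunk when the current one is full.
 * This is the allocation-during-marking cost mentioned in the PR text. */
static int stack_push(ObjectStack *st, void *obj)
{
    if (st->top == NULL || st->top->n == CHUNK_ITEMS) {
        Chunk *c = calloc(1, sizeof(Chunk));
        if (c == NULL) {
            return -1;
        }
        c->prev = st->top;
        st->top = c;
    }
    st->top->items[st->top->n++] = obj;
    return 0;
}

/* Pop in LIFO order, freeing chunks as they drain; NULL when empty. */
static void *stack_pop(ObjectStack *st)
{
    Chunk *c = st->top;
    if (c == NULL) {
        return NULL;
    }
    void *obj = c->items[--c->n];
    if (c->n == 0) {
        st->top = c->prev;
        free(c);
    }
    return obj;
}
```

Compared with a contiguous growable array, the chunked layout never needs to reallocate or move existing entries, which keeps pushes cheap and pointers into the structure stable during a marking pass.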