gh-112529: Implement GC for free-threaded builds #114262
Conversation
Force-pushed from 7568efc to 0a5d5bf
This implements a mark and sweep GC for the free-threaded builds of CPython. The implementation relies on mimalloc to find GC tracked objects (i.e., "containers").
Force-pushed from 0a5d5bf to 1eb8a0c
@DinoV, @pablogsal, @nascheme - I think this is ready for review now. There is duplicate code between Python/gc.c and Python/gc_free_threading.c. I'll put up a PR for the devguide as well.
The design, structure and code style look good to me. A couple of minor suggestions.

Next to the definition of ob_tid, I think it should also mention that this field is (ab)used for the linked-list and for the GC refs count.

I think using the marking stack is okay, despite the extra allocations needed. If it turns out to be a problem, e.g. for a program with a very deeply linked object structure, a fairly simple fix is as follows. Limit the size of the stack. If the size is exceeded, re-start the marking process. An additional refinement is to have an "already marked" bit on each mimalloc page, so that if you re-start, those pages can be quickly skipped.
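Neil's bounded-stack suggestion can be sketched in isolation. The following toy C code (not CPython code; the node layout, names, and the deliberately tiny stack capacity are all illustrative) marks a graph with a fixed-capacity stack and restarts the heap scan whenever the stack overflows, using the mark bits so that restarts make forward progress:

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAP 2   /* deliberately tiny, to force overflow and restarts */

typedef struct Node {
    struct Node *children[2];
    bool marked;
} Node;

/* Mark everything reachable from the roots. On stack overflow, the child
 * is still marked but not scanned; the outer loop restarts the heap scan
 * and rediscovers marked-but-unscanned nodes. */
static void mark_all(Node *heap, size_t nheap, Node **roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++) {
        roots[i]->marked = true;
    }
    bool overflowed = true;
    while (overflowed) {
        overflowed = false;
        Node *stack[STACK_CAP];
        size_t top = 0;
        for (size_t i = 0; i < nheap; i++) {
            if (!heap[i].marked) {
                continue;
            }
            stack[top++] = &heap[i];   /* top == 0 here, always fits */
            while (top > 0) {
                Node *n = stack[--top];
                for (int c = 0; c < 2; c++) {
                    Node *ch = n->children[c];
                    if (ch != NULL && !ch->marked) {
                        ch->marked = true;
                        if (top < STACK_CAP) {
                            stack[top++] = ch;
                        }
                        else {
                            overflowed = true;  /* dropped; next pass rescans */
                        }
                    }
                }
            }
        }
    }
}
```

The restart loop terminates because any pass that overflows has marked at least one new node, and there are finitely many nodes. The "already marked" bit per mimalloc page that Neil mentions would let a restart skip whole pages rather than individual objects.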
Thanks for the review, Neil. I've renamed it.

Yeah, that makes sense.

In a subsequent change (not yet a PR), I'm using it.
Python/gc_free_threading.c (Outdated)

```c
}

static int
gc_visit_heaps_locked(PyInterpreterState *interp, mi_block_visit_fun *visitor,
```
Do we want to call this gc_visit_heaps_lock_held, per our discussion on naming this style of functions?
Python/gc_free_threading.c (Outdated)

```c
if (_PyObject_GC_IS_TRACKED(op) && !_Py_IsImmortal(op)) {
    // If update_refs hasn't reached this object yet, mark it
    // as (tentatively) unreachable and initialize ob_tid to zero.
    if (!gc_is_unreachable(op)) {
```
It'd be nice to factor this into a helper like gc_init_refs or gc_maybe_init_refs that is used here and in update_refs.
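The suggested helper might look roughly like this. Everything below is a hypothetical sketch: ToyObject, its fields, and gc_is_unreachable are simplified stand-ins for the real CPython internals, not the actual types or bit flags:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uintptr_t ob_tid;    /* abused to hold gc_refs during collection */
    bool unreachable;    /* stand-in for the real "unreachable" GC bit */
} ToyObject;

static bool gc_is_unreachable(ToyObject *op)
{
    return op->unreachable;
}

/* If update_refs hasn't reached this object yet, mark it as (tentatively)
 * unreachable and initialize ob_tid (i.e., gc_refs) to zero. */
static void gc_maybe_init_refs(ToyObject *op)
{
    if (!gc_is_unreachable(op)) {
        op->unreachable = true;
        op->ob_tid = 0;
    }
}
```

The "maybe" naming reflects that the helper is idempotent: calling it on an object that update_refs already visited leaves the object untouched.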
Python/gc_free_threading.c (Outdated)

```c
if (PyTuple_CheckExact(op)) {
    _PyTuple_MaybeUntrack(op);
    if (!_PyObject_GC_IS_TRACKED(op)) {
        if (gc_is_unreachable(op)) {
```
This could similarly be a gc_restore_refs or gc_maybe_restore_refs and used below.
```c
return NULL;
}

// Append all objects to a worklist. This abuses ob_tid. We will restore
```
Is it worth stopping the world here? It seems like we're fine to race with ob_tid from a correctness standpoint, but it seems like a lot of objects are going to end up with merged refcounts in a multithreaded environment!
We need to stop the world for the heap traversal (gc_visit_heaps) to be thread-safe... which I forgot to add here. I'll also add an assert to gc_visit_heaps() that the world is stopped.

This doesn't merge most refcounts. They get restored from the mimalloc data structure. They'll only get merged if the owning thread has already exited.
```c
// it later. NOTE: We can't append to the list during gc_visit_heaps()
// because PyList_Append() may reclaim an abandoned mimalloc segment
// while we are traversing it.
struct get_objects_args args = { 0 };
```
Ditto to get referrers
```c
op->ob_tid = 0;
op->ob_ref_local = 0;
op->ob_ref_shared = _Py_REF_SHARED(refcount, _Py_REF_MERGED);
```
It seems like this needs to be an atomic operation? Can this just be a call to _Py_ExplicitMergeRefcount?
The other threads in the interpreter must be paused, so no atomics are necessary. I'll add an assert here.

We could probably use _Py_ExplicitMergeRefcount(), but it seems convenient to avoid the atomic operations.
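For intuition, here is a toy model of why plain stores suffice in the snippet above. The bit layout, shift, and names are invented for illustration (the real _Py_REF_SHARED macro and field widths differ): with every other thread paused, the owning thread's biased local count and the shared count can be folded into a single merged shared field without atomics:

```c
#include <stdint.h>

#define TOY_REF_SHARED_SHIFT 2    /* low bits reserved for state flags */
#define TOY_REF_MERGED       0x1  /* stand-in for _Py_REF_MERGED */

typedef struct {
    uintptr_t ob_tid;
    uint32_t  ob_ref_local;   /* owning thread's count */
    uintptr_t ob_ref_shared;  /* other threads' count, shifted, plus flags */
} ToyObject;

static uintptr_t toy_ref_shared(intptr_t refcount, uintptr_t flags)
{
    return ((uintptr_t)refcount << TOY_REF_SHARED_SHIFT) | flags;
}

/* Non-atomic merge: only valid while the world is stopped. */
static void toy_merge_refcount(ToyObject *op)
{
    intptr_t refcount = (intptr_t)op->ob_ref_local
        + ((intptr_t)op->ob_ref_shared >> TOY_REF_SHARED_SHIFT);
    op->ob_tid = 0;
    op->ob_ref_local = 0;
    op->ob_ref_shared = toy_ref_shared(refcount, TOY_REF_MERGED);
}
```

After the merge, all future increfs and decrefs go through the shared field, which is why a merged object no longer needs its owning thread to be alive.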
```c
// Clear weakrefs and enqueue callbacks (but do not call them).
clear_weakrefs(state);
_PyEval_StartTheWorld(interp);
```
So, if I understand this correctly, thread ID writes and reference count changes may happen at this point, but the assumption is that the thread ID is never going to actually clash with an object reference in the worklist. And then anything which does have a reference count operation will just become merged when we restore it?
Yeah, the key assumption is that _PyThread_Id() is itself a pointer to a distinct object, so it never conflicts with an object in the worklists. The implementation of _PyThread_Id() uses the address of the platform's thread control block or equivalent.

On the GC side, we abuse ob_tid for two purposes:

- gc_refs, which can have arbitrary values (might conflict!), but this only happens while other threads are paused, so nobody sees it.
- worklist pointers, which must not conflict with _PyThread_Id().

Some worklists are only used while other threads are paused (e.g., in _PyGC_GetObjects()). Other worklists are created when threads are paused, but still used while other threads may be running (e.g., unreachable, wrcb_to_call).
If the worklist continues to be used while other threads are running, then there are two other important considerations:

- The worklist holds a strong reference to the object so that it can't be freed while in the worklist.
- The object's reference count fields are merged before adding it to the worklist. This avoids some other thread trying to merge the refcount fields, which would be messy. (It also makes the deallocation happen more quickly when we get around to calling tp_clear.)
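The ob_tid dual use described above can be illustrated with a toy intrusive worklist. The types and function names here are simplified stand-ins, not the real CPython structs or worklist API: while the world is stopped, the pointer-sized tid field doubles as the list's next link:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct ToyObject {
    uintptr_t ob_tid;   /* normally a thread id; here, a worklist link */
} ToyObject;

typedef struct {
    ToyObject *head;
} Worklist;

static void worklist_push(Worklist *wl, ToyObject *op)
{
    op->ob_tid = (uintptr_t)wl->head;   /* link threaded through ob_tid */
    wl->head = op;
}

static ToyObject *worklist_pop(Worklist *wl)
{
    ToyObject *op = wl->head;
    if (op != NULL) {
        wl->head = (ToyObject *)op->ob_tid;
        op->ob_tid = 0;   /* caller restores the real tid (or merges) */
    }
    return op;
}
```

Because the link lives inside the object, the worklist needs no per-entry allocation; the trade-off is exactly the one discussed above: the field's normal contents must be restorable (or mergeable) afterwards.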
Here's the PR so far for the devguide: python/devguide#1263
This implements the GC for free-threaded builds. The free-threading GC follows the same basic algorithms as the existing GC, but operates on different data types. Specifically:

- There is no PyGC_Head (GC linked list); mimalloc is used to find GC-tracked objects.
- The ob_tid field (thread id) is abused for computing gc_refs as well as for "worklists" that keep track of the "unreachable" set of objects, objects with legacy finalizers, and weakref callbacks.
- Marking uses _PyObjectStack, a linked-list of fixed sized chunks. This requires memory allocation during GC for marking. It would be possible to avoid this, but at the cost of extra scans over the whole heap, which I don't think is worthwhile. For context, GC implementations in other languages, like OpenJDK and Go, use essentially the same data structure for marking.
- ob_ref_local is used for computing gc_refs because ob_tid is in use for the "unreachable" worklist.

There is a bunch of clean-up and improvements that I'd like to defer to later PRs to keep the size of this manageable:

- Set NUM_GENERATIONS to 1 in free-threaded builds. As written, every collection in the free-threaded build is a full collection, but we still behave as if we have three generations in a few places.
- Remove the PyGC_Head pre-header in free-threaded builds.
- Merge Python/gc.c and Python/gc_free_threading.c once Mark's incremental GC is landed.

Linked issue: --disable-gil builds #112529
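The _PyObjectStack described above is a linked list of fixed-size chunks. A rough, self-contained sketch of that shape (chunk size, names, and error handling are illustrative, not the real CPython implementation):

```c
#include <stdlib.h>
#include <stddef.h>

#define CHUNK_ITEMS 4   /* tiny, to exercise chaining; the real size is larger */

typedef struct Chunk {
    struct Chunk *prev;        /* previous (older) chunk in the list */
    size_t n;                  /* number of items used in this chunk */
    void *items[CHUNK_ITEMS];
} Chunk;

typedef struct {
    Chunk *top;                /* newest chunk; NULL when the stack is empty */
} ObjectStack;

/* Push an object, allocating a new chunk when the current one is full.
 * This is the allocation-during-marking cost mentioned in the PR text. */
static int stack_push(ObjectStack *st, void *obj)
{
    if (st->top == NULL || st->top->n == CHUNK_ITEMS) {
        Chunk *c = calloc(1, sizeof(Chunk));
        if (c == NULL) {
            return -1;
        }
        c->prev = st->top;
        st->top = c;
    }
    st->top->items[st->top->n++] = obj;
    return 0;
}

/* Pop in LIFO order, freeing chunks as they drain; NULL when empty. */
static void *stack_pop(ObjectStack *st)
{
    Chunk *c = st->top;
    if (c == NULL) {
        return NULL;
    }
    void *obj = c->items[--c->n];
    if (c->n == 0) {
        st->top = c->prev;
        free(c);
    }
    return obj;
}
```

Compared with a contiguous growable array, the chunked layout never needs to reallocate or move existing entries, which keeps pushes cheap and pointers into the structure stable during a marking pass.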