Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-112529: Make the GC scheduling thread-safe #114880

Merged
merged 5 commits into from
Feb 16, 2024

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented Feb 1, 2024

The GC keeps track of the number of allocations (less deallocations) since the last GC. This change buffers the allocation count in thread-local state and uses atomic operations to modify the per-interpreter count. The thread-local buffering avoids contention on shared state.

A consequence is that the GC scheduling is not as precise, so "test_sneaky_frame_object" is skipped because it requires that the GC be run exactly after allocating a frame object.

The GC keeps track of the number of allocations (less deallocations)
since the last GC. This buffers the count in thread-local state and uses
atomic operations to modify the per-interpreter count. The thread-local
buffering avoids contention on shared state.

A consequence is that the GC scheduling is not as precise, so
"test_sneaky_frame_object" is skipped because it requires that the GC be
run exactly after allocating a frame object.
Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GC is one of the runtime components with which I am least familiar, so I mostly have questions for you. 😄

Otherwise, the PR mostly makes sense.

Objects/typeobject.c Show resolved Hide resolved
Python/gc_free_threading.c Show resolved Hide resolved
Python/gc_free_threading.c Show resolved Hide resolved
Python/gc_free_threading.c Show resolved Hide resolved
@ericsnowcurrently
Copy link
Member

The change here seems okay to me, but I'd feel better if one of the GC experts reviewed this before it's merged.

CC @markshannon @pablogsal @nascheme @DinoV @nanjekyejoannah

@nascheme
Copy link
Member

nascheme commented Feb 4, 2024

I've not looked at the code but the idea of the change sounds fine to me. I suspect there are some users who require that the GC threshold is precise, like the test_sneaky_frame_object case. However, I don't think that's a reasonable thing and I think it's okay to break them. We are pretty likely to adjust how the thresholds work anyhow. Using atomic operations to count allocations/dellocations will be too expensive.

// We buffer the allocation count to avoid the overhead of atomic
// operations for every allocation.
gc->alloc_count++;
if (gc->alloc_count >= LOCAL_ALLOC_COUNT_THRESHOLD) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could be tied to the configurable GC threshold and therefore the tests could continue to pass but maybe it doesn't matter enough and the extra read isn't worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I considered making it a configurable runtime threshold, but decided it wasn't worth it, at least for now.

I think there's a decent chance we change how we count allocations in the future. In the nogil forks, for example, I accounted for allocations in mi_page_to_full and _mi_page_unfull, which provides some natural batching and avoids the thread-local that's done in every allocation here, but wouldn't allow for a configurable threshold. I haven't attempted that yet because I'd like some performance measurements to justify it first.

Copy link
Contributor

@DinoV DinoV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@colesbury colesbury added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Feb 14, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @colesbury for commit f99d14e 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Feb 14, 2024
@colesbury colesbury merged commit b24c916 into python:main Feb 16, 2024
119 checks passed
@colesbury colesbury deleted the gh-112529-gc-schedule branch February 16, 2024 16:22
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this pull request Mar 4, 2024
The GC keeps track of the number of allocations (less deallocations)
since the last GC. This buffers the count in thread-local state and uses
atomic operations to modify the per-interpreter count. The thread-local
buffering avoids contention on shared state.

A consequence is that the GC scheduling is not as precise, so
"test_sneaky_frame_object" is skipped because it requires that the GC be
run exactly after allocating a frame object.
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
The GC keeps track of the number of allocations (less deallocations)
since the last GC. This buffers the count in thread-local state and uses
atomic operations to modify the per-interpreter count. The thread-local
buffering avoids contention on shared state.

A consequence is that the GC scheduling is not as precise, so
"test_sneaky_frame_object" is skipped because it requires that the GC be
run exactly after allocating a frame object.
LukasWoodtli pushed a commit to LukasWoodtli/cpython that referenced this pull request Jan 22, 2025
The GC keeps track of the number of allocations (less deallocations)
since the last GC. This buffers the count in thread-local state and uses
atomic operations to modify the per-interpreter count. The thread-local
buffering avoids contention on shared state.

A consequence is that the GC scheduling is not as precise, so
"test_sneaky_frame_object" is skipped because it requires that the GC be
run exactly after allocating a frame object.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants