-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-112529: Make the GC scheduling thread-safe #114880
Conversation
The GC keeps track of the number of allocations (less deallocations) since the last GC. This buffers the count in thread-local state and uses atomic operations to modify the per-interpreter count. The thread-local buffering avoids contention on shared state. A consequence is that the GC scheduling is not as precise, so "test_sneaky_frame_object" is skipped because it requires that the GC be run exactly after allocating a frame object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GC is one of the runtime components with which I am least familiar, so I mostly have questions for you. 😄
Otherwise, the PR mostly makes sense.
The change here seems okay to me, but I'd feel better if one of the GC experts reviewed this before it's merged. CC @markshannon @pablogsal @nascheme @DinoV @nanjekyejoannah |
I've not looked at the code but the idea of the change sounds fine to me. I suspect there are some users who require that the GC threshold is precise, like the |
// We buffer the allocation count to avoid the overhead of atomic | ||
// operations for every allocation. | ||
gc->alloc_count++; | ||
if (gc->alloc_count >= LOCAL_ALLOC_COUNT_THRESHOLD) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this could be tied to the configurable GC threshold and therefore the tests could continue to pass but maybe it doesn't matter enough and the extra read isn't worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I considered making it a configurable runtime threshold, but decided it wasn't worth it, at least for now.
I think there's a decent chance we change how we count allocations in the future. In the nogil
forks, for example, I accounted for allocations in mi_page_to_full
and _mi_page_unfull
, which provides some natural batching and avoids the thread-local that's done in every allocation here, but wouldn't allow for a configurable threshold. I haven't attempted that yet because I'd like some performance measurements to justify it first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
🤖 New build scheduled with the buildbot fleet by @colesbury for commit f99d14e 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
The GC keeps track of the number of allocations (less deallocations) since the last GC. This buffers the count in thread-local state and uses atomic operations to modify the per-interpreter count. The thread-local buffering avoids contention on shared state. A consequence is that the GC scheduling is not as precise, so "test_sneaky_frame_object" is skipped because it requires that the GC be run exactly after allocating a frame object.
The GC keeps track of the number of allocations (less deallocations) since the last GC. This buffers the count in thread-local state and uses atomic operations to modify the per-interpreter count. The thread-local buffering avoids contention on shared state. A consequence is that the GC scheduling is not as precise, so "test_sneaky_frame_object" is skipped because it requires that the GC be run exactly after allocating a frame object.
The GC keeps track of the number of allocations (less deallocations) since the last GC. This buffers the count in thread-local state and uses atomic operations to modify the per-interpreter count. The thread-local buffering avoids contention on shared state. A consequence is that the GC scheduling is not as precise, so "test_sneaky_frame_object" is skipped because it requires that the GC be run exactly after allocating a frame object.
The GC keeps track of the number of allocations (less deallocations) since the last GC. This change buffers the allocation count in thread-local state and uses atomic operations to modify the per-interpreter count. The thread-local buffering avoids contention on shared state.
A consequence is that the GC scheduling is not as precise, so "test_sneaky_frame_object" is skipped because it requires that the GC be run exactly after allocating a frame object.
--disable-gil
builds #112529