Move the `eval_breaker` to `PyThreadState` #112175
Comments
If I remember correctly, |
I don't think it'll be an issue in the default build, but I'll need to think about how instrumentation works in I think Mark's idea works basically like:
In |
I'm currently working on this, so if anyone has any related ideas/comments, please post them! |
@markshannon, do you have any more context on your per-thread I'm fleshing out exactly how all the flags will work in free-threaded vs. normal builds. As Sam says above, we need to loop over all threads for interpreter-wide flags in a free-threaded build, and this is pretty straightforward. All For normal builds, if we want to avoid looping over all threads, we can set interpreter-wide flags on the active thread and use
Does this all sound reasonable, especially for the interpreter-wide flags? If it's too complicated, I could set them by looping over all threads in both build types. That would add some overhead to normal builds, but I expect it wouldn't be measurable overall. It's not much work to try out both implementations, so I could see if there's a measurable performance difference, if that would help make the decision. |
This change adds an `eval_breaker` field to `PyThreadState`, renaming the existing `eval_breaker` to `interp_eval_breaker` (its uses are explained further down). The primary motivation is performance in free-threaded builds: with thread-local eval breakers, we can stop a specific thread (e.g., for an async exception) without interrupting other threads.

There are still two situations where we want the first available thread to handle a request:

- Running a garbage collection: In normal builds, we set `_PY_GC_SCHEDULED_BIT` on the current thread. In case a thread suspends before handling the collection, the bit is copied to and from `interp_eval_breaker` on thread suspend and resume, respectively. In a free-threaded build, we simply iterate over all threads and set the bit. The first thread to check its eval breaker runs the collection, unsetting the bit on all threads.
  - Free-threaded builds could have multiple threads attempt a GC from one trigger if we get very unlucky with thread scheduling. I didn't put any protections against this in place because a) the only consequence is that one or more threads will check the GC thresholds right after a collection finishes, which won't affect correctness, and b) it's vanishingly unlikely.
- Pending calls not limited to the main thread (possible since python/cpython@757b402ea1c2). This is a little trickier, since the callback can be added from any thread, with or without the GIL held. If the targeted interpreter's GIL is locked, we signal the holding thread. When a thread is resumed, its `_PY_CALLS_TO_DO` bit is derived from the source of truth for pending calls (one of two `_pending_calls` structs). This handles situations where no thread held the GIL when the call was first added, or where the active thread did not handle the call before releasing the GIL. In a free-threaded build, all threads are signaled, similar to scheduling a GC.
The source of truth for the global instrumentation version is still in `interp_eval_breaker`, in both normal and free-threaded builds. Threads usually read the version from their local `eval_breaker`, where it continues to be colocated with the eval breaker bits, and the method for keeping it up to date depends on build type. All builds first update the version in `interp_eval_breaker`, and then:

- Normal builds update the version in the current thread's `eval_breaker`. When a thread takes the GIL, it copies the current version from `interp_eval_breaker` as part of the same operation that copies `_PY_GC_SCHEDULED_BIT`.
- Free-threaded builds again iterate over all threads in the current interpreter, updating the version on each one.

Instrumentation (and the specializing interpreter more generally) will need more work to be compatible with free-threaded builds, so these changes are just intended to maintain the status quo in normal builds for now.

Other notable changes are:

- The `_PY_*_BIT` macros now expand to the actual bit being set, rather than the bit's index. I think this is simpler overall. I also moved their definitions from `pycore_ceval.h` to `pycore_pystate.h`, since their main usage is on `PyThreadState`s now.
- Most manipulations of `eval_breaker` are done with a new pair of functions: `_PyThreadState_Signal()` and `_PyThreadState_Unsignal()`. Having two separate functions to set/unset a bit, rather than one function that takes the bit value to use, lets us use a single atomic `or`/`and`, rather than a loop around an atomic compare/exchange like the old `_Py_set_eval_breaker_bit` function.

Existing tests provide pretty good coverage for most of this functionality. The one new test I added is to make sure a GC still happens if a thread schedules it then drops the GIL before the GC runs. I don't love how complicated this test ended up, so I'm open to other ideas for how to test this (or other things to test in general).
This change adds an `eval_breaker` field to `PyThreadState`. The primary motivation is for performance in free-threaded builds: with thread-local eval breakers, we can stop a specific thread (e.g., for an async exception) without interrupting other threads. The source of truth for the global instrumentation version is stored in the `instrumentation_version` field in PyInterpreterState. Threads usually read the version from their local `eval_breaker`, where it continues to be colocated with the eval breaker bits.
Feature or enhancement
The `eval_breaker` is a variable that keeps track of requests to break out of the eval loop to handle things like signals, run a garbage collection, or handle asynchronous exceptions. It is currently in the interpreter state (in `interp->ceval.eval_breaker`). However, some of the events are specific to a given thread. For example, signals and some pending calls can only be executed on the "main" thread of an interpreter.

We should move the `eval_breaker` to `PyThreadState` to better handle these thread-specific events. This is more important for the `--disable-gil` builds where multiple threads within the same interpreter may be running at the same time.

@markshannon suggested a combination of per-interpreter and per-thread state, where the thread copies the per-interpreter eval_breaker state to the per-thread state when it acquires the GIL.
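The difference between a thread-specific request and an interpreter-wide one can be sketched as follows. This is only a model of the idea: the linked list of thread states, the `sketch_*` names, and the bit values are assumptions for illustration, and the locking a real implementation would need while walking the thread list is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch. */
#define SKETCH_ASYNC_EXC_BIT    ((uintptr_t)1 << 3)
#define SKETCH_GC_SCHEDULED_BIT ((uintptr_t)1 << 4)

/* Each thread state carries its own eval breaker (assumed layout). */
typedef struct sketch_tstate {
    struct sketch_tstate *next;
    _Atomic uintptr_t eval_breaker;
} sketch_tstate;

typedef struct {
    sketch_tstate *threads_head;
} sketch_interp;

/* Thread-specific event: only the target thread is interrupted. */
static void sketch_signal_thread(sketch_tstate *ts, uintptr_t bit)
{
    assert(ts != NULL);
    atomic_fetch_or(&ts->eval_breaker, bit);
}

/* Interpreter-wide event: set the bit on every thread so that the
 * first one to check its eval breaker can handle the request. */
static void sketch_signal_all(sketch_interp *interp, uintptr_t bit)
{
    assert(interp != NULL);
    for (sketch_tstate *ts = interp->threads_head; ts != NULL; ts = ts->next) {
        atomic_fetch_or(&ts->eval_breaker, bit);
    }
}
```

With a single interpreter-level variable, the first function would not be expressible: every running thread would see the async-exception bit and leave its fast path.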
Linked PRs
- `eval_breaker` to `PyThreadState` #115194