-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-109369: Merge all eval-breaker flags and monitoring version into one word. #109846
Conversation
Python/ceval_gil.c
Outdated
calls_to_do = _Py_atomic_load_int32_relaxed( | ||
&_PyRuntime.ceval.pending_mainthread.calls_to_do); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this fully covered by the places we call SIGNAL_PENDING_CALLS()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this case.
When running a non-main thread with pending_mainthread.calls_to_do
we don't want _PY_CALLS_TO_DO_BIT
set, to avoid constantly calling _Py_HandlePending
.
So when we switch to the main thread, we need to set _PY_CALLS_TO_DO_BIT
if pending_mainthread.calls_to_do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to #109846 (comment), why does acquiring the GIL require that we reproduce what was already done via Py_AddPendingCall()
?
Is it because we always flip the bit in _make_pending_calls()
, even if we're not in the main thread but there are global pending calls waiting. That seems like we could use a separate bit for the global pending calls.
Is i because we don't want to interrupt all the other threads while waiting for the main thread to handle its pending calls? I suppose the current approach sort of addresses that, but (as indicated in #109846 (comment)) that means we put off those extraneous interruptions only until the main thread takes the GIL again. Then the interruptions resume until the main thread handles its pending calls. That feels very jumpy. ISTM, a per-thread eval breaker would be a clearer solution.
This all applies to the signal-related code here in update_eval_breaker_from_thread()
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not suggesting that we change things up in this PR. Rather, if a less jumpy approach makes sense then we could do it in a follow-up PR.
if (tstate->async_exc != NULL) { | ||
_Py_set_eval_breaker_bit(interp, _PY_ASYNC_EXCEPTION_BIT, 1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are we not already setting the bit when tstate->async_exc
is set, such that we must do so here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If tstate->async_exc
is set on a sleeping thread, then we need to set the bit when that thread gets the GIL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see if I understand things correctly. A thread will only ever acquire the GIL either by the eval loop itself or via the C-API:
- C-API1
- before the eval loop has started
- while the eval loop is running
- after the eval loop has finished
- running eval loop--for some bytecode instructions2
On the other hand, all the "pending" actions3 happen only are performed in a running eval loop, driven by the eval breaker2. The eval loop calls _Py_HandlePending()
, which executes in this order:
- handle signals, if any (main thread only)
- make per-interpreter pending calls
- make global pending calls (main thread only)
- run GC4
- release & re-acquire the GIL (if another thread has asked for it)
- apply the async exception, if applicable (sort of thread-specific)
When the GIL is acquired via the C-API (outside a running eval loop), none of that will happen until at least one eval loop is running and it reaches one of the instructions that checks the eval breaker2.
When the GIL is acquired by a running eval loop, the first 4 pending events have already just been handled, and the last one (async exception) is about to be.
With all that in mind, the question is: how does the effect of PyThreadState_SetAsyncExc()
relate to acquiring the GIL? Hence, why should we worry about _PY_ASYNC_EXCEPTION_BIT
in update_eval_breaker_from_thread()
? PyThreadState_SetAsyncExc()
always sets tstate->async_exc
and then sets the eval breaker for the target interpreter. That eval breaker bit never gets cleared until the targeted thread takes care of event. So why do we need to set the bit when taking the GIL, regardless of if the thread is sleeping or not? It seems unnecessary.
That does lead to another question: why have the async exception be part of the eval breaker at all? It only only applies to the target thread, but the eval breaker bit is set it will interrupt all threads. Would it make sense to have a per-thread eval breaker? Then again, it only matters if async exceptions are used by the community often enough.
Footnotes
-
e.g.
PyEval_RestoreThread()
(after earlier releasing it), or for the first time when an interpreter is created or withthreading.Thread.start()
. ↩ -
the small set of instructions that invoke
CHECK_EVAL_BREAKER()
: the intermediate end of loops; after calls; sometimes forRESUME
;_JUMP_TO_TOP
andENTER_EXECUTOR
(whatever those are for). ↩ ↩2 ↩3 -
which are driven by the events represented in this PR ↩
-
GC is also triggered in other places, like the signal-related C-API ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a similar question for signals and the main thread pending calls,. However, unlike async exceptions, we know that signals are somewhat prevalent. Maybe a separate per-thread eval breaker would be worth it? Perhaps it would help to have a separate (generated) variant of the eval loop that also checks a per-thread eval breaker (or checks it exclusively)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I don't think we have much room to change the order of actions in _Py_HandlePending()
.
Python/ceval_gil.c
Outdated
if (_Py_atomic_load(&_PyRuntime.signals.is_tripped)) { | ||
_Py_set_eval_breaker_bit(interp, _PY_SIGNALS_PENDING_BIT, 1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is something the signals module can't do directly. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the main thread can handle signals. So if the main thread is not the running thread when a signal happens, we don't want to set the eval breaker bit then, but we do want to set it when the main thread gets the GIL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #109846 (comment).
_PyEval_SignalAsyncExc(tstate->interp); | ||
} | ||
RESET_GIL_DROP_REQUEST(interp); | ||
update_eval_breaker_from_thread(interp, tstate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems less critical since we have the single source of truth now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is important, as _PY_CALLS_TO_DO_BIT
, _PY_SIGNALS_PENDING_BIT
and _PY_ASYNC_EXCEPTION_BIT
are, to varying degrees, per-thread.
Whenever we switch thread we need to update them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it comes back to how we are being a little tricky with unsetting certain bits when the apply to the main thread and setting them again here when the main thread takes the GIL.
I would have expected a slight speedup, due to the simpler check, but it doesn't seem to make a difference. |
return -1; | ||
if (_Py_eval_breaker_bit_is_set(interp, _PY_ASYNC_EXCEPTION_BIT)) { | ||
_Py_set_eval_breaker_bit(interp, _PY_ASYNC_EXCEPTION_BIT, 0); | ||
if (tstate->async_exc != NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the bit being set imply an exception is set? If so, this should be an assert. Furthermore, PyThreadState_SetAsyncExc()
should probably be fixed to unset _PY_ASYNC_EXCEPTION_BIT
when NULL
is passed in.
if (tstate->async_exc != NULL) { | ||
_Py_set_eval_breaker_bit(interp, _PY_ASYNC_EXCEPTION_BIT, 1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see if I understand things correctly. A thread will only ever acquire the GIL either by the eval loop itself or via the C-API:
- C-API1
- before the eval loop has started
- while the eval loop is running
- after the eval loop has finished
- running eval loop--for some bytecode instructions2
On the other hand, all the "pending" actions3 happen only are performed in a running eval loop, driven by the eval breaker2. The eval loop calls _Py_HandlePending()
, which executes in this order:
- handle signals, if any (main thread only)
- make per-interpreter pending calls
- make global pending calls (main thread only)
- run GC4
- release & re-acquire the GIL (if another thread has asked for it)
- apply the async exception, if applicable (sort of thread-specific)
When the GIL is acquired via the C-API (outside a running eval loop), none of that will happen until at least one eval loop is running and it reaches one of the instructions that checks the eval breaker2.
When the GIL is acquired by a running eval loop, the first 4 pending events have already just been handled, and the last one (async exception) is about to be.
With all that in mind, the question is: how does the effect of PyThreadState_SetAsyncExc()
relate to acquiring the GIL? Hence, why should we worry about _PY_ASYNC_EXCEPTION_BIT
in update_eval_breaker_from_thread()
? PyThreadState_SetAsyncExc()
always sets tstate->async_exc
and then sets the eval breaker for the target interpreter. That eval breaker bit never gets cleared until the targeted thread takes care of event. So why do we need to set the bit when taking the GIL, regardless of if the thread is sleeping or not? It seems unnecessary.
That does lead to another question: why have the async exception be part of the eval breaker at all? It only only applies to the target thread, but the eval breaker bit is set it will interrupt all threads. Would it make sense to have a per-thread eval breaker? Then again, it only matters if async exceptions are used by the community often enough.
Footnotes
-
e.g.
PyEval_RestoreThread()
(after earlier releasing it), or for the first time when an interpreter is created or withthreading.Thread.start()
. ↩ -
the small set of instructions that invoke
CHECK_EVAL_BREAKER()
: the intermediate end of loops; after calls; sometimes forRESUME
;_JUMP_TO_TOP
andENTER_EXECUTOR
(whatever those are for). ↩ ↩2 ↩3 -
which are driven by the events represented in this PR ↩
-
GC is also triggered in other places, like the signal-related C-API ↩
If I suspect that a per-thread eval-breaker would be more efficient and simpler. But that's for another PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're on the same page about the possible per-thread eval breaker. So mostly LGTM.
I'm approving the PR, but please address the one new comment I've left before merging.
200f545
to
da0d844
Compare
Merge the various eval-breaker flags and monitoring version into a single atomic value to allow faster checking by combining the eval-breaker check and the monitoring version check.