Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-109369: Merge all eval-breaker flags and monitoring version into one word. #109846

Merged
merged 20 commits into from
Oct 4, 2023

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Sep 25, 2023

Merge the various eval-breaker flags and monitoring version into a single atomic value to allow faster checking by combining the eval-breaker check and the monitoring version check.

Python/ceval_gil.c Outdated Show resolved Hide resolved
Python/ceval_gil.c Show resolved Hide resolved
Comment on lines 69 to 70
calls_to_do = _Py_atomic_load_int32_relaxed(
&_PyRuntime.ceval.pending_mainthread.calls_to_do);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this fully covered by the places we call SIGNAL_PENDING_CALLS()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this case.

When running a non-main thread with pending_mainthread.calls_to_do we don't want _PY_CALLS_TO_DO_BIT set, to avoid constantly calling _Py_HandlePending.
So when we switch to the main thread, we need to set _PY_CALLS_TO_DO_BIT if pending_mainthread.calls_to_do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to #109846 (comment), why does acquiring the GIL require that we reproduce what was already done via Py_AddPendingCall()?

Is it because we always flip the bit in _make_pending_calls(), even if we're not in the main thread but there are global pending calls waiting. That seems like we could use a separate bit for the global pending calls.

Is i because we don't want to interrupt all the other threads while waiting for the main thread to handle its pending calls? I suppose the current approach sort of addresses that, but (as indicated in #109846 (comment)) that means we put off those extraneous interruptions only until the main thread takes the GIL again. Then the interruptions resume until the main thread handles its pending calls. That feels very jumpy. ISTM, a per-thread eval breaker would be a clearer solution.

This all applies to the signal-related code here in update_eval_breaker_from_thread() too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not suggesting that we change things up in this PR. Rather, if a less jumpy approach makes sense then we could do it in a follow-up PR.

Comment on lines +82 to +84
if (tstate->async_exc != NULL) {
_Py_set_eval_breaker_bit(interp, _PY_ASYNC_EXCEPTION_BIT, 1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are we not already setting the bit when tstate->async_exc is set, such that we must do so here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tstate->async_exc is set on a sleeping thread, then we need to set the bit when that thread gets the GIL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if I understand things correctly. A thread will only ever acquire the GIL either by the eval loop itself or via the C-API:

  • C-API1
    • before the eval loop has started
    • while the eval loop is running
    • after the eval loop has finished
  • running eval loop--for some bytecode instructions2

On the other hand, all the "pending" actions3 happen only are performed in a running eval loop, driven by the eval breaker2. The eval loop calls _Py_HandlePending(), which executes in this order:

  1. handle signals, if any (main thread only)
  2. make per-interpreter pending calls
  3. make global pending calls (main thread only)
  4. run GC4
  5. release & re-acquire the GIL (if another thread has asked for it)
  6. apply the async exception, if applicable (sort of thread-specific)

When the GIL is acquired via the C-API (outside a running eval loop), none of that will happen until at least one eval loop is running and it reaches one of the instructions that checks the eval breaker2.

When the GIL is acquired by a running eval loop, the first 4 pending events have already just been handled, and the last one (async exception) is about to be.


With all that in mind, the question is: how does the effect of PyThreadState_SetAsyncExc() relate to acquiring the GIL? Hence, why should we worry about _PY_ASYNC_EXCEPTION_BIT in update_eval_breaker_from_thread()? PyThreadState_SetAsyncExc() always sets tstate->async_exc and then sets the eval breaker for the target interpreter. That eval breaker bit never gets cleared until the targeted thread takes care of event. So why do we need to set the bit when taking the GIL, regardless of if the thread is sleeping or not? It seems unnecessary.

That does lead to another question: why have the async exception be part of the eval breaker at all? It only only applies to the target thread, but the eval breaker bit is set it will interrupt all threads. Would it make sense to have a per-thread eval breaker? Then again, it only matters if async exceptions are used by the community often enough.

Footnotes

  1. e.g. PyEval_RestoreThread() (after earlier releasing it), or for the first time when an interpreter is created or with threading.Thread.start().

  2. the small set of instructions that invoke CHECK_EVAL_BREAKER(): the intermediate end of loops; after calls; sometimes for RESUME; _JUMP_TO_TOP and ENTER_EXECUTOR (whatever those are for). 2 3

  3. which are driven by the events represented in this PR

  4. GC is also triggered in other places, like the signal-related C-API

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a similar question for signals and the main thread pending calls,. However, unlike async exceptions, we know that signals are somewhat prevalent. Maybe a separate per-thread eval breaker would be worth it? Perhaps it would help to have a separate (generated) variant of the eval loop that also checks a per-thread eval breaker (or checks it exclusively)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I don't think we have much room to change the order of actions in _Py_HandlePending().

Comment on lines 71 to 73
if (_Py_atomic_load(&_PyRuntime.signals.is_tripped)) {
_Py_set_eval_breaker_bit(interp, _PY_SIGNALS_PENDING_BIT, 1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is something the signals module can't do directly. Is that right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the main thread can handle signals. So if the main thread is not the running thread when a signal happens, we don't want to set the eval breaker bit then, but we do want to set it when the main thread gets the GIL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PyEval_SignalAsyncExc(tstate->interp);
}
RESET_GIL_DROP_REQUEST(interp);
update_eval_breaker_from_thread(interp, tstate);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems less critical since we have the single source of truth now.

Copy link
Member Author

@markshannon markshannon Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important, as _PY_CALLS_TO_DO_BIT, _PY_SIGNALS_PENDING_BIT and _PY_ASYNC_EXCEPTION_BIT are, to varying degrees, per-thread.
Whenever we switch thread we need to update them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it comes back to how we are being a little tricky with unsetting certain bits when the apply to the main thread and setting them again here when the main thread takes the GIL.

@markshannon
Copy link
Member Author

I would have expected a slight speedup, due to the simpler check, but it doesn't seem to make a difference.
Nominally 0.2% slower, but in the noise

return -1;
if (_Py_eval_breaker_bit_is_set(interp, _PY_ASYNC_EXCEPTION_BIT)) {
_Py_set_eval_breaker_bit(interp, _PY_ASYNC_EXCEPTION_BIT, 0);
if (tstate->async_exc != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the bit being set imply an exception is set? If so, this should be an assert. Furthermore, PyThreadState_SetAsyncExc() should probably be fixed to unset _PY_ASYNC_EXCEPTION_BIT when NULL is passed in.

Comment on lines +82 to +84
if (tstate->async_exc != NULL) {
_Py_set_eval_breaker_bit(interp, _PY_ASYNC_EXCEPTION_BIT, 1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if I understand things correctly. A thread will only ever acquire the GIL either by the eval loop itself or via the C-API:

  • C-API1
    • before the eval loop has started
    • while the eval loop is running
    • after the eval loop has finished
  • running eval loop--for some bytecode instructions2

On the other hand, all the "pending" actions3 happen only are performed in a running eval loop, driven by the eval breaker2. The eval loop calls _Py_HandlePending(), which executes in this order:

  1. handle signals, if any (main thread only)
  2. make per-interpreter pending calls
  3. make global pending calls (main thread only)
  4. run GC4
  5. release & re-acquire the GIL (if another thread has asked for it)
  6. apply the async exception, if applicable (sort of thread-specific)

When the GIL is acquired via the C-API (outside a running eval loop), none of that will happen until at least one eval loop is running and it reaches one of the instructions that checks the eval breaker2.

When the GIL is acquired by a running eval loop, the first 4 pending events have already just been handled, and the last one (async exception) is about to be.


With all that in mind, the question is: how does the effect of PyThreadState_SetAsyncExc() relate to acquiring the GIL? Hence, why should we worry about _PY_ASYNC_EXCEPTION_BIT in update_eval_breaker_from_thread()? PyThreadState_SetAsyncExc() always sets tstate->async_exc and then sets the eval breaker for the target interpreter. That eval breaker bit never gets cleared until the targeted thread takes care of event. So why do we need to set the bit when taking the GIL, regardless of if the thread is sleeping or not? It seems unnecessary.

That does lead to another question: why have the async exception be part of the eval breaker at all? It only only applies to the target thread, but the eval breaker bit is set it will interrupt all threads. Would it make sense to have a per-thread eval breaker? Then again, it only matters if async exceptions are used by the community often enough.

Footnotes

  1. e.g. PyEval_RestoreThread() (after earlier releasing it), or for the first time when an interpreter is created or with threading.Thread.start().

  2. the small set of instructions that invoke CHECK_EVAL_BREAKER(): the intermediate end of loops; after calls; sometimes for RESUME; _JUMP_TO_TOP and ENTER_EXECUTOR (whatever those are for). 2 3

  3. which are driven by the events represented in this PR

  4. GC is also triggered in other places, like the signal-related C-API

@markshannon
Copy link
Member Author

Hence, why should we worry about _PY_ASYNC_EXCEPTION_BIT in update_eval_breaker_from_thread()? PyThreadState_SetAsyncExc() always sets tstate->async_exc and then sets the eval breaker for the target interpreter. That eval breaker bit never gets cleared until the targeted thread takes care of event. So why do we need to set the bit when taking the GIL, regardless of if the thread is sleeping or not? It seems unnecessary.

If async_exc is set for one thread, we don't want the eval breaker bit set when other threads are running, otherwise we constantly be calling _Py_HandlePending() to no effect, which would be terrible for performance.

I suspect that a per-thread eval-breaker would be more efficient and simpler. But that's for another PR.

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're on the same page about the possible per-thread eval breaker. So mostly LGTM.

I'm approving the PR, but please address the one new comment I've left before merging.

Python/ceval_gil.c Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants