-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-116818: Make sys.settrace
, sys.setprofile
, and monitoring thread-safe
#116775
Conversation
f65320d
to
1b5ea7d
Compare
40ef815
to
3bcc7f0
Compare
sys.settrace
, sys.setprofile
, and monitoring thread-safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me that the strategy of either locking the code object or a stop-the-world pause is safe.
- Is locking code objects sufficient? What if other threads are executing those code objects?
- We need to be careful about what we do doing a stop-the-world pause. It's not safe to call
Py_DECREF
either directly or indirectly, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think fine grained locking is the wrong approach here.
PEP 667 specifically states that setting events is expensive.
Stopping the world for setting events and locking code objects to instrument seems like the right approach to me.
@@ -23,18 +23,25 @@ extern "C" { | |||
#define FT_ATOMIC_LOAD_SSIZE(value) _Py_atomic_load_ssize(&value) | |||
#define FT_ATOMIC_LOAD_SSIZE_RELAXED(value) \ | |||
_Py_atomic_load_ssize_relaxed(&value) | |||
#define FT_ATOMIC_LOAD_PTR_ACQUIRE(value) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping things in macros often obfuscates the intent.
If I see _Py_atomic_load_ptr_acquire(&value)
I know what it does (assuming I know how atomics work)
If I see FT_ATOMIC_LOAD_PTR_ACQUIRE(value)
I need to scan the source code to find the macro definition.
Can we come up with a name that describes the semantics, not the context in which it should be used?
All these operations are thread safe on all platforms. Either the GIL makes them safe or we use hardware atomics.
So, maybe choose a name based on the semantics not the context, and drop the FT_
prefix?
OOI, do we have any evidence that relaxed/release/acquire gains us performance over sequential consistency which is a lot easier to reason about and thus safer.
Python/instrumentation.c
Outdated
} | ||
|
||
static void | ||
remove_tools(PyCodeObject * code, int offset, int event, int tools) | ||
{ | ||
Py_BEGIN_CRITICAL_SECTION(code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to assert that this is locked here and move the critical section to the DISABLE
section of call_instrumentation_vector
. _Py_Instrument
should already be locked (with stop the world)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want _Py_Instrument
to stop the world? The reason why I didn't want to have that stop the world is because it can happen outside of calls to enable tracing, e.g. in RESUME
which seems like it would be a massive slow down.
Python/instrumentation.c
Outdated
assert(instrumentation_data->per_instruction_opcodes); | ||
int next_opcode = instrumentation_data->per_instruction_opcodes[offset]; | ||
_PyCoMonitoringData *instr_data = get_monitoring(code); | ||
uint8_t *per_instr_opcodes = FT_ATOMIC_LOAD_PTR_ACQUIRE(instr_data->per_instruction_opcodes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for atomic operations here. per_instruction_opcodes
is updated in _Py_Instrument
which should only be called with the world stopped, so it cannot change while threads are running.
Python/instrumentation.c
Outdated
if (tstate->tracing) { | ||
return next_opcode; | ||
} | ||
PyInterpreterState *interp = tstate->interp; | ||
uint8_t tools = instrumentation_data->per_instruction_tools != NULL ? | ||
instrumentation_data->per_instruction_tools[offset] : | ||
uint8_t *per_instr_tools = FT_ATOMIC_LOAD_PTR_ACQUIRE(instr_data->per_instruction_tools); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise
Python/instrumentation.c
Outdated
@@ -1320,15 +1375,23 @@ _PyMonitoring_RegisterCallback(int tool_id, int event_id, PyObject *obj) | |||
PyInterpreterState *is = _PyInterpreterState_GET(); | |||
assert(0 <= tool_id && tool_id < PY_MONITORING_TOOL_IDS); | |||
assert(0 <= event_id && event_id < _PY_MONITORING_EVENTS); | |||
#ifdef Py_GIL_DISABLED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe drop the #ifdef Py_GIL_DISABLED
and use (FT_)ATOMIC_EXCHANGE_POINTER_PTR
?
Python/instrumentation.c
Outdated
@@ -1382,9 +1445,10 @@ initialize_tools(PyCodeObject *code) | |||
#define NO_LINE -128 | |||
|
|||
static void | |||
initialize_lines(PyCodeObject *code) | |||
initialize_lines(PyCodeObject *code, _PyCoLineInstrumentationData *line_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add the extra argument?
(Same for initialize_line_tools
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was so that we could initialize the data and then publish it so any readers would either see no line data or would see the initialized line data.
Python/instrumentation.c
Outdated
monitoring->line_tools = NULL; | ||
monitoring->per_instruction_opcodes = NULL; | ||
monitoring->per_instruction_tools = NULL; | ||
FT_ATOMIC_STORE_PTR_RELEASE(code->_co_monitoring, monitoring); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the atomic operation here?
Won't this leak memory in case of a race?
I think it would be better to lock the code object when doing this initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually do lock the code object on initialization, but the current implementation was trying to avoid locking the code object on reads. The readers of _co_monitoring
are all using acquire semantics so if there is a value there then they'll also pick all of the writes to the data which it is pointing to as well. The alternative here is to either lock the code object on reads as well or to stop the world on _Py_Instrument
calls which seems to have disproportionately compared to calls to actually enable monitoring.
Python/instrumentation.c
Outdated
@@ -1746,6 +1833,10 @@ _Py_Instrument(PyCodeObject *code, PyInterpreterState *interp) | |||
|
|||
static int | |||
instrument_all_executing_code_objects(PyInterpreterState *interp) { | |||
#ifdef Py_GIL_DISABLED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ASSERT_WORLD_STOPPED()
, much like we use ASSERT_WORLD_STOPPED_OR_LOCKED()
instead of the #ifdef ...
?
Python/instrumentation.c
Outdated
uint32_t existing_events = get_events(&interp->monitors, tool_id); | ||
if (existing_events == events) { | ||
_PyEval_StartTheWorld(interp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than three _PyEval_StartTheWorld(interp);
calls, could you use our standard cleanup idiom of jumping to a done:
label, and calling _PyEval_StartTheWorld(interp)
there?
Note that it is not possible for a code object that needs to be instrumented to execute without hitting a |
I don't think locking alone is sufficient, unless you're referring to stopping the world to lock. If a code object starts running on one thread, hits resume, and another thread enables tracing then calls that function it's going to start mutating the code object. I'm just concerned that stopping the world to instrument every new code object is going to be extremely expensive, but if that seems like an okay performance cost to you then I'm happy to do that, it is much simpler :) |
@DinoV Did you actually request a review from me or did I just get that message because I'm a code owner? |
"""Runs once after the test is done""" | ||
pass | ||
|
||
def test_instrumention(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_instrumention(self): | |
def test_instrumentation(self): |
I think you just got it because you're a code owner. |
Ah, interesting. Thanks for digging that up. Easy solution would be to move the I'm slightly leaning towards the former, since the tier two code should bail immediately anyways upon seeing it's been invalidated, and it keeps the redundant check for the opcode out of the happy path. To the user, it would appear as if the jump happened just before tracing was enabled, rather than just after. I think that's okay. |
Hm, actually, I think that could leak the incref'ed executor if the eval breaker check raises (I forgot there's a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few random things I noticed while reading the PR
Okay, i've gone with re-checking the opcode approach. It looks like in addition to checking the opcode we've also got to check the oparg as we can seemingly end up just updating the oparg. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(You're not waiting for me are you?)
Nope, I'm more hoping to get a review from @markshannon and/or @colesbury unless you feel particularly familiar with this area :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@markshannon Do you want to take a look at this or are you happy with where it's at? |
Stops the world when we're doing a monitoring event that impacts all threads.
Updates the initialization of the monitoring data structures so that modifications generally lock the code object.
There's various side data structures depending upon the tracing modes that are enabled. These are all allocated and published once. This is necessary because
_Py_Instrument
can be called on code objects pretty freely and we probably don't want locks on the readers. Races with the actual values should be benign and just result in some delay in seeing the events being enabled/disabled.Also adds multi-threaded test cases which are passing with
PYTHON_GIL=0
.sys.settrace
,sys.setprofile
, andsys.monitoring
thread safe in--disable-gil
builds #116818