-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-116168: Remove extra _CHECK_STACK_SPACE
uops
#117242
gh-116168: Remove extra _CHECK_STACK_SPACE
uops
#117242
Conversation
…ove-extra-_check_stack_space-ops
…ove-extra-_check_stack_space-ops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Just one comment for now.
…ove-extra-_check_stack_space-ops
…ove-extra-_check_stack_space-ops
Not sure if this needs news, but I'm happy to add it if necessary. The failing test is a timeout that seems to be occurring on more than just my branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome work! I just have a few comments about the tests.
Modules/_testinternalcapi.c
Outdated
@@ -957,6 +957,16 @@ iframe_getlasti(PyObject *self, PyObject *frame) | |||
return PyLong_FromLong(PyUnstable_InterpreterFrame_GetLasti(f)); | |||
} | |||
|
|||
static PyObject * | |||
get_co_framesize(PyObject *self, PyObject *arg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like how comprehensive this test is. But I don't think we shouldn't test for such implementation details. We should just test the bytecode trace is correct and that's the limit.
Also, _CHECK_STACK_SPACE
shouldn't affect frame size. It's a check after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to remove the assertions for exact framesizes, but I'd still like to keep this function. It's useful to help determine that _CHECK_STACK_SPACE_OPERAND
is assigned the correct operand.
To be clear, I'm not worried about my changes affecting the framesizes themselves, just that the calculated total stack space required is accurate.
Python/optimizer_analysis.c
Outdated
} | ||
} | ||
} | ||
Py_FatalError("No terminating instruction"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Py_UNREACHABLE alone is enough usually.
Py_FatalError("No terminating instruction"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this.
The approach looks sound, and the tests thorough.
I have a few suggestions, specifically that the pass by added to optimize_uops
to avoid adding another pass.
Lib/test/test_capi/test_opt.py
Outdated
def _assert_framesize_cross_platform(self, fn_obj, expected_framesize): | ||
self.assertEqual( | ||
_testinternalcapi.get_co_framesize(fn_obj.__code__), | ||
expected_framesize + self.FRAMESIZE_ADJUSTMENT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't clear why this adjustment is necessary.
Also, would it be cleaner to do the adjustment in _testinternalcapi.get_co_framesize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't clear to me either why framesizes vary across platforms 😅 but based on other comments I'm thinking that this assertion is overly specific and a bit brittle, so I'm just going to remove it. I don't think I want to adjust get_co_framesize
, since I need this to report the actual framesize in every case
Python/optimizer_analysis.c
Outdated
return co; | ||
} | ||
|
||
// _PUSH_FRAME is 3 ops in front of _CHECK_STACK_SPACE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fragile, and I don't think you need it if you track the current stack depth record the location of the previous _CHECK_STACK_SPACE
and patch it in _PUSH_FRAME
.
Python/optimizer_analysis.c
Outdated
* calls in this trace. | ||
*/ | ||
static void | ||
combine_stack_space_checks( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than make this a separate pass, could you integrate into optimize_uops
by adding it to optimizer_bytecodes.c
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to do this, but to confirm - do you definitely prefer integration into optimize_uops
over peephole_opt
or remove_unneeded_uops
? Any of these would prevent an extra pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fidget-Spinner may also have an opinion here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both peephole_opt
or optimize_uops
is fine to me. peephole_opt
would likely be clearer as this pass does not use any information from the abstract interpreter.
Python/optimizer_analysis.c
Outdated
bool must_exit = false; | ||
#endif | ||
int depth = 0; | ||
uint32_t space_at_depth[MAX_ABSTRACT_FRAME_DEPTH]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You only need to track the current depth, as this is a single pass algorithm.
In fact, this may be incorrect, if more than one function is called with differing stack requirements.
I think, only the following variables are needed:
- The first check (address or offset)
- The current stack size
- The maximum requested stack size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus the last seen stack check, to avoid hard-coding the offset between CHECK_STACK_SPACE
and _PUSH_FRAME
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion - I was able to remove my makeshift stack. I had to use one additional variable to keep track of the current frame's size, since there's no way I could see to access this from _POP_FRAME
(we only have the code object being returned to, not the one we're returning from)
Python/optimizer_analysis.c
Outdated
if (first_valid_check_stack != NULL) { | ||
assert(first_valid_check_stack->opcode == _CHECK_STACK_SPACE); | ||
assert(max_space > 0); | ||
assert(max_space < INT_MAX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(max_space < INT_MAX); | |
if (max_space < INT_MAX) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe reduce the limit to UNIT16_MAX
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, setting a valid _CHECK_STACK_SPACE_OPERAND
is mandatory if even a single _CHECK_STACK_SPACE
is deleted during this pass, so I can't make this conditional on the value of max_space
. I'm going to add some additional checks to ensure max_space
is always an acceptable value and rework the exit logic
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
…ove-extra-_check_stack_space-ops
…ove-extra-_check_stack_space-ops
I have made the requested changes; please review again |
Thanks for making the requested changes! @markshannon: please review the changes made to this pull request. |
I think this is fine to merge now. Will wait a few days if Mark has any reviews left. |
I am working on one last review. Looks like just some style nits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. I do have a few style nits. Once you fix those I will gladly merge it!
Python/optimizer_analysis.c
Outdated
/* _PUSH_FRAME/_POP_FRAME's operand can be 0, a PyFunctionObject *, or a | ||
* PyCodeObject *. Retrieve the code object if possible. | ||
*/ | ||
static PyCodeObject *get_co(_PyUOpInstruction *op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit:
static PyCodeObject *get_co(_PyUOpInstruction *op) { | |
static PyCodeObject * | |
get_co(_PyUOpInstruction *op) | |
{ |
Lib/test/test_capi/test_opt.py
Outdated
@@ -4,6 +4,7 @@ | |||
import unittest | |||
import gc | |||
import os | |||
import platform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unused?
Lib/test/test_capi/test_opt.py
Outdated
self.assertEqual(res, 832) | ||
self.assertIsNotNone(ex) | ||
|
||
uops_and_operands = [(opcode, operand) for opcode, _, _, operand in list(ex)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list()
call is superfluous:
uops_and_operands = [(opcode, operand) for opcode, _, _, operand in list(ex)] | |
uops_and_operands = [(opcode, operand) for opcode, _, _, operand in ex] |
(Ditto below.)
c, d = dummy13(9) | ||
a += b + c + d | ||
return a | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One blank line is enough. :-)
(Ditto below.)
self.assertEqual(uop_names.count("_PUSH_FRAME"), 2) | ||
self.assertEqual(uop_names.count("_CHECK_STACK_SPACE_OPERAND"), 1) | ||
|
||
# this hits a different case during trace projection in refcount test runs only, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you've made me curious -- what does the refcount test do different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gvanrossum I'm still not sure why this is happening, but here's the behavior:
On the first round of a refcount test, trace projection appears to proceed normally. We end up with a large trace that's cut off part of the way through the uops for dummy_large
. On all subsequent rounds, we end up in the new_code == NULL
case while emitting _PUSH_FRAME
in trace projection (code).
That prevents me from having access to a code object in peephole optimization, so I can't remove the corresponding _CHECK_STACK_SPACE
.
Interestingly, we never hit new_code == NULL
in testing with --forever
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably refcount tests do other weird stuff and maybe somehow the func/code version cache entry is overwritten? Let's not worry about it, refcount tests are always weird.
Modules/_testinternalcapi.c
Outdated
@@ -957,6 +957,16 @@ iframe_getlasti(PyObject *self, PyObject *frame) | |||
return PyLong_FromLong(PyUnstable_InterpreterFrame_GetLasti(f)); | |||
} | |||
|
|||
static PyObject * | |||
get_co_framesize(PyObject *self, PyObject *arg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit
get_co_framesize(PyObject *self, PyObject *arg) { | |
get_co_framesize(PyObject *self, PyObject *arg) | |
{ |
int framesize = co->co_framesize; | ||
assert(framesize > 0); | ||
curr_space += framesize; | ||
if (curr_space < 0 || curr_space > INT32_MAX) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second check cannot succeed when int
is 32 bits, right? I guess the compiler will then omit the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. I was worried about cases where int
is larger than the 32-bit operand. I'm comparing to INT32_MAX
instead of UINT32_MAX
to avoid compiler warnings; technically that makes this comparison 1 bit more strict than necessary, but I imagine this is an edge case that will almost never be hit anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 actually, the comparison isn't overly strict, since I'm already requiring this number to be a positive int. Anyway, I'm still convinced this is a necessary check.
Python/optimizer_analysis.c
Outdated
assert(framesize > 0); | ||
curr_space += framesize; | ||
if (curr_space < 0 || curr_space > INT32_MAX) { | ||
// overflow or won't fit in operand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will fit in operand
(which is always 64 bits) but not in the cache entry for the new instruction (which is always unsigned 32 bits) or won't be able to be converted to signed int (which it must be for the _PyThreadState_HasStackSpace()
call). So the check is right but the comment feels like it is misrepresenting the situation. My suggestion:
// overflow or won't fit in operand | |
// won't fit in signed 32-bit int |
…ove-extra-_check_stack_space-ops
I've addressed all comments and the pipeline looks good, so this is once again ready for review. |
Lib/test/test_capi/test_opt.py
Outdated
# a99999 = a99998 + 1 | ||
# return a99999 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably more this, since repetitions is 10000:
# a99999 = a99998 + 1 | |
# return a99999 | |
# a9999 = a9998 + 1 | |
# return a9999 |
Can you run |
…ove-extra-_check_stack_space-ops
👍 just ran |
Thank you all for your patience and guidance on this work! I learned a lot throughout the process and quite enjoyed it. |
Thank you! Looking forward to more contributions from you. |
This merges all `_CHECK_STACK_SPACE` uops in a trace into a single `_CHECK_STACK_SPACE_OPERAND` uop that checks whether there is enough stack space for all calls included in the entire trace.
This PR reduces the number of
_CHECK_STACK_SPACE
operands in tier2 bytecode.Here's the basic design:
_CHECK_STACK_SPACE_OPERAND
, which contains the amount space to check in its operand._CHECK_STACK_SPACE
s into a single_CHECK_STACK_SPACE_OPERAND
.potential changes / refactors to consider:
combine_stack_space_checks
withpeephole_opt
orremove_unneeded_uops
?#ifdef Py_DEBUG
s in the core logic here are kind of distracting. Not sure if there's a more preferable way to implement these safety checks.I'm wide open to feedback / criticism / suggestions!