-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-118095: Use broader specializations in tier 1, for better tier 2 support of calls. #118322
Conversation
c6c1f31
to
dce9b23
Compare
Finally found the bug (took me way too long). This fixes it: diff --git a/Python/bytecodes.c b/Python/bytecodes.c
index c54763ce719..689bb9cbb57 100644
--- a/Python/bytecodes.c
+++ b/Python/bytecodes.c
@@ -3171,15 +3171,10 @@ dummy_func(
frame->return_offset = (uint16_t)(1 + INLINE_CACHE_ENTRIES_CALL);
}
- op(_CHECK_IS_FUNCTION, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) {
- DEOPT_IF(!PyFunction_Check(callable));
- }
-
macro(CALL_PY_GENERAL) =
unused/1 + // Skip over the counter
- unused/2 +
_CHECK_PEP_523 +
- _CHECK_IS_FUNCTION +
+ _CHECK_FUNCTION +
_CALL_PY_GENERAL +
_PUSH_FRAME;
|
I created an issue for this: #118540, but I'll do the simpler fix you suggest for 3.13. |
Stats look good. ~13% reduction in tier 1 instructions executed and a ~5% increase in tier 2 uops executed with an increase of uops/trace from 26.3 to 28.5 and 3% reduction in the number of traces executed. |
Tier 1 performance is neutral (1.00x faster) |
Python/bytecodes.c
Outdated
if (new_frame == NULL) { | ||
ERROR_NO_POP(); | ||
} | ||
frame->return_offset = (uint16_t)(1 + INLINE_CACHE_ENTRIES_CALL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use _SAVE_RETURN_OFFSET
for this bit?
Python/bytecodes.c
Outdated
op(_CHECK_IS_NOT_PY_CALLABLE, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) { | ||
DEOPT_IF(PyFunction_Check(callable)); | ||
DEOPT_IF(Py_TYPE(callable) == &PyMethod_Type); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe these are better as exits so we can handle more cases? If we trace through this, it'd be a shame if we couldn't then stitch a function or method into the call site if its polymorphic.
op(_CHECK_IS_NOT_PY_CALLABLE, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) { | |
DEOPT_IF(PyFunction_Check(callable)); | |
DEOPT_IF(Py_TYPE(callable) == &PyMethod_Type); | |
} | |
op(_CHECK_IS_NOT_PY_CALLABLE, (callable, unused, unused[oparg] -- callable, unused, unused[oparg])) { | |
EXIT_IF(PyFunction_Check(callable)); | |
EXIT_IF(Py_TYPE(callable) == &PyMethod_Type); | |
} |
@@ -3226,40 +3326,6 @@ dummy_func( | |||
_SAVE_RETURN_OFFSET + | |||
_PUSH_FRAME; | |||
|
|||
inst(CALL_PY_WITH_DEFAULTS, (unused/1, func_version/2, callable, self_or_null, args[oparg] -- unused)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, this isn't worth specializing anymore? I forget exactly why it can't be converted to tier two (I'm guessing that there's more than one reason, since the use of this_instr
can be easily worked around).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two reasons:
- Because this loops over both arguments and defaults, it probably isn't much faster than
CALL_PY_GENERAL
for tier 1. It is almost certainly slower in the JIT because of its size.CALL_PY_GENERAL
calls a help function, so is more compact. - We expect that, for 3.14, the tier 2 optimizer will convert them both to the same optimal sequence of operations.
Python/ceval.c
Outdated
@@ -107,7 +107,7 @@ static void | |||
dump_stack(_PyInterpreterFrame *frame, PyObject **stack_pointer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the changes in this function intentional, or left over from debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left over from debugging, I needed them as dump_stack
was too slow for a realistic number (many millions) of instructions for finding bugs.
It might be worth making this change, but that's for another PR.
@@ -1789,8 +1789,7 @@ specialize_class_call(PyObject *callable, _Py_CODEUNIT *instr, int nargs) | |||
return -1; | |||
} | |||
if (Py_TYPE(tp) != &PyType_Type) { | |||
SPECIALIZATION_FAIL(CALL, SPEC_FAIL_CALL_METACLASS); | |||
return -1; | |||
goto generic; | |||
} | |||
if (tp->tp_new == PyBaseObject_Type.tp_new) { | |||
PyFunctionObject *init = get_init_for_simple_managed_python_class(tp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GH won't let me comment outside the diff, but below this line there's a failure path if we're out of type versions. We could go generic in that case, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but I don't want to hide any errors.
int argcount = -1; | ||
if (kind == SPEC_FAIL_CODE_NOT_OPTIMIZED) { | ||
SPECIALIZATION_FAIL(CALL, SPEC_FAIL_CODE_NOT_OPTIMIZED); | ||
return -1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CALL_PY_GENERAL
can handle this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I don't want to hide these cases, otherwise we will never know if they are a problem.
We could record stats for failed specialization that have a fall back, like this case, but I think that's a bit too intrusive for 3.13.
int version = _PyFunction_GetVersionForCurrentState(func); | ||
if (version == 0) { | ||
SPECIALIZATION_FAIL(CALL, SPEC_FAIL_OUT_OF_VERSIONS); | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we fix the version checks in tier two, we could handle this too, right? Since we don't care about function version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
@@ -1789,8 +1789,7 @@ specialize_class_call(PyObject *callable, _Py_CODEUNIT *instr, int nargs) | |||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we handle this with CALL_NON_PY_GENERAL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but in specialize_class_call
.
It keeps the logic cleaner if we don't make assumptions about how specialize_class_call
could fail here.
} | ||
#endif // Py_STATS | ||
|
||
|
||
void | ||
_Py_Specialize_Call(PyObject *callable, _Py_CODEUNIT *instr, int nargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really cool. Am I correct that we are now (in theory) able to specialize all calls, except for those where:
- We're out of versions somewhere.
- The call itself is invalid.
And even these can be specialized, even if it doesn't make a whole lot of sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, more or less. There might be a few other cases, but they are all very rare.
@@ -718,7 +727,7 @@ dummy_func(void) { | |||
if (first_valid_check_stack == NULL) { | |||
first_valid_check_stack = corresponding_check_stack; | |||
} | |||
else { | |||
else if (corresponding_check_stack) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corresponding_check_stack
may be NULL for PY_CALL_GENERAL
.
…etter tier 2 support of calls. (pythonGH-118322) * Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and call CALL_NON_PY_GENERAL specializations. * Remove CALL_PY_WITH_DEFAULTS specialization * Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize
Co-Authored-By: Ken Jin <[email protected]> Signed-off-by: Manjusaka <[email protected]>
Co-Authored-By: Ken Jin <[email protected]> Signed-off-by: Manjusaka <[email protected]>
Co-Authored-By: Ken Jin <[email protected]> Signed-off-by: Manjusaka <[email protected]>
… introduced in pythongh-118322 Co-Authored-By: Ken Jin <[email protected]> Signed-off-by: Manjusaka <[email protected]>
… introduced in pythongh-118322 Co-Authored-By: Ken Jin <[email protected]> Signed-off-by: Manjusaka <[email protected]>
gh-118322 (GH-120712) Co-authored-by: Ken Jin <[email protected]>
…roduced in gh-118322 (GH-120712) (#120747) [3.13] gh-120437: Fix `_CHECK_STACK_SPACE` optimization problems introduced in gh-118322 (GH-120712) Signed-off-by: Manjusaka <[email protected]> Co-authored-by: Ken Jin <[email protected]>
…duced in pythongh-118322 (pythonGH-120712) Co-authored-by: Ken Jin <[email protected]>
…duced in pythongh-118322 (pythonGH-120712) Co-authored-by: Ken Jin <[email protected]>
…duced in pythongh-118322 (pythonGH-120712) Co-authored-by: Ken Jin <[email protected]>
There is a bug in the tier 2 code, and this needs to be benchmarked.Bug is fixed. See below for benchmarking and stats