-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-118093: Handle some polymorphism before requiring progress in tier two #122843
Conversation
Include/internal/pycore_optimizer.h
Outdated
@@ -35,6 +35,7 @@ typedef struct { | |||
_PyBloomFilter bloom; | |||
_PyExecutorLinkListNode links; | |||
PyCodeObject *code; // Weak (NULL if no corresponding ENTER_EXECUTOR). | |||
int chain_depth; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is only 2 bits of information, and valid and linked are only 1 bit each, could we pack them together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I packed these three into bitfields of a uint16_t
.
Python/optimizer.c
Outdated
@@ -111,7 +111,7 @@ never_optimize( | |||
_PyInterpreterFrame *frame, | |||
_Py_CODEUNIT *instr, | |||
_PyExecutorObject **exec, | |||
int Py_UNUSED(stack_entries)) | |||
int Py_UNUSED(stack_entries), bool Py_UNUSED(progress_needed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put progress_needed
on its own line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed here and elsewhere.
Python/optimizer.c
Outdated
{ | ||
// The first instruction in a chain and the MAX_CHAIN_DEPTH'th instruction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean "the first instruction in the last executor in a chain"?
I read this as meaning that the first and fourth instructions must make progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced both "instruction"s with "executor"s.
@@ -602,13 +606,17 @@ translate_bytecode_to_trace( | |||
goto done; | |||
} | |||
} | |||
if (opcode == ENTER_EXECUTOR) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we are done?
If we are wanting specialized traces, they may overlap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment justifying the decision.
Despite our best intentions, we currently don't handle polymorphism at all in tier two, since our current forward progress requirement means that we need to deopt the first instruction of every new trace.
This changes our trace stitching for side exits to not require progress until a certain tree depth is reached (currently four). It also changes the optimizer to rejoin with any traces encountered during projection, since this does a better job of keeping us on trace in loops (we can take four side exits per iteration before requiring progress) and keeps the amount of tier two code from exploding.
As a simple microbenchmark, this goes from ~10% slower when run with the JIT enabled to ~25% faster after this change.
Overall, the benchmarks are 0.6% faster, including nice improvements on all of our interpreter-heavy benchmarks. The stats show that we're actually executing more traces and a bit (~1%) more tier one code now, but that's expected since side exit chains require up to 4x as much "warming up" as before.