Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-118093: Handle some polymorphism before requiring progress in tier two #122843

Merged
merged 11 commits into from
Aug 12, 2024

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Aug 8, 2024

Despite our best intentions, we currently don't handle polymorphism at all in tier two, since our current forward progress requirement means that we need to deopt the first instruction of every new trace.

This changes our trace stitching for side exits to not require progress until a certain tree depth is reached (currently four). It also changes the optimizer to rejoin with any traces encountered during projection, since this does a better job of keeping us on trace in loops (we can take four side exits per iteration before requiring progress) and keeps the amount of tier two code from exploding.

As a simple microbenchmark, this goes from ~10% slower when run with the JIT enabled to ~25% faster after this change.

class A:
    def f(self):
        return True

class B:
    def f(self):
        return False

def main():
    a, b = A(), B()
    for _ in range(100_000_000):
        a, b = b, a
        a.f()

main()

Overall, the benchmarks are 0.6% faster, including nice improvements on all of our interpreter-heavy benchmarks. The stats show that we're actually executing more traces and a bit (~1%) more tier one code now, but that's expected since side exit chains require up to 4x as much "warming up" as before.

@brandtbucher brandtbucher added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Aug 8, 2024
@brandtbucher brandtbucher self-assigned this Aug 8, 2024
@@ -35,6 +35,7 @@ typedef struct {
_PyBloomFilter bloom;
_PyExecutorLinkListNode links;
PyCodeObject *code; // Weak (NULL if no corresponding ENTER_EXECUTOR).
int chain_depth;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only 2 bits of information, and valid and linked are only 1 bit each, could we pack them together?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I packed these three into bitfields of a uint16_t.

@@ -111,7 +111,7 @@ never_optimize(
_PyInterpreterFrame *frame,
_Py_CODEUNIT *instr,
_PyExecutorObject **exec,
int Py_UNUSED(stack_entries))
int Py_UNUSED(stack_entries), bool Py_UNUSED(progress_needed))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put progress_needed on its own line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here and elsewhere.

{
// The first instruction in a chain and the MAX_CHAIN_DEPTH'th instruction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "the first instruction in the last executor in a chain"?
I read this as meaning that the first and fourth instructions must make progress.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced both "instruction"s with "executor"s.

@@ -602,13 +606,17 @@ translate_bytecode_to_trace(
goto done;
}
}
if (opcode == ENTER_EXECUTOR) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we are done?
If we are wanting specialized traces, they may overlap.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment justifying the decision.

@brandtbucher brandtbucher merged commit 9621a7d into python:main Aug 12, 2024
55 of 57 checks passed
blhsing pushed a commit to blhsing/cpython that referenced this pull request Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants