Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-114058: Foundations of the Tier2 redundancy eliminator #115085

Merged
merged 54 commits into from
Feb 13, 2024

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Feb 6, 2024

This sets up the tier 2 optimizer's redundancy eliminator foundations.

It does the following:

  • Aliasing information for types (not on all expressions though, that is too expensive)
  • Basic Type propagation
  • Basic Guard elimination

It's basically a port of #114059 over using Mark's DSL. Constant propagation and a lot of type propagation rules torn out, for the sake of simpler code for now. We can easily add more later. I just want to get the foundations in.

I expect no speedups for this, as for simpler reviewing, most of the code is torn out. However, I believe that #114059 already shows this approach has potential for speedups.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much closer to what I think we need. Thanks.

What's the API for _Py_UOpsSymType? It isn't clear what's internal and what's API.

_Py_UOpsSymType **stack_pointer;
_Py_UOpsSymType **stack;
_Py_UOpsSymType **locals;
} _Py_UOpsAbstractFrame;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use much the same layout as an actual frame and put the locals and stack at the end:

    ...
    _Py_UOpsSymType **stack_pointer;
    _Py_UOpsSymType *localsplus[1];
}

Copy link
Member Author

@Fidget-Spinner Fidget-Spinner Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially did that, but this new layout is easier for inlining, because it actually models the state if all frames were to be inlined. So it makes things like calculating the new locals offset easier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we aren't inlining yet, and we might choose a different approach. Making it a single struct would make memory management simpler.

Copy link
Member Author

@Fidget-Spinner Fidget-Spinner Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now making it a single struct would make memory management a little more complex, because my "frames" are just an array of static sized structs. Their localsplus is just accessing a giant pool. frame->prev is just index--. It's all rather clean. And I just need to manage memory once instead of for 7 giant frames.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we won't be doing any allocations, just reusing a buffer on the thread-state. But's that future PR.
But we still need to manage the memory within that buffer, and monolithic frames will make that simpler.
This should be fine for now, though.

@markshannon
Copy link
Member

How hard would it be to change the API to take PyTypeObject * for the types in the API? It would make it easier to use.

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Feb 6, 2024

How hard would it be to change the API to take PyTypeObject * for the types in the API? It would make it easier to use.

Quite possible. I can just convert to the internal representation internally. (The representation is just to save space, and support unions in the future).

The only thing is there are some "types" that cant be represented using pytype object. In that case I will special case them.

@markshannon
Copy link
Member

The only thing is there are some "types" that cant be represented using pytype object. In that case I will special case them.

It is probably best to add those cases to the API.
For example if you have "not NULL" or "NULL" as nodes in the lattice, then adding a new/set_null and new/set_not_null API would make sense.


#define GETLOCAL(idx) ((ctx->frame->locals[idx]))

#define REPLACE_OP(op, arg, oper) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like macros that assume context, and we rarely (maybe never) want to set the oparg or operand.

Suggested change
#define REPLACE_OP(op, arg, oper) \
#define REPLACE_OP(INST, OP) (INST)->opcode = (OP)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need operand once we start burning in _LOAD_CONST_INLINE and friends.



static void
remove_unneeded_uops(_PyUOpInstruction *buffer, int buffer_size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this changed, or has it just moved?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not change it. It just moved.

// The only valid error we can raise is MemoryError.
// Other times it's not really errors but things like not being able
// to fetch a function version because the function got deleted.
return PyErr_Occurred() ? -1 : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to check PyErr_Occurred()? uop_redundancy_eliminator should not be returning -1 unless an error has occurred.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the old code, constant propagation could raise MemoryError.

@markshannon markshannon self-requested a review February 13, 2024 12:16
Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now.

@Fidget-Spinner Fidget-Spinner enabled auto-merge (squash) February 13, 2024 12:20
@Fidget-Spinner Fidget-Spinner merged commit 7cce857 into python:main Feb 13, 2024
59 of 60 checks passed
@Fidget-Spinner Fidget-Spinner deleted the tier2_redundancy_eliminator branch February 13, 2024 13:24
fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this pull request Feb 14, 2024
…onGH-115085)

---------

Co-authored-by: Mark Shannon <[email protected]>
Co-authored-by: Jules <[email protected]>
Co-authored-by: Guido van Rossum <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants