gh-114058: Foundations of the Tier2 redundancy eliminator #115085
Conversation
Co-Authored-By: Mark Shannon <[email protected]>
Co-Authored-By: Jules <[email protected]>
Co-Authored-By: Guido van Rossum <[email protected]>
This looks much closer to what I think we need. Thanks.
What's the API for _Py_UOpsSymType? It isn't clear what's internal and what's API.
    _Py_UOpsSymType **stack_pointer;
    _Py_UOpsSymType **stack;
    _Py_UOpsSymType **locals;
} _Py_UOpsAbstractFrame;
You can use much the same layout as an actual frame and put the locals and stack at the end:
...
_Py_UOpsSymType **stack_pointer;
_Py_UOpsSymType *localsplus[1];
}
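For illustration, here is how that suggested layout would read with locals and stack as views into localsplus, as in a real interpreter frame. The locals_len field and the derived views are assumptions added for this sketch, not part of the suggestion:

// Sketch only: frame-like layout where locals and stack share localsplus.
// Assumes _Py_UOpsSymType is declared elsewhere; locals_len is illustrative.
typedef struct _Py_UOpsAbstractFrame {
    int locals_len;                    // number of locals in this frame
    _Py_UOpsSymType **stack_pointer;   // top of the abstract stack
    _Py_UOpsSymType *localsplus[1];    // locals, immediately followed by the stack
} _Py_UOpsAbstractFrame;

// Locals and stack then fall out of localsplus, as in a real frame:
//   locals == frame->localsplus
//   stack  == frame->localsplus + frame->locals_len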
I initially did that, but this new layout is easier for inlining, because it actually models the state if all frames were to be inlined. So it makes things like calculating the new locals offset easier.
But we aren't inlining yet, and we might choose a different approach. Making it a single struct would make memory management simpler.
Right now, making it a single struct would make memory management a little more complex, because my "frames" are just an array of statically sized structs. Their localsplus just indexes into a giant pool, and frame->prev is just index--. It's all rather clean, and I only need to manage memory once instead of for 7 giant frames.
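For context, a rough sketch of that pool-based arrangement. The context struct name, the sizes, and the curr_frame field are illustrative assumptions; only the frame fields come from the diff quoted earlier:

// Rough sketch of the pool scheme described above, reusing the
// _Py_UOpsAbstractFrame from the quoted diff.
#define MAX_ABSTRACT_FRAMES 7           // the "7 giant frames" mentioned above
#define POOL_ENTRIES (MAX_ABSTRACT_FRAMES * 64)

typedef struct {
    // One allocation covers every frame's locals and stack...
    _Py_UOpsSymType *pool[POOL_ENTRIES];
    // ...and the frames themselves are a statically sized array, so
    // moving to the previous frame is just curr_frame--.
    _Py_UOpsAbstractFrame frames[MAX_ABSTRACT_FRAMES];
    int curr_frame;
} _Py_UOpsAbstractInterpContext;

// Each frame's locals/stack pointers are set to slices of ctx->pool when the
// frame is pushed; nothing needs to be freed per frame.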
Ideally, we won't be doing any allocations, just reusing a buffer on the thread-state. But that's a future PR.
But we still need to manage the memory within that buffer, and monolithic frames will make that simpler.
This should be fine for now, though.
How hard would it be to change the API to take …?
Quite possible. I can just convert to the internal representation internally. (The representation is just to save space and support unions in the future.) The only thing is that there are some "types" that can't be represented using a PyTypeObject; in that case I will special-case them.
It is probably best to add those cases to the API.
Python/optimizer_analysis.c (Outdated)

#define GETLOCAL(idx) ((ctx->frame->locals[idx]))

#define REPLACE_OP(op, arg, oper) \
I don't like macros that assume context, and we rarely (maybe never) want to set the oparg or operand.
Suggested change: replace
#define REPLACE_OP(op, arg, oper) \
with
#define REPLACE_OP(INST, OP) (INST)->opcode = (OP)
We will need operand once we start burning in _LOAD_CONST_INLINE and friends.
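For illustration, one possible shape once operand is needed. The macro names and the _PyUOpInstruction fields used here are assumptions, not necessarily what the PR settled on:

// Illustrative only: make each replacement explicit, assuming
// _PyUOpInstruction exposes opcode, oparg and operand fields.
#define REPLACE_OPCODE(INST, OP)  ((INST)->opcode = (OP))

#define REPLACE_OP(INST, OP, ARG, OPERAND)  \
    do {                                    \
        (INST)->opcode  = (OP);             \
        (INST)->oparg   = (ARG);            \
        (INST)->operand = (OPERAND);        \
    } while (0)

// e.g. burning in a known constant:
//   REPLACE_OP(this_instr, _LOAD_CONST_INLINE, 0, (uintptr_t)value);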
static void
remove_unneeded_uops(_PyUOpInstruction *buffer, int buffer_size) |
Has this changed, or has it just moved?
I did not change it. It just moved.
Python/optimizer_analysis.c (Outdated)

// The only valid error we can raise is MemoryError.
// Other times it's not really errors but things like not being able
// to fetch a function version because the function got deleted.
return PyErr_Occurred() ? -1 : 0;
Why do you need to check PyErr_Occurred()? uop_redundancy_eliminator should not be returning -1 unless an error has occurred.
In the old code, constant propagation could raise MemoryError.
Looks good now.
This sets up the foundations of the tier 2 optimizer's redundancy eliminator.
It's basically a port of #114059 over to Mark's DSL. Constant propagation and a lot of the type propagation rules have been torn out for the sake of simpler code for now. We can easily add more later; I just want to get the foundations in.
I expect no speedups from this, since most of the code has been torn out to keep the review simpler. However, I believe #114059 already shows that this approach has potential for speedups.
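For readers unfamiliar with the tier 2 buffer, here is a toy illustration of the kind of pass this enables. This is not the PR's code; the single-flag "abstract state" is a deliberate simplification, and it assumes the internal uop definitions (_PyUOpInstruction, _GUARD_BOTH_INT, _NOP) are in scope:

// Toy example: walk the tier 2 buffer and NOP out guards that a trivial
// abstract state has already proven. The real pass tracks per-value
// information (the _Py_UOpsSymType values discussed above) instead of a
// single flag.
static void
eliminate_redundant_int_guards(_PyUOpInstruction *buffer, int buffer_size)
{
    int ints_checked = 0;   // have the operands already been proven ints?
    for (int i = 0; i < buffer_size; i++) {
        switch (buffer[i].opcode) {
            case _GUARD_BOTH_INT:
                if (ints_checked) {
                    buffer[i].opcode = _NOP;   // redundant guard, drop it
                }
                else {
                    ints_checked = 1;
                }
                break;
            default:
                // Anything we don't model invalidates what we know.
                ints_checked = 0;
                break;
        }
    }
}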