-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-44590: Lazily allocate frame objects #27077
Conversation
…a frame, except for the locals.
…to tstate.pyframe as preparation for lazily created frames.
… object outlives stack invocation.
…has been deleted.
…t introspects frames.
Python/pystate.c
Outdated
@@ -2009,42 +2016,57 @@ _Py_GetConfig(void) | |||
|
|||
#define MINIMUM_OVERHEAD 1000 | |||
|
|||
PyObject ** | |||
_PyThreadState_PushLocals(PyThreadState *tstate, int size) | |||
_Py_NO_INLINE PyObject ** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why _Py_NO_INLINE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the slow path. We want to make sure the fast path is inlined.
Python/pystate.c
Outdated
PyObject ** | ||
_PyThreadState_PushLocals(PyThreadState *tstate, int size) | ||
_Py_NO_INLINE PyObject ** | ||
_PyThreadState_PushChunk(PyThreadState *tstate, int size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to take pointers to arbitrary places in memory, doesn't the standard require that those must be aligned? How are we making sure that all pointers to this region are aligned?. From the standard:
A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
Many compilers normally assume aligned pointer access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think the returned memory would be unaligned?
How is this any different from malloc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think the returned memory would be unaligned?
My question is not if the returned memory would be unaligned, but if we are taking pointers to random places of the chunk array. For instance, if we are doing anything isomorphic to this:
data = (char *)malloc(16);
sp1 = (short*)(data+3);
then we are doing it wrong. I am asking not because I think we are doing it wrong, but because i am still wrapping around the structure you have in mind for the chunks, because there is 3 or 4 layers of indirections, a "chunk struct" and at least 3 pointer accesses when pushin frames, so immediately I am wondering if this mechanism is taking this correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I compiled Python with the undefined behavior sanitizer and it doesn't complain, so seems that there is no problem here.
On contrast, my example shows:
example.c:1313:30: runtime error: load of misaligned address 0x55f1ba0cd283 for type 'short int', which requires 2 byte alignment
0x55f1ba0cd283: note: pointer points here
00 3d 0e 0b e5 f4 55 00 00 00 00 00 00 00 00 00 00 09 09 49 42 4d 39 30 36 21 00 00 00 00 00 00
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand it found this (unrelated to this PR):
home/pablogsal/github/python/main/Modules/_sha3/kcp/KeccakP-1600-opt64.c:467:9: runtime error: load of misaligned address 0x563579d3c3f7 for type 'UINT64', whic
h requires 8 byte alignment
0x563579d3c3f7: note: pointer points here
23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
^
/home/pablogsal/github/python/main/Modules/_sha3/kcp/KeccakP-1600-opt64.c:467:9: runtime error: load of misaligned address 0x563579d3c3ff for type 'UINT64', whic
h requires 8 byte alignment
0x563579d3c3ff: note: pointer points here
23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
^
/home/pablogsal/github/python/main/Modules/_sha3/kcp/KeccakP-1600-opt64.c:467:9: runtime error: load of misaligned address 0x563579d3c407 for type 'UINT64', whic
h requires 8 byte alignment
0x563579d3c407: note: pointer points here
23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
^
/home/pablogsal/github/python/main/Modules/_sha3/kcp/KeccakP-1600-opt64.c:467:9: runtime error: load of misaligned address 0x563579d3c40f for type 'UINT64', whic
h requires 8 byte alignment
0x563579d3c40f: note: pointer points here
23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
^
I will open an issue
if (new == NULL) { | ||
return NULL; | ||
} | ||
tstate->datastack_chunk->top = tstate->datastack_top - &tstate->datastack_chunk->data[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tstate->datastack_chunk->top = tstate->datastack_top - &tstate->datastack_chunk->data[0]; | |
tstate->datastack_chunk->top = tstate->datastack_top - tstate->datastack_chunk->data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am missing anything here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It's just a hint to the reader that data
is an array within the chunk.
chunk->data
looks like a pointer deference, but &chunk->data[0]
hints that this is an address calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM Great job!
Let's run the buildbots and land!
🤖 New build scheduled with the buildbot fleet by @markshannon for commit e8476b2 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
The failures on buildbot/AMD64 Windows10 PR look like something is wrong with that machine. |
One consequence of this change is that the code is now quite confusing, since the term "frame" is used ambiguously to refer to both frame objects and interpreter frames (specifically, the |
The generator code uses |
I've posted a proposed internal API refactoring design here: https://bugs.python.org/issue44800 |
And the PR implementing that proposed refactoring is up at #27525 |
@@ -3,7 +3,7 @@ | |||
#include "pycore_pymem.h" // _Py_tracemalloc_config | |||
#include "pycore_traceback.h" | |||
#include "pycore_hashtable.h" | |||
#include "frameobject.h" // PyFrame_GetBack() | |||
#include <pycore_frame.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m curious why this one is included with <>
when all the headers above use ""
(I don’t know much C note!)
This PR:
For most calls to Python functions, we can get the necessary memory for a stack frame by just bumping a pointer.
When we do need a frame object, for tracebacks and the like, it is lazily allocated.
According to my benchmarking run this PR produces a maximum speedup of 31% with a mean of 7%!
The performance numbers are suspiciously good.
If someone could run the comparison on their machine that would be much appreciated.
https://bugs.python.org/issue44590