Pointer tagging aka tagged numbers #7
Comments
So are you proposing to use the lowest N bits for the tag? (N being 1 or 3.) Why not represent pointers as themselves (tag == 0)? So that forgetting to box is more obviously always an error? If tagged pointers can be seen by 3rd party code that would break the ABI (alas). I think that (currently) the macros that look inside tuples are part of the ABI. I had considered a scheme where tagged pointers can only occur in local variables and on the evaluation stack, but are boxed when they escape to a place where 3rd party code can see them. (With PEP 651 we could even return them from functions and use them as args.) I assume that they would never be placed in tuples or lists, though. |
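For concreteness, here is one way the low-bit layout described above could look on a 64-bit build. This is a hedged sketch, not code from either branch: the names, the 3-bit tag width and the particular tag assignments are all assumptions.

```c
#include <stdint.h>
#include <Python.h>

/* Illustrative low-bit tagging: a 3-bit tag in the lowest bits, with tag 0
 * reserved for ordinary object pointers. Object pointers are at least
 * 8-byte aligned, so their low three bits are already zero and the pointer
 * can be stored as-is. */
#define TAG_BITS   3
#define TAG_MASK   ((uintptr_t)7)
#define TAG_OBJECT ((uintptr_t)0)   /* PyObject * stored unchanged */
#define TAG_INT    ((uintptr_t)1)   /* inline small integer        */
#define TAG_FLOAT  ((uintptr_t)2)   /* inline float (one of several values) */

static inline int
is_object(uintptr_t v) {
    return (v & TAG_MASK) == TAG_OBJECT;
}

static inline PyObject *
as_object(uintptr_t v) {
    return (PyObject *)v;           /* tag 0: the bits are the pointer */
}
```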
Catching errors quickly is one reason. The other is performance; operations on integers are simpler and faster.
The macros that look inside sequence objects are part of the API, not the ABI. Thankfully. Tagged pointers should not be visible to the ABI or current API. E.g. to get an item from a tuple we might add a new function:

```c
PyValue Python_TupleGetItem(intptr_t index);
```

The old API would wrap that:

```c
PyObject *
PyTuple_GetItem(intptr_t index)
{
    PyValue val = Python_TupleGetItem(index);
    if (val == PyValue_ERROR) {
        return NULL;
    }
    return PyValue_AsNewReference(val);
}
```
|
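PyValue_AsNewReference is used above but never defined in the thread. A minimal sketch of what it might do, assuming a PyValue is either a tagged small integer or an ordinary object pointer; the predicate and untag helpers are assumptions borrowed from later in the discussion, not an existing API:

```c
#include <Python.h>

/* Hypothetical sketch: turn a PyValue into a PyObject * owning a new
 * reference. Tagged ints must be boxed into real int objects; plain object
 * pointers just need an incref. */
static PyObject *
PyValue_AsNewReference(PyValue val)
{
    if (PyValue_IsTaggedInt(val)) {
        /* boxing allocates (or reuses a cached small int); new reference */
        return PyLong_FromLongLong(PyValue_UntagInt(val));
    }
    PyObject *obj = PyValue_AsObjectPtr(val);
    Py_INCREF(obj);
    return obj;
}
```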
What's the scope of your proposed change then? I can't imagine every field that's currently a PyObject * being converted to a tagged value. If we're going as far as having tuples and lists of tagged values -- thinking some more about this, even without special sequence types, we'd need:
There could be two types of unboxing/encoding operations (eager and lazy).
Not sure that the distinction is important, but I figure the lazy version is slower, and everything needs to be prepared to handle boxed numbers anyway since the value may not fit in the 30 or 60 bits we have available. PS. Why do you use all 6 available values for unboxed floats? Isn't two bits, i.e. 4 values, sufficient? Then we'd have some extra values if we ever want to encode other types. |
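A hedged sketch of what an "eager" encoder might look like, making the fit check explicit. The point is the one made above: a value that doesn't fit in the available bits must stay boxed. The names (PyValue_TagInt, PyValue_FromObjectPtr, TAG_BITS) are illustrative assumptions, and error handling is elided.

```c
#include <stdint.h>
#include <Python.h>

#define TAG_BITS 3

/* Eager encoding: tag the integer if it fits in the bits left over after
 * the tag, otherwise fall back to an ordinary boxed int object. */
static PyValue
PyValue_FromInt64(int64_t i)
{
    const int payload_bits = 64 - TAG_BITS;
    if (i >= -(INT64_C(1) << (payload_bits - 1)) &&
        i <   (INT64_C(1) << (payload_bits - 1))) {
        return PyValue_TagInt(i);                           /* fits inline */
    }
    return PyValue_FromObjectPtr(PyLong_FromLongLong(i));   /* too big: box */
}
```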
It's a good place to start. It should reduce the amount of unboxing, if not as much as using tagged values everywhere. I think adding new datatypes for tuples and lists might well cause more problems than it solves. Some degree of automation will be necessary, maybe quite a lot of automation. I think the changes can be largely automated. It's definitely not a small task though. |
Using tagged values everywhere would be a "Python 4.0" type event, because so much code exists that does not limit itself to the ABI. But I think we can get away with using them only for values stored in the frame object's f_localsplus array. For automation, perhaps @ericsnowcurrently has tools that can form a starting point. (His tools do something else but IIRC they can scan through the entire CPython code base intelligently looking for certain patterns -- I believe more complex than a regex can do.)
Do you mean we could automatically translate it to something that's still correct? Because if I understand you correctly, if a There's a concrete experiment we could do: implement pointer encoding for |
See also Neil Schemenauer's tagged_int branch. |
Yes, but as you point out, we would need to convert the value to a pointer. So:

```c
inline PyObject *
PyValue_AsObjectPtr(PyValue v) {
    return (PyObject *)(v.bits+1);
}
```

Because the cast to |
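The inverse operation isn't shown in the thread. If untagging an object pointer adds 1 to the bits, as above, then tagging would presumably subtract 1. A hedged sketch under that assumption; the function name is made up:

```c
#include <stdint.h>
#include <Python.h>

/* Hypothetical inverse of PyValue_AsObjectPtr: store an object pointer as a
 * PyValue, assuming the encoding implied above (object pointers kept as
 * ptr - 1, so that untagging adds 1 back). */
static inline PyValue
PyValue_FromObjectPtr(PyObject *p) {
    PyValue v;
    v.bits = (uint64_t)(uintptr_t)p - 1;
    return v;
}
```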
I see. I am starting a small experiment where only |
Also see Victor's branch: vstinner/cpython#6 (read the |
I've decided to abandon my experiment for now. The immediate reason is the vectorcall API. Let me explain. My plan was to store tagged pointers in the frame's f_localsplus array.
Note that the There seem to be two ways we can slice this. We could change the The other way would be to let
So far so good (it looks like hiding this complexity in But now we get to the It may be possible to rescue the experiment by making the boxed and unboxed |
Some comments on Victor's attempt:
|
FWIW there have been other related attempts to just optimize int operations on single-digit ints (without tagging). See https://bugs.python.org/issue10044 and https://bugs.python.org/issue21955. |
So I lied, I kept playing with this. I haven't made that much progress, but I changed things around so that the box/unbox operations change inline integers and leave object pointers alone, rather than the other way around in Mark's proposed tagging scheme. (Note, I haven't actually proved that this is any better yet.) One observation: it is kind of painful that the cells used for closures are part of the f_localsplus array. It is statically (i.e. from the bytecode) fixed which array elements are cells, and that's a contiguous chunk of items (which are never NULL), but in most cases it is dynamically determined whether a local is a cell or not (e.g. in dict_to_map() in frameobject.c). Maybe we can work on separating the two arrays. (I found this tidbit in Skip Montanaro's proto-PEP for his register VM, see #22.) |
One nice thing about tagged ints everywhere is that you could use them directly as the size of strings, lists, etc. Another idea: Using HPy we could perhaps hide tagged numbers from 3rd parties (or internal modules that aren't yet aware of tagged numbers). |
I've got a mostly working (in the sense of not-crashing) version of tagged integers for everything in f_localsplus. It's about 10-15% slower than master (on a micro-benchmark). That's not surprising -- I'm doing all the work to check for tags, but I'm not yet doing any operations that preserve tags (not even LOAD_FAST or ROT_TWO). To check (some) correctness I've just implemented a specialization in FOR_ITER where if the iterator is a range iter object it returns an unboxed integer, and this seems to work. Next up (next week) I hope to work on making all operations that just move data around between locals, cells and the value stack skip most of the tagging overhead (there's still one tag check because all reference count operations need to check for tags). After that I'm planning to work on basic arithmetic. Don't expect speedups, just an evaluation of the feasibility and thoughts about the right API (I've gone over half a dozen different approaches to boxing so far :-). The jury is still out on whether it's better to tag integers or pointers, and I haven't touched floats. |
I had a look at your implementation. I would strongly recommend making PyValue a struct:

```c
typedef struct _pyv {
    uint64_t bits;
} PyValue;
```

To help with conversions I would also recommend a helper union:

```c
typedef union convert {
    uint64_t bits;
    int64_t i;
    double d;
    PyObject *p;
} _PyConvert;
```

For example:

```c
static inline int
PyValue_IsTaggedInt(PyValue v) {
    return (v.bits & 7) == 0;
}

static inline double
PyValue_UntagFloat(PyValue v) {
    assert(PyValue_IsTaggedFloat(v));
    _PyConvert c;
    c.bits = rotate4(v.bits);
    return c.d;
}
```

One other thing. Returning borrowed references is asking for trouble. |
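rotate4 is never defined in the thread. For illustration, here is one plausible pairing of tag/untag helpers for floats, assuming (arbitrarily) that rotate4 is a left rotation of the 64-bit word by 4 bits, so tagging uses the inverse right rotation. All of this is an assumption, not Mark's actual scheme:

```c
#include <stdint.h>

/* Assumed definition: rotate the 64-bit word left by 4 bits. */
static inline uint64_t
rotate4(uint64_t x) {
    return (x << 4) | (x >> 60);
}

/* Inverse rotation, used when tagging. */
static inline uint64_t
unrotate4(uint64_t x) {
    return (x >> 4) | (x << 60);
}

/* Hypothetical tagging helper, mirroring PyValue_UntagFloat above.
 * A real scheme would also have to verify that the resulting bits carry a
 * float tag, i.e. that the double is in the representable exponent range. */
static inline PyValue
PyValue_TagFloat(double d) {
    _PyConvert c;
    c.d = d;
    PyValue v;
    v.bits = unrotate4(c.bits);
    return v;
}
```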
Yeah, I already discovered that my current macros don't sufficiently type-check. :-( I will incorporate the struct and helper union. Regarding borrowed references, I originally was not returning them, but almost all of the code where I had to insert boxing operations was effectively returning borrowed references -- for example the |
Hm, but if we're using inline functions instead of macros the type-safety would not be a problem, right?
Then a call to
and now OTOH, if we're going to make
(I'm leaving out |
Inline functions will help a bit, but the fundamental issue is that you don't want to allow casts at all, because there is very often a mask, shift or rotate required and it is all too easy to forget them.
I don't remember which casts are legal for unions. What doesn't work for doubles? I would recommend you tag pointers in some way, so that any casts that do slip through explode at runtime. |
Well, using a struct found several spots that I had missed, so I'm convinced of that. I did find that I had to introduce a macro to check for NULL, because this is an error:

```c
some_func(PyValue v)
{
    if (v != PyValue_NULL)  // Error
        ...
}
```

I had to rewrite all those tests as

```c
if (!PyValue_IsNULL(v))
```

There were occasional comparisons of one value to another, e.g.

```c
if (v != w)  // Error
```

which I rewrote as

```c
if (v.bits != w.bits)  // No error
```

I think it would work with a union too (the casting rules for unions are pretty similar to those for structs), but keeping the union private affords a little bit of extra safety. Regarding double, I have no idea what your rotate4 function does. I haven't decided to tag pointers (instead of numbers) yet, but I think that the amount of code written and the speed at which it executes are exactly the same, so your argument (that it catches anything that slips through the net) makes sense now. |
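Neither PyValue_NULL nor PyValue_IsNULL is defined in the thread. A minimal sketch of what they might look like, assuming the struct representation and Guido's variant in which object pointers are stored unchanged, so all-zero bits are simply a NULL pointer:

```c
/* Hypothetical NULL handling for a struct-based PyValue. Struct values
 * cannot be compared with ==, hence the helper predicate. */
#define PyValue_NULL ((PyValue){ .bits = 0 })

static inline int
PyValue_IsNULL(PyValue v) {
    return v.bits == 0;
}
```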
So there's an issue (theoretical or not) with boxing negative values. The C standard doesn't guarantee what happens to the sign bit on a right shift of a signed value (it's implementation-defined). Since the boxing operation would use a signed right shift, can you think of a closed expression to solve that? Would it be worth testing if the compiler preserves sign bits so we can avoid this? (MSC docs claim that for signed operands, copies of the sign bit are shifted in.) |
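One closed expression that side-steps the shift behaviour entirely: a tagged int is stored as value * 2^k plus the tag in the low bits, so clearing the tag and dividing by 2^k recovers the value exactly, and signed division is fully defined. A hedged sketch, assuming a 3-bit tag, two's complement, and reusing the union-punning idea from earlier in the thread; the function name is made up:

```c
#include <stdint.h>

#define TAG_BITS 3
#define TAG_MASK UINT64_C(7)

/* Portable untag for a possibly negative inline integer. After clearing the
 * tag, the bits are exactly value << TAG_BITS; reinterpreting them as a
 * signed integer via a union and dividing by 2^TAG_BITS is exact (no
 * rounding), so nothing depends on how the compiler implements right shifts
 * of negative numbers. */
static inline int64_t
untag_int_portable(uint64_t bits)
{
    union { uint64_t u; int64_t i; } c;
    c.u = bits & ~TAG_MASK;          /* value << TAG_BITS, tag cleared */
    return c.i / (1 << TAG_BITS);    /* exact division, sign preserved */
}
```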
Okay, first positive result: after returning tagged ints from range(n) and implementing a special code path for tagged ints in BINARY_ADD, a micro-benchmark nearly doubled in speed:
|
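The patch itself isn't shown in the thread. For illustration, a hedged sketch of what a tagged-int fast path in a binary add could look like, using Guido's variant where integers carry the tag; all helper names (PyValue_IsTaggedInt, PyValue_UntagInt, PyValue_TagInt, PyValue_FitsInline) are assumptions:

```c
#include <stdint.h>

/* Sketch of a fast path for adding two inline (tagged) integers, in the
 * spirit of the BINARY_ADD specialization described above. Two ~61-bit
 * payloads cannot overflow a 64-bit sum, so only the "still fits inline"
 * check is needed before re-tagging the result. */
static inline int
tagged_add_fastpath(PyValue left, PyValue right, PyValue *result)
{
    if (PyValue_IsTaggedInt(left) && PyValue_IsTaggedInt(right)) {
        int64_t sum = PyValue_UntagInt(left) + PyValue_UntagInt(right);
        if (PyValue_FitsInline(sum)) {
            *result = PyValue_TagInt(sum);
            return 1;               /* handled inline, no object allocation */
        }
    }
    return 0;                       /* caller falls back to the boxed path */
}
```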
I propose to stop here, since this is just a proof of concept. We'd have to spend a lot more time on this to defeat the 10-20% slowdown I see on code that doesn't benefit from the fast path. Some things left to do as part of productionizing:
And more ideas:
|
https://en.wikipedia.org/wiki/Tagged_pointer
Currently, a reference to a Python object is a PyObject *. This means that we need to do a lot of boxing, unboxing and pointer chasing when handling ints and floats. Because ints and floats are pure value types, we can freely replace references to them with their values, and vice-versa.

For 32 bit machines, using a single bit to tag small (up to 31 bit) integers is the most obvious scheme.

For 64 bit machines, there are at least two schemes that we could use.
abs(val) < 2**512
Any C extensions that use macros to access the internals of tuples, lists, etc. would need recompiling, but they need recompiling for every release anyway.
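For the 32-bit single-bit scheme mentioned above, a minimal sketch of how the encoding could work. The names and the choice of which tag value means "integer" are assumptions for illustration:

```c
#include <stdint.h>

/* One-bit tagging on a 32-bit build: object pointers are at least 2-byte
 * aligned, so their low bit is always 0; odd words can therefore carry a
 * 31-bit signed integer shifted left by one with the low bit set. */
#define IS_INLINE_INT(v)   (((uint32_t)(v)) & 1u)
#define INLINE_INT_MIN     (-(INT32_C(1) << 30))
#define INLINE_INT_MAX     ((INT32_C(1) << 30) - 1)

static inline uint32_t
tag_int31(int32_t i)            /* caller must ensure INLINE_INT_MIN..MAX */
{
    return ((uint32_t)i << 1) | 1u;
}

static inline int32_t
untag_int31(uint32_t v)
{
    /* divide rather than shift so negative values are handled portably */
    union { uint32_t u; int32_t i; } c;
    c.u = v & ~1u;              /* clear the tag; now exactly value * 2 */
    return c.i / 2;
}
```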