-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-32492: 1.6x speed up in namedtuple attribute access using C fast-path #10495
Conversation
Modules/_collectionsmodule.c
Outdated
Py_INCREF(self); | ||
return self; | ||
} | ||
result = PyTuple_GetItem(obj, ((_tuplegetterobject*)self)->index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add a PyTuple_Check
here, but as this class is private to the collections module,
we can assume that the contract is that it has to be used in a tuple-like class. Notice that this stills throws SystemError: Objects/tupleobject.c:152: bad argument to internal function
if is used incorrectly.
Being said that, I am happy to add the check if people think otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this can be triggered from Python, we should avoid SystemError
and crashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a PyTuple_Check
that throws a TypeError
in 21be735.
Include/descrobject.h
Outdated
PyObject *prop_del; | ||
PyObject *prop_doc; | ||
int getter_doc; | ||
} propertyobject; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be in the limited API. And why it is added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because the new descriptor needs to inherit from PyProperty_Type
and the object type (propertyobject
) was not exposed in the headerfile, making impossible to inherit from it. I placed the object definition (propertyobject
) together with the Type definition (PyProperty_Type
).
Where is the best place to place the definition?
Lib/collections/__init__.py
Outdated
cache[index] = itemgetter_object, doc | ||
class_namespace[name] = property(itemgetter_object, doc=doc) | ||
tuplegetter_object = _tuplegetter(index, doc=doc) | ||
cache[index] = tuplegetter_object, doc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__doc__
is a writable attribute, and descriptors in different namedtuples can have different docs. This is why itemgetter objects was cached instead property objects.
A = namedtuple('A', 'x y')
B = namedtuple('B', 'x y')
A.x.__doc__ = 'foo'
B.x.__doc__ = 'bar'
assert A.x.__doc__ = 'foo'
assert B.x.__doc__ = 'bar'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am caching only the docstring in 21be735
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test based on the above example.
Modules/_collectionsmodule.c
Outdated
Py_INCREF(self); | ||
return self; | ||
} | ||
result = PyTuple_GetItem(obj, ((_tuplegetterobject*)self)->index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this can be triggered from Python, we should avoid SystemError
and crashes.
@@ -0,0 +1,2 @@ | |||
Speed up :class:`namedtuple` attribute access by 2.5x using a C fast-path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make comparison not with the current master, but with 3.7. There is a regression in master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mean +- std dev: [old] 176 ns +- 2 ns -> [new] 110 ns +- 2 ns: 1.61x faster (-38%)
Significant (t=177.69)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore the number 1.6x should be used in the documentation.
Sorry, I didn't check your implementation, but did you consider to reuse existing structseq type to implement namedtuple? https://bugs.python.org/issue28638#msg298499 Last time I ran a microbenchmark, structseq was 1.9x faster than namedtuple to get an attribute by name. In the meanwhile, I removed property_descr_get() micro-optimization because it wasn't correct and caused 3 different crashed, bpo-30156, commit e972c13. So I get that structseq is now even faster than namedtuple to get an attribute :-) |
…the descriptor itself
Hummm I did not consider this, but that will involve more significant and fundamental changes than this Pull Request. Also, apparently there is this issue that Josh Rosenberg ran into when implementing the idea. I am happy to give it a go if people agree that is a good idea :) But I think this we can start with this Pull Request as is simpler and it gives some immediate speedup. |
You do not need a subclass of Look also at |
@serhiy-storchaka Thanks! I will take a look into that. Independently, if we don't move the property object to the header file, is not possible to subclass property in C. What do you think we should do with that? |
@@ -0,0 +1,2 @@ | |||
Speed up :class:`namedtuple` attribute access by 2.5x using a C fast-path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore the number 1.6x should be used in the documentation.
@@ -454,12 +459,13 @@ def __getnewargs__(self): | |||
cache = _nt_itemgetters | |||
for index, name in enumerate(field_names): | |||
try: | |||
itemgetter_object, doc = cache[index] | |||
doc = cache[index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please measure the effect of caching docstrings. Adding this cache for docstrings sped up namedtuple type creation by 10% in former implementation, but removing the cache for itemgetters should reduce the benefit. If it is too small, it may be not worth to use the cache at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This are the results for:
from collections import namedtuple; names = ['field%d' % i for i in range(1000)]" -- "namedtuple('A', names)"
❯ ./python -m perf compare_to with_caching.json without_caching.json
Mean +- std dev: [with_caching] 7.88 ms +- 0.12 ms -> [without_caching] 8.24 ms +- 0.04 ms: 1.05x slower (+5%)
Is 5% slower without the cache.
PyObject_HEAD | ||
Py_ssize_t index; | ||
PyObject* doc; | ||
} _tuplegetterobject; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to reuse PyMemberDescrObject or PyGetSetDescrObject here, without creating a new type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hummm.... I don't see an obvious way to do that. We still need a custom descriptor protocol to access the namedtuple object and a mutable doc field.
Modules/_collectionsmodule.c
Outdated
return self; | ||
} | ||
if (!PyTuple_Check(obj)){ | ||
PyErr_SetString(PyExc_TypeError, "_tuplegetter must be used with tuples"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to avoid using private class names in error messages.
Common error message looks like:
>>> int.numerator.__get__([])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor 'numerator' for 'int' objects doesn't apply to 'list' object
It needs additional information: the name of the namedtuple type and the name of the filed. If it is hard to add this information, the error message can use more general words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it to:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor for index '1' for tuple subclasses doesn't apply to 'A' object
Lib/collections/__init__.py
Outdated
cache[index] = itemgetter_object, doc | ||
class_namespace[name] = property(itemgetter_object, doc=doc) | ||
tuplegetter_object = _tuplegetter(index, doc=doc) | ||
cache[index] = tuplegetter_object, doc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test based on the above example.
cache[index] = doc | ||
|
||
tuplegetter_object = _tuplegetter(index, doc=doc) | ||
class_namespace[name] = tuplegetter_object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can tuplegetter object be cached?
try:
tuplegetter_object = cache[index]
except KeyError:
tuplegetter_object = cache[index] = _tuplegetter(index, doc=f'Alias for field number {index}')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, no because the docstrings are mutable. Check Serhiy comment:
Try to make constructor arguments positional-only and repeat benchmarks for creating a namedtuple type. I think this can save several percents of creation time. |
@serhiy-storchaka Here are the results (commit e5bca1d): import perf
runner = perf.Runner()
runner.timeit("collections.namedtuple('A','x')",
stmt="collections.namedtuple('A','x')",
setup="import collections")
|
This patch looks great. Thanks for the effort to get this done :-) Before this gets committed, please make a couple of improvements. 1. The _tuplegetter() API needs to more fully emulate property():
Part of the reason is that we want tuplegetter() to be a drop in substitute, supporting whatever interactions users have had with it before now (this is an old API). Another reason is that tuplegetter() needs to be recognized as a data descriptor so that its docstrings show-up in the output of help(). Formerly, running
Now we get:
2. The code in tuplegetterdescr_get can be made tighter by using PyTuple_GET_SIZE() and PyTuple_GET_ITEM() instead of PyTuple_GetItem(). That saves the function call overhead and a redundant duplicate PyTuple_Check (the second check is 100% branch predictable which is good, but still incurs two chained memory accesses). In running timings, we should not only benchmark 1.6x to 2.5 improvement, but also compare against regular attribute access to an instance of a class that defines |
To be recognized as a data descriptor I do not think that
|
@rhettinger @serhiy-storchaka I am not sure what is the best path to follow. To make this PR simpler, what do you think about just reverting commit 1e14509 so @serhiy-storchaka Although |
…on of tuplegetterdescr_get
I do not see sense in full emulating a Attributes
We are at the pre-alpha stage. If some code will be broken by this change, we have enough time to fix it. |
@serhiy-storchaka So you propose to implement:
Is that correct? |
Try to implement just |
After db3ffcd: >>> from collections import namedtuple
>>> help(namedtuple('Point', ['x', 'y'])(10, 20))
| Data descriptors defined here:
|
| x
| Alias for field number 0
|
| y
| Alias for field number 1
|
>>> set(dir(property)) - set(dir(_tuplegetter))
{'fget', 'deleter', '__isabstractmethod__', 'getter', 'setter', 'fdel', 'fset'} |
Benchmark agains a class definning
Results (no PGO):
It turns that the latest I ran some experiments regarding the inlining of PyTuple_GetItem and even without PGO is unoticeable under
The two cmpq are smashed by the branch predictor and the stack allocation (the push and the two movs and subsequent) are almost negligible as the stack of |
Modules/_collectionsmodule.c
Outdated
if (value == NULL) { | ||
return PyObject_DelItem(obj, index); | ||
} | ||
return PyObject_SetItem(obj, index, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just raise an AttributeError similar to errors for other read-only attributes:
>>> 1 .numerator = 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: attribute 'numerator' of 'int' objects is not writable
>>> sys.version_info.major = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: readonly attribute
>>> sched.Event(0, 0, None, (), {}).time = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
Modules/_collectionsmodule.c
Outdated
} | ||
|
||
result = PyTuple_GET_ITEM(obj, index); | ||
Py_XINCREF(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Py_INCREF()
should be used here, not Py_XINCREF()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, now that we inline PyTuple_GetItem, the index cannot be out of bounds after the check and therefore it will always return something. Good catch!
FWIW, my timings show a significant improvement (more than 2x) and that named tuple attribute access is now on-par with access to member objects created by _slots_. Nice work. |
@rhettinger: Please replace |
Timing benchmarks
Attribute Access
Apparently, there is a regression in the current master. This is the comparison against 3.7:
Creation
(Just to check that creation is not slower)
Cache efficiency
Baseline
Patched
https://bugs.python.org/issue32492