-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bad row factory example in the sqlite docs #110643
Comments
It was a deliberate decision to exclude the optimisation. Please read thoroughly through the discussions on gh-99507. |
CC. @merwok, @CAM-Gerlach |
I understand this is the relevant excerpt.
So the overhead of creating a tuple class is comparable to reading a single row. However, a query that reads a thousand rows is just 60 times slower (just tested it with a ten column table in a on-disk database). So the per row overhead skyrockets. And in a very realistic scenario of reading a few hundred rows we are looking at a tenth of a second per request of just creating named tuple classes. I feel it is quite impactful. |
@kfdf, do you want to open a PR to propose a change? |
I really am not adept at Github, Python and proper doc writing (which is, judging from the referenced discussion, quite involved, certainly well above "anything is better than nothing" level), so no. |
Creating an optimized namedtuple-factory is not straightforward and can make docs less readable. For example, this is what we do in Django: def unpickle_named_row(names, values):
return create_namedtuple_class(*names)(*values)
@functools.lru_cache
def create_namedtuple_class(*names):
# Cache type() with @lru_cache since it's too slow to be called for every
# QuerySet evaluation.
def __reduce__(self):
return unpickle_named_row, (names, tuple(self))
return type(
"Row",
(namedtuple("Row", names),),
{"__reduce__": __reduce__, "__slots__": ()},
)
def namedtuple_factory(cursor, row):
fields = [column[0] for column in cursor.description]
cls = create_namedtuple_class(fields)
return cls._make(row) However, this will make the docs less readable. Maybe we can add a warning that
|
Exactly; if you see the linked discussion, you'll find that we concluded similar: putting optimisations into the docs make the docs less readable, and also more unfocused (so, it would be to deviate from our Diatáxis northstar). A note that adds a hint on how to optimise things (similar to your suggestion) is better. (@felixxm, I restored your post; sorry 'bout my GH clumsiness!) |
Do you think it's worth linking to the Django repository? Referencing |
Linking to Django could be an option, yes.
IIRC, I did that in one of my early revisions of the row factory docs, but my proposed text was rejected, since it became a distraction from the main goal of the how-to. The main goal of the how-to is to teach how row factories can be created; the main goal is not to explain how to optimise your Python code ;) |
I agree with it, we also follow the principles outlined in the Diátaxis framework. We can close this issue as invalid, I think ⚖️ |
I don’t see this as an optimization issue (adding lru_cache would be a fig leaf), but a maybe confusing example that leads people to better understanding. (I had a different comment drafted, but am rewriting it after understanding!) The row factory is set on the connection, and is a function that gets information from a cursor to extract the list of fields, and then maps the values from a specific row. |
The row factory example that returns a namedtuple in the sqlite docs creates a new namedtuple class for every row instance and as a result is really, really slow. A couple of orders of magnitude slower is too much even for an example, people might still use it and assume that one class per row doesn't affect performance that much. Introducing caching makes the example a bit less concise, but at least it becomes usable.
The text was updated successfully, but these errors were encountered: