Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataclasses - Improve the performance of asdict/astuple for common types and default values #103000

Closed
DavidCEllis opened this issue Mar 24, 2023 · 5 comments
Labels
3.12 bugs and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@DavidCEllis
Copy link
Contributor

DavidCEllis commented Mar 24, 2023

Feature or enhancement

Improve the performance of asdict/astuple in common cases by making a shortcut for common types that are unaffected by deepcopy in the inner loop. Also special casing for the default dict_factory=dict to construct the dictionary directly.

The goal here is to improve performance in common cases without significantly impacting less common cases, while not changing the API or output in any way.

Pitch

In cases where a dataclass contains a lot of data of common python types (eg: bool/str/int/float) currently the inner loops for asdict and astuple require the values to be compared to check if they are dataclasses, namedtuples, lists, tuples, and then dictionaries before passing them to deepcopy. This proposes to special case and shortcut objects of types where deepcopy returns the object unchanged.

It is much faster for these cases to instead check for them at the first opportunity and shortcut their return, skipping the recursive call and all of the other comparisons. In the case where this is being used to prepare an object to serialize to JSON this can be quite significant as this covers most of the remaining types handled by the stdlib json module.

Note: Anything that skips deepcopy with this alteration is already unchanged asdeepcopy(obj) is obj is always True for these types.

Currently when constructing the dict for a dataclass, a list of tuples is created and passed to the dict_factory constructor. In the case where the dict_factory constructor is the default - dict - it is faster to construct the dictionary directly.

Previous discussion

Discussed here with a few more details and earlier examples: https://discuss.python.org/t/dataclasses-make-asdict-astuple-faster-by-skipping-deepcopy-for-objects-where-deepcopy-obj-is-obj/24662

Code Details

Types to skip deepcopy

This is the current set of types to be checked for and shortcut returned, ordered in a way that I think makes more sense for dataclasses than the original ordering copied from the copy module. These are known to be safe to skip as they are all sent to _deepcopy_atomic (which returns the original object) in the copy module.

# Types for which deepcopy(obj) is known to return obj unmodified
# Used to skip deepcopy in asdict and astuple for performance
_ATOMIC_TYPES = {
    # Common JSON Serializable types
    types.NoneType,
    bool,
    int,
    float,
    complex,
    bytes,
    str,
    # Other types that are also unaffected by deepcopy
    types.EllipsisType,
    types.NotImplementedType,
    types.CodeType,
    types.BuiltinFunctionType,
    types.FunctionType,
    type,
    range,
    property,
    # weakref.ref,  # weakref is not currently imported by dataclasses directly
}

Function changes

With that added the change is essentially replacing each instance of

_asdict_inner(v, dict_factory)

inside _asdict_inner, with

v if type(v) in _ATOMIC_TYPES else _asdict_inner(v, dict_factory)

Instances of subclasses of these types are not guaranteed to have deepcopy(obj) is obj so this checks specifically for instances of the base types.

Performance tests

Test file: https://gist.github.com/DavidCEllis/a2c2ceeeeda2d1ac509fb8877e5fb60d

Results on my development machine (not a perfectly stable test machine, but these differences are large enough).

Main

Current Main python branch:

Dataclasses asdict/astuple speed tests
--------------------------------------
Python v3.12.0alpha6
GIT branch: main
Test Iterations: 10000
List of Int case asdict: 5.80s

Test Iterations: 1000
List of Decimal case asdict: 0.65s

Test Iterations: 1000000
Basic types case asdict: 3.76s
Basic types astuple: 3.48s

Test Iterations: 100000
Opaque types asdict: 2.15s
Opaque types astuple: 2.11s

Test Iterations: 100
Mixed containers asdict: 3.66s
Mixed containers astuple: 3.28s

Modified

Modified Branch:

Dataclasses asdict/astuple speed tests
--------------------------------------
Python v3.12.0alpha6
GIT branch: faster_dataclasses_serialize
Test Iterations: 10000
List of Int case asdict: 0.53s

Test Iterations: 1000
List of Decimal case asdict: 0.68s

Test Iterations: 1000000
Basic types case asdict: 1.33s
Basic types astuple: 1.28s

Test Iterations: 100000
Opaque types asdict: 2.14s
Opaque types astuple: 2.13s

Test Iterations: 100
Mixed containers asdict: 1.99s
Mixed containers astuple: 1.84s

Linked PRs

@DavidCEllis DavidCEllis added the type-feature A feature request or enhancement label Mar 24, 2023
@AlexWaygood AlexWaygood added performance Performance or resource usage stdlib Python modules in the Lib dir 3.12 bugs and security fixes labels Mar 24, 2023
AlexWaygood added a commit that referenced this issue Apr 10, 2023
@Pythonix
Copy link

Since the PR has been merged, presumably the issue can be closed?

@AlexWaygood
Copy link
Member

Since the PR has been merged, presumably the issue can be closed?

There was another possible optimisation that was originally included in the PR, but which we decided to postpone to a separate PR, since it was a distinct change that deserved to be considered separately.

@DavidCEllis mentioned that he might not be interested in filing a followup PR anymore. But I'd still like to do some more benchmarks before closing this issue — if the second optimization provides a decent speedup, I might file my own PR (listing @DavidCEllis as a co-author).

@DavidCEllis
Copy link
Contributor Author

The implementation I had originally included the recursion skip for things that weren't 'atomic'

        if dict_factory is dict:
            result = {}
            for f in fields(obj):
                value = getattr(obj, f.name)
                if type(value) not in _ATOMIC_TYPES:
                    value = _asdict_inner(value, dict_factory)
                result[f.name] = value
            return result

My thinking is that without the recursion skip while you could still write it in the same way:

        if dict_factory is dict:
            result = {}
            for f in fields(obj):
                result[f.name] = _asdict_inner(getattr(obj, f.name), dict_factory))
            return result

You would probably write it as a comprehension instead:

        if dict_factory is dict:
            return {f.name: _asdict_inner(value, dict_factory) for f in fields(obj)}

This currently introduces additional overhead associated with the implementation of comprehensions. Based on testing a 'walrus' comprehension implementation1 with the recursion skip I would expect this to be slower than the regular loop, though I could be wrong.

However if/when PEP-709 for inline comprehensions lands, then I would expect this to no longer matter and the comprehension is clearly a better fit. Given that I probably wouldn't submit a PR until it was possible to compare with PEP-709 implemented. I wouldn't want to reject it as an insignificant improvement if this was due to the comprehension overhead.

Footnotes

  1. I never submitted this implementation as it was slower and also much harder to read.

warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
@AlexWaygood
Copy link
Member

AlexWaygood commented Apr 11, 2023

This currently introduces additional overhead associated with the implementation of comprehensions. Based on testing a 'walrus' comprehension implementation1 with the recursion skip I would expect this to be slower than the regular loop, though I could be wrong.

Yeah, I tried this patch out locally:

diff --git a/Lib/dataclasses.py b/Lib/dataclasses.py
index 4026c8b779..079c28fae3 100644
--- a/Lib/dataclasses.py
+++ b/Lib/dataclasses.py
@@ -1317,11 +1317,18 @@ def _asdict_inner(obj, dict_factory):
     if type(obj) in _ATOMIC_TYPES:
         return obj
     elif _is_dataclass_instance(obj):
-        result = []
-        for f in fields(obj):
-            value = _asdict_inner(getattr(obj, f.name), dict_factory)
-            result.append((f.name, value))
-        return dict_factory(result)
+        # fast path for the common case
+        if dict_factory is dict:
+            return {
+                f.name: _asdict_inner(getattr(obj, f.name), dict_factory)
+                for f in fields(obj)
+            }
+        else:
+            result = []
+            for f in fields(obj):
+                value = _asdict_inner(getattr(obj, f.name), dict_factory)
+                result.append((f.name, value))
+            return dict_factory(result)

And measured it with this benchmark:

from dataclasses import dataclass, asdict
import timeit

@dataclass
class Foo:
    x: int

@dataclass
class Bar:
    x: Foo
    y: Foo
    z: Foo

@dataclass
class Baz:
    x: Bar
    y: Bar
    z: Bar

foo = Foo(42)
bar = Bar(foo, foo, foo)
baz = Baz(bar, bar, bar)

print(timeit.timeit(lambda: asdict(baz), number=50_000))

And it does indeed seem to be slower than the current main branch. So let's put this on hold for now, and try again once PEP-709 has been decided on (hopefully it'll be accepted!).

@AlexWaygood
Copy link
Member

That's two great dataclasses optimisations landed in 3.12 -- thanks again @DavidCEllis!

carljm added a commit to carljm/cpython that referenced this issue May 10, 2023
* main:
  pythonGH-102181: Improve specialization stats for SEND (pythonGH-102182)
  pythongh-103000: Optimise `dataclasses.asdict` for the common case (python#104364)
  pythongh-103538: Remove unused TK_AQUA code (pythonGH-103539)
  pythonGH-87695: Fix OSError from `pathlib.Path.glob()` (pythonGH-104292)
  pythongh-104263: Rely on Py_NAN and introduce Py_INFINITY (pythonGH-104202)
  pythongh-104010: Separate and improve docs for `typing.get_origin` and `typing.get_args` (python#104013)
  pythongh-101819: Adapt _io._BufferedIOBase_Type methods to Argument Clinic (python#104355)
  pythongh-103960: Dark mode: invert image brightness (python#103983)
  pythongh-104252: Immortalize Py_EMPTY_KEYS (pythongh-104253)
  pythongh-101819: Clean up _io windows console io after pythongh-104197 (python#104354)
  pythongh-101819: Harden _io init (python#104352)
  pythongh-103247: clear the module cache in a test in test_importlib/extensions/test_loader.py (pythonGH-104226)
  pythongh-103848: Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format (python#103849)
  pythongh-74895: adjust tests to work on Solaris (python#104326)
  pythongh-101819: Refactor _io in preparation for module isolation (python#104334)
  pythongh-90953: Don't use deprecated AST nodes in clinic.py (python#104322)
  pythongh-102327: Extend docs for "url" and "headers" parameters to HTTPConnection.request()
  pythongh-104328: Fix typo in ``typing.Generic`` multiple inheritance error message (python#104335)
carljm added a commit to carljm/cpython that referenced this issue May 11, 2023
* main: (27 commits)
  pythongh-87849: fix SEND specialization family definition (pythonGH-104268)
  pythongh-101819: Adapt _io.IOBase.seek and _io.IOBase.truncate to Argument Clinic (python#104384)
  pythongh-101819: Adapt _io._Buffered* methods to Argument Clinic (python#104367)
  pythongh-101819: Refactor `_io` futher in preparation for module isolation (python#104369)
  pythongh-101819: Adapt _io.TextIOBase methods to Argument Clinic (python#104383)
  pythongh-101117: Improve accuracy of sqlite3.Cursor.rowcount docs (python#104287)
  pythonGH-92184: Convert os.altsep to '/' in filenames when creating ZipInfo objects (python#92185)
  pythongh-104357: fix inlined comprehensions that close over iteration var (python#104368)
  pythonGH-90208: Suppress OSError exceptions from `pathlib.Path.glob()` (pythonGH-104141)
  pythonGH-102181: Improve specialization stats for SEND (pythonGH-102182)
  pythongh-103000: Optimise `dataclasses.asdict` for the common case (python#104364)
  pythongh-103538: Remove unused TK_AQUA code (pythonGH-103539)
  pythonGH-87695: Fix OSError from `pathlib.Path.glob()` (pythonGH-104292)
  pythongh-104263: Rely on Py_NAN and introduce Py_INFINITY (pythonGH-104202)
  pythongh-104010: Separate and improve docs for `typing.get_origin` and `typing.get_args` (python#104013)
  pythongh-101819: Adapt _io._BufferedIOBase_Type methods to Argument Clinic (python#104355)
  pythongh-103960: Dark mode: invert image brightness (python#103983)
  pythongh-104252: Immortalize Py_EMPTY_KEYS (pythongh-104253)
  pythongh-101819: Clean up _io windows console io after pythongh-104197 (python#104354)
  pythongh-101819: Harden _io init (python#104352)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants