Add support for default_factory on Struct types #274

Merged (4 commits) on Jan 23, 2023

@jcrist jcrist commented Jan 23, 2023

This PR does a few things:

  • It adds a new msgspec.field function, which returns an opaque object used for configuring fields. So far this only exposes default and default_factory, both of which match the dataclasses kwargs of the same name.
  • It adds support for init-time generated default values in msgspec.Struct types through the new default_factory setting (fixes #259, "Support default factories for Struct types").

Example:

import msgspec
import uuid

class Example(msgspec.Struct):
    id: uuid.UUID = msgspec.field(default_factory=uuid.uuid4)

Breaking Change: mutable default values in structs are now handled differently

This also changes how mutable default values are handled. Previously msgspec's semantics were that a default value for a field would be deepcopied on __init__ if that field wasn't provided (the implementation would avoid a deepcopy for most common types, but semantically this was the same). This proved problematic in the face of custom types, led to some unnecessary slowdowns, and didn't match the behavior of other similar libraries like dataclasses or attrs. Since we now can support arbitrarily complex default values through the default_factory function, we drop the old deepcopy behavior.
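
For reference, this is the dataclasses behavior being matched, shown as a stdlib-only sketch (the class and field names are made up for illustration). Plain defaults are stored once and shared by every instance, `default_factory` produces a fresh object per instance, and a `list` default is rejected at class definition time. Note that dataclasses rejects even an empty `[]`, so the empty-container shorthand described below is more lenient than dataclasses.

```python
from dataclasses import dataclass, field


@dataclass
class Settings:
    origin: tuple = (0, 0)                    # used directly, never copied
    tags: list = field(default_factory=list)  # new list for every instance


a, b = Settings(), Settings()
print(a.origin is b.origin)  # the same tuple object is shared
print(a.tags is b.tags)      # each instance got its own list

# dataclasses refuses list defaults outright and points at default_factory.
try:
    @dataclass
    class Rejected:
        tags: list = []
except ValueError as exc:
    rejected_error = exc
else:
    rejected_error = None

print(rejected_error)
```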

A default value on a Struct now has the following rules:

  • Non-empty common mutable containers (list, dict, set, bytearray) will error if used as a default value. The error points users to use a default_factory instead.
  • Empty common mutable containers ([], {}, set(), bytearray()) are accepted as default values. These are treated as syntactic sugar, and are automatically converted to the equivalent default_factory notation ([] -> field(default_factory=list)).
  • Mutable struct types will error if used as default values. The error points users to use a default_factory instead.
  • Frozen struct types are accepted as a default value, and are used directly without deepcopying.
  • All other values are used directly without deepcopying.

Since most mutable default values are empty collections, I don't expect this breaking change to affect most users. To reiterate, the following type definition is still valid:

from msgspec import Struct

class Example(Struct):
    x: list[int] = []
    y: set[int] = set()
    z: dict[str, int] = {}

while this one will error, and should use a default_factory instead:

from msgspec import Struct, field

# errors: non-empty mutable container used as a default
class Bad(Struct):
    x: list[int] = [1, 2, 3]

# should be
class Good(Struct):
    x: list[int] = field(default_factory=lambda: [1, 2, 3])

Todo:

  • Tests
  • Docs
  • Ensure all examples are updated to the newer syntax

Previously we allowed any msgspec-compatible value to be used as a
default value in a struct. When initialized, the default value would be
deepcopied (with some optimizations for common types) to ensure mutable
state wasn't shared. This was nice and readable (IMO), and let us avoid
implementing `default_factory` support. However, it also had a few
problems:

- Some custom types used as default values are effectively immutable
  (e.g. UUIDs, ipaddress, ...). These shouldn't need to be deepcopied,
  but there was no way to tell msgspec that.
- Deepcopying is expensive. We had optimizations for common cases (empty
  mutable collections like `[]`, `{}`, ...), but the general case had a
  large performance cost.
- This only supports "static" default values. Sometimes a user may want
  to autogenerate a UUID for a field if one isn't provided, which isn't
  possible with the current system.

All of these problems can be solved by dropping the current deepcopying
behavior and adding support for a configurable `default_factory` on
Struct fields. This commit only does the first half.
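
The first two problems can be seen with a small stdlib-only sketch. This is not msgspec's code; `deepcopied_default` and `direct_default` are hypothetical helpers that model the old and new handling of a stored default. A UUID is effectively immutable, yet `copy.deepcopy` still builds a brand new object for it on every call.

```python
import copy
import timeit
import uuid

DEFAULT_ID = uuid.UUID(int=0)  # an effectively immutable default value


def deepcopied_default():
    # Models the old semantics: copy the stored default on every __init__.
    return copy.deepcopy(DEFAULT_ID)


def direct_default():
    # Models the new semantics: hand out the stored default as-is.
    return DEFAULT_ID


copied = deepcopied_default()
shared = direct_default()
print(copied == DEFAULT_ID, copied is DEFAULT_ID)  # equal, but a new object
print(shared is DEFAULT_ID)                        # the very same object

# Rough per-call cost of each approach; numbers vary by machine.
deepcopy_cost = timeit.timeit(deepcopied_default, number=10_000)
direct_cost = timeit.timeit(direct_default, number=10_000)
print(f"deepcopy: {deepcopy_cost:.4f}s  direct: {direct_cost:.4f}s")
```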

The new behavior has the following rules:

- Common empty mutable collections (`[]`, `{}`, `set()`, and
  `bytearray()`) may be used directly as default values (as a shorthand
  for e.g. `field(default_factory=list)`). This is purely syntactic
  sugar; behind the scenes these are converted to `default_factory`.

- Using common *nonempty* mutable collections (list, dict, set, and
  bytearray) as a default value is now an error. We can't check for all
  mutable types, so we only try to provide error messages for common
  mistakes. To handle these use cases the user should use a
  `default_factory`, or switch to an immutable type.

- Using `frozen` struct instances as default values is allowed.

- Using non-frozen struct instances as default values is now an error.
  To handle these use cases the user should use a `default_factory`, or
  set `frozen=True`.

- Every other type used as a default value is used directly (meaning we
  assume they're immutable values).
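
The rules above can be sketched as a small pure-Python function. This is only a model of the decision logic, not the actual implementation: `process_default` and `StandInStruct` are hypothetical names, the stand-in class replaces real struct instances so the sketch stays stdlib-only, and the exact-type check and `TypeError` are assumptions.

```python
MUTABLE_COLLECTIONS = (list, dict, set, bytearray)


class StandInStruct:
    """Hypothetical stand-in for a struct instance (not a msgspec type)."""

    def __init__(self, frozen):
        self.frozen = frozen


def process_default(value):
    """Return ("factory", callable) or ("value", obj) for a default value."""
    if type(value) in MUTABLE_COLLECTIONS:
        if len(value) == 0:
            # Empty collection: sugar for default_factory=<its type>.
            return ("factory", type(value))
        raise TypeError(
            f"non-empty {type(value).__name__} default; use a default_factory"
        )
    if isinstance(value, StandInStruct):
        if value.frozen:
            # Frozen structs are used directly.
            return ("value", value)
        raise TypeError(
            "non-frozen struct default; use a default_factory or frozen=True"
        )
    # Everything else is assumed immutable and used directly.
    return ("value", value)


print(process_default([]))      # converted to a factory
print(process_default((1, 2)))  # used as-is
```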

An added benefit of these changes is that `Struct.__init__` with default
values now has less overhead (although it was already fast).

This moves `msgspec.inspect.UNSET` to `msgspec.UNSET` (leaving the
original import as well). It also moves the singleton implementation to
C, to make it easier to work with when adding a new `field` construct in
a follow-up commit.

This adds a new `msgspec.field` function, which returns an opaque config
object for configuring fields in a `msgspec.Struct` type. In the future
this will support more config options, but for now it only supports:

- `default` (the same as providing a default value directly)
- `default_factory` (a 0-argument callable for generating a default
  value at `__init__` time)

Example:

```python
import msgspec
import uuid

class Test(msgspec.Struct):
    id: uuid.UUID = msgspec.field(default_factory=uuid.uuid4)
```
@jcrist jcrist merged commit 11fca9e into main Jan 23, 2023
@jcrist jcrist deleted the default-factory branch January 23, 2023 15:37

jcrist commented Jan 23, 2023

A quick benchmark of __init__ costs after this change, comparing msgspec to a few other dataclass-like libraries: https://gist.github.com/jcrist/9bfe44f60533225d5f8383791f2fe734. In short, msgspec is fast: allocating a new struct, even in the presence of default values, is now as fast as allocating a tuple.

Successfully merging this pull request may close these issues.

Support default factories for Struct types