Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subclassing support #312

Merged
merged 17 commits into from
May 22, 2023
Merged

Subclassing support #312

merged 17 commits into from
May 22, 2023

Conversation

matmel
Copy link
Contributor

@matmel matmel commented Oct 21, 2022

Solves #272

Without this PR, the subclass support is as follows:

  • BaseConverter: unstructure behaves as expected in face of an attrs subclass given instead of the attrs original class
  • Converter cannot unstructure a subclass when it has been typed with a parent class, some attributes will be missing.
  • BaseConverter and Converter fail at structuring a dictionary that is structured as a parent class even if the attributes match the child class. Those attributes are lost in the structuring process.

With this PR:

  • No change for BaseConverter
  • Converter can structure and unstructure subclasses when parent class is mentioned. This behavior is opt-in though and must be asked for at Converter instantiation with include_subclasses=True parameter.

@matmel
Copy link
Contributor Author

matmel commented Oct 21, 2022

CI fails, but I don't think this is due to the PR code AFAICT.

@codecov-commenter
Copy link

codecov-commenter commented Oct 21, 2022

Codecov Report

Merging #312 (ce287f3) into main (2f7c3f9) will increase coverage by 0.08%.
The diff coverage is 98.96%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##             main     #312      +/-   ##
==========================================
+ Coverage   97.47%   97.55%   +0.08%     
==========================================
  Files          20       21       +1     
  Lines        1582     1675      +93     
==========================================
+ Hits         1542     1634      +92     
- Misses         40       41       +1     
Impacted Files Coverage Δ
src/cattrs/strategies/_subclasses.py 98.85% <98.85%> (ø)
src/cattrs/strategies/__init__.py 100.00% <100.00%> (ø)
src/cattrs/strategies/_unions.py 97.56% <100.00%> (+0.33%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor Author

@matmel matmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put some comments on what I think deserves some additional scrutiny.

@@ -13,7 +13,10 @@ def genconverter(request):

@pytest.fixture(params=(True, False))
def converter(request, converter_cls):
return converter_cls(detailed_validation=request.param)
args = dict(detailed_validation=request.param)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change means that in the parametrized tests for inheritance, we always test Converter(detailed_validation=True, include_subclasses=True) and Converter(detailed_validation=False, include_subclasses=False). I don't think the two features (subclass support and detailed validation support) interact each other, even in the structuring side. If they were interacting, we should also test Converter(detailed_validation=False, include_subclasses=True) and the other way around...

if getattr(converter, "include_subclasses", False):
cl_tree = tuple(_make_class_tree(cl))
lines += [
" if '__cattrs_already_recursed_subclasses__' in o:",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am shoving my own structuring state in the dictionary to be structured with this hidden key. The goal is to make sure I pass in the structuring function or one of the subclasses function exactly twice for the given dictionary.

If there is no mechanism to keep the stack state this code will end in an infinite recursion. Unfortunately I didn't see a practical way to do it except by putting cattrs private state in the data structure given by the user which sounds hacky...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replying to myself: this cannot stay as-is. If the structuring fails we pollute the original mapping with those weird looking keys (at least we know where it comes from...). This bit me in unit tests where failing tests polluted the reference unstructured dictionaries used for parametrized tests and in turn broke even more tests. Plus it obviously does not work if the mapping is immutable...

I am thinking to put the mapping in a custom class like:

@define
class _cattrsStackMsg:
    msg: Mapping[str, Any]
    origin: str

That is used to carry the mapping over. This requires a new predicate though, but I think I can make it work at the price of an additional complexity.

@Tinche Tinche self-requested a review October 22, 2022 21:47
@Tinche Tinche added this to the 22.3 milestone Oct 22, 2022
@Tinche
Copy link
Member

Tinche commented Oct 22, 2022

First question: would you be open to the possibility of limiting this new behavior to individual classes, instead of a flag on the Converter instead?

So you'd do:

from cattrs import Converter

c = Converter()

c.customize(Parent, _cattrs_include_subclasses=True)

(customize doesn't exist yet, but will soon)

I'm trying not to overload the Converter __init__, and also you'd be slowing down all the other attrs hooks.

@matmel
Copy link
Contributor Author

matmel commented Oct 22, 2022

Yes, no problem to have this feature opt-in by class type. I had thought about it myself before, but I wasn't sure how to do it so I went with the global flag approach.

I did not mention it before, but I think the current unstructure behavior of Converter (i.e. without the include_subclasses turned on) for a subclass is arguably broken and certainly does not match BaseConverter. Additionally I think the performance impact on the unstructuring side is rather minimal and the very small overhead is paid (one extra function call) when receiving a subclass

EDIT This code is not demonstrating the problem, see my next message for the correct demonstration.

from attrs import defin
from cattrs import Converter, BaseConverter

@define
class A:
    a: int

@define
class B(A):
    b: int

c = Converter()
c.unstructure(B(1, 2))  # {"a": 1} --> Silently unstructured as A

bc = BaseConverter()
bc.unstructure(B(1, 2))  # {"a": 1, "b": 2}  --> OK. This is what I think everyone would expect.

@Tinche
Copy link
Member

Tinche commented Oct 23, 2022

I'm not seeing that result though:

from attrs import define

from cattrs import Converter


@define
class A:
    a: int


@define
class B(A):
    b: int


c = Converter()
print(c.unstructure(B(1, 2)))  # Outputs `{'a': 1, 'b': 2}`

@matmel
Copy link
Contributor Author

matmel commented Oct 23, 2022

Yes, sorry, you are perfectly right, this what happens when you write untested code from memory... Here is the code I should have posted instead which is less bad as the one I posted above:

from attrs import defin
from cattrs import Converter, BaseConverter

@define
class A:
    a: int

@define
class B(A):
    b: int

c = Converter()
# Happily and exactly unstructured as A even if B was given. No warning, no error, but not fundamentally wrong either.
print(c.unstructure(B(1, 2), unstructure_as=A))  # {"a": 1}

bc = BaseConverter()
# BaseConverter on the other hand behaves in what I think several people (myself included) would expect.
bc.unstructure(B(1, 2), unstructure_as=A)  # {"a": 1, "b": 2} 

Obviously the situation is kind made-up but the real-world situation is when you have a 3rd class:

@define
class C:
    a: A

print(c.unstructure(C(B(1, 2))))  # {"a": {"a": 1}}
print(bc.unstructure(C(B(1, 2))))  # {"a": {"a": 1, "b": 2}}

In that situation I think a lot of people would agree that Converter is not right but BaseConverter is. I don't know how much you want BaseConverter and Converter to have the same behaviors in general but considering your latest release with the replacement of BaseConverter by Converter, I can say that this subclass unstructuring behavior is certainly a breaking change.

@Tinche
Copy link
Member

Tinche commented Oct 23, 2022

(Unrelated to the discussion: I've upgraded Poetry on the main branch to use version 1.2.2 which may (hopefully) fix the issue that's stopping the CI, so please rebase.)

I feel like we're getting bogged down here discussing what is essentially an issue of run-time vs compile-time dispatch. The approach that the BaseConverter uses is runtime dispatch - it does a lot of work on every unstructure. That approach was abandoned because it has unfixable flaws - it gets slower the more features get added to cattrs. Hence why the BaseConverter doesn't support generics or newtypes properly, nor can it.

What you want here is the ability to selectively opt-in into runtime dispatch for a class.

Here's a code snippet that works:

from attrs import define

from cattrs import Converter


@define
class A:
    a: int


@define
class B(A):
    b: int


@define
class C:
    a: A


c = Converter()

# This hook is for instances of A, but not instances of subclasses.
base_hook = c.gen_unstructure_attrs_fromdict(A)


def unstructure_a(val: A) -> dict:
    """
    If val is an instance of `A`, use the hook.

    If val is an instance of a subclass, dispatch on its exact
    runtime type.
    """
    if val.__class__ is A:
        return base_hook(val)
    return c.unstructure(val, unstructure_as=val.__class__)

# This needs to use function dispatch, using singledispatch will again
# match A and all subclasses, which is not what we want.
c.register_unstructure_hook_func(lambda cls: cls is A, unstructure_a)

print(c.unstructure(C(B(1, 2))))  # {'a': {'a': 1, 'b': 2}}

I think this can be part of the solution we're looking for.

@matmel
Copy link
Contributor Author

matmel commented Oct 23, 2022

Rebased.

What you want here is the ability to selectively opt-in into runtime dispatch for a class.

Oh yeah, definitely, this is what I have actually implemented in this PR. I even limited the runtime dispatch to the subclasses of the parent type and I generate all of those unstructuring function at compile time of the parent class unless there is a RecursionError.

I am wondering if this runtime dispatch unstructuring behavior for subclasses should always be on by default for backward compatibility reasons especially considering that the runtime impact is fairly minimal I think (but I did not measure it). But if you think it is not worth it, I am OK with that too.

What I want to do in the end is to be able to structure / unstructure without having to create union types everywhere.

@Tinche
Copy link
Member

Tinche commented Oct 25, 2022

Sorry, I'm somewhat busy at work but I'm thinking about this.

@matmel
Copy link
Contributor Author

matmel commented Oct 26, 2022

No problem. Take your time and thanks for keeping me updated.

@Tinche
Copy link
Member

Tinche commented Oct 31, 2022

Alright, coming back to this.

I'm slowly cooling on the converter.customize idea for the simple reason that it's a centralized interface. I'm warming to the concept of strategies instead - a strategy being a high-level modification applied to a converter to accomplish a certain goal. Tagged unions would be an example of a strategy, for example.

Now I'm not super sure what the exact interface for all of this would be yet so we'll have to play around with it.

I also believe I've figured out a cleaner way to implement this. Put the following into cattrs/strategies/subclasses.py:

"""Strategies for customizing subclass behaviors."""
from gc import collect
from typing import Dict, Optional, Tuple, Type, Union

from ..converters import Converter


def include_subclasses(
    cl: Type, converter: Converter, subclasses: Optional[Tuple[Type]] = None
) -> None:
    """
    Modify the given converter so that the attrs/dataclass `cl` is
    un/structured as if it was a union of itself and all its subclasses
    that are defined at the time when this strategy is applied.

    Subclasses are detected using the `__subclasses__` method, or
    they can be explicitly provided.
    """
    # Due to https://github.com/python-attrs/attrs/issues/1047
    collect()
    # This hook is for instances of A, but not instances of subclasses.
    base_hook = converter.gen_unstructure_attrs_fromdict(cl)

    def unstructure_a(val: cl, c=converter) -> Dict:
        """
        If val is an instance of `A`, use the hook.

        If val is an instance of a subclass, dispatch on its exact
        runtime type.
        """
        if val.__class__ is cl:
            return base_hook(val)
        return c.unstructure(val, unstructure_as=val.__class__)

    # This needs to use function dispatch, using singledispatch will again
    # match A and all subclasses, which is not what we want.
    converter.register_unstructure_hook_func(lambda cls: cls is cl, unstructure_a)

    if subclasses is not None:
        subclass_union = Union[(cl, subclasses)]
    else:
        subclass_union = Union[(cl, *cl.__subclasses__())]

    dis_fn = converter._get_dis_func(subclass_union)

    base_struct_hook = converter.gen_structure_attrs_fromdict(cl)

    def structure_a(val: dict, _, c=converter, cl=cl) -> cl:
        dis_cl = dis_fn(val)
        if dis_cl is cl:
            return base_struct_hook(val, cl)
        return c.structure(val, dis_cl)

    converter.register_structure_hook_func(lambda cls: cls is cl, structure_a)

Then, this is how it's used:

from attrs import define

from cattrs import Converter
from cattrs.strategies.subclasses import include_subclasses


@define
class A:
    a: int


@define
class B(A):
    b: int


@define
class C:
    a: A


c = Converter()
include_subclasses(A, c)


print(c.unstructure(C(B(1, 2))))  # {'a': {'a': 1, 'b': 2}}
print(c.structure({"a": {"a": 1, "b": 2}}, C))

Does this implementation work for your use case? If so, please incorporate it into this PR and update the docs and tests.

Later in a different PR we can change it to allow for pluggable union strategies, like we discussed.

@matmel
Copy link
Contributor Author

matmel commented Nov 7, 2022

Hi there,

So I finally found some time to update this PR. Here what I did (and did not):

  1. Rebased on your PR Tin/tagged unions #318
  2. Modified slightly your code as there was a problem with python scoping rules and inner functions as revealed by the tests.
  3. Ported my unit tests to this new API
  4. Missing tests about subclasses argument. I also need to make sure that sub-subclasses that were not in the original argument are not pulled-in which is not the case at the moment.
  5. Doc still missing.

Overall I like this approach a lot. This is much cleaner and less intrusive than my first attempt. I think the strategy API is nice.

Now I don't think this is finished yet. Here is what I think is missing and I'd like your input to get a good API:

  1. Handling properly the overrides. At the moment, it's either you get the subclasses handling or you use the overrides on the Parent, but not both, as one un/structuring replaces the other. Also, the parent overrides should propagate to the childs.
  2. I'd like to have the union tagged union strategy specified when asking the include_subclasses strategy in the same call.

@matmel
Copy link
Contributor Author

matmel commented Nov 13, 2022

I went ahead and I added the overrides argument in include_subclasses to make sure the overrides defined for the parent are applied to the child too. I definitely need this feature but I am open to change the API or the implementation. I am not even sure it belongs to include_subclass strategy function.

I will note though that I did not find a way to get this behavior outside of the include_subclass function. You can see my attempt in the test defined in 1447935 that fails. If you think that those additional behavior indeed belong to include_subclasses then, I think we should also add arguments to trigger the tagged union disambiguation.

@Tinche
Copy link
Member

Tinche commented Nov 30, 2022

Hello! Sorry for the pause, I was attending a retreat and dealing with some other projects.

In the meantime, I've merged tagged union support into main, so please rebase on that.

Before I left I was playing around with how to make the union strategy configurable for include_subclasses. Here's what I had come up with:

def include_subclasses(
    cl: Type,
    converter: Converter,
    subclasses: Optional[Tuple[Type]] = None,
    union_strategy: Optional[Callable[[Any, BaseConverter], Any]] = None,
) -> None:
    """
    Modify the given converter so that the attrs/dataclass `cl` is
    un/structured as if it was a union of itself and all its subclasses
    that are defined at the time when this strategy is applied.

    Subclasses are detected using the `__subclasses__` method, or
    they can be explicitly provided.
    """
    # Due to https://github.com/python-attrs/attrs/issues/1047
    collect()

    if subclasses is not None:
        subclass_union = Union[(cl, subclasses)]
    else:
        subclass_union = Union[(cl, *cl.__subclasses__())]

    if union_strategy is None:

        # This hook is for instances of A, but not instances of subclasses.
        base_hook = converter.gen_unstructure_attrs_fromdict(cl)

        def unstruct_hook(val: cl, c=converter) -> Dict:
            """
            If val is an instance of `A`, use the hook.

            If val is an instance of a subclass, dispatch on its exact
            runtime type.
            """
            if val.__class__ is cl:
                return base_hook(val)
            return c.unstructure(val, unstructure_as=val.__class__)

        dis_fn = converter._get_dis_func(subclass_union)

        base_struct_hook = converter.gen_structure_attrs_fromdict(cl)

        def struct_hook(
            val: dict, _, c=converter, cl=cl, base_struct_hook=base_struct_hook
        ) -> cl:
            dis_cl = dis_fn(val)
            if dis_cl is cl:
                return base_struct_hook(val, cl)
            return c.structure(val, dis_cl)

    else:
        union_strategy(subclass_union, converter)
        unstruct_hook = converter._unstructure_func.dispatch(subclass_union)
        struct_hook = converter._union_struct_registry[subclass_union]

    # This needs to use function dispatch, using singledispatch will again
    # match A and all subclasses, which is not what we want.
    converter.register_unstructure_hook_func(lambda cls: cls is cl, unstruct_hook)
    converter.register_structure_hook_func(lambda cls: cls is cl, struct_hook)

It's not as clean as I would have wanted it but probably good enough for now. Can you incorporate the pluggable union strategy changes too, while keeping support for subclasses that you added?

@matmel matmel force-pushed the subclassing branch 2 times, most recently from aebf0d3 to 005bc24 Compare January 6, 2023 20:33
@matmel
Copy link
Contributor Author

matmel commented Jan 6, 2023

Hello and happy new year!

I finally took the time to get back at this. I made some progresses, but there are still rough edges and I think some clean up could be made. As proposed by you above include_subclasses takes a union_strategycallable. I also added anoverrides` argument to be able to control that also as I need it for my work.

Here is what is working:

  • Structuring / destructuring of subclasses with their proper types if there is no circular reference like:
    @define
    class A:
        a: "List[A]"
    
    @define
    class B(A):
        pass
  • Handling of tagged union and unique field disambiguation if tagged union is not provided. Note that in the later case, circular references work.
  • In all cases, overrides work.

The less good:

  • Code split between tagged union and unique field disambiguation. I think in the long run, this should be merged somehow, but the unique fields union approach is simpler to handle. Indeed for the unstructuring part, there is nothing special to do except making sure to unstructure with the actual class type, whereas for the tagged union, we need to make sure we pass through the union handler.
  • Circular reference with tagged union. This one is very hard to solve I find. I think a more viable approach would be to rework dynamically the type annotations before trying to generate the un/structuring functions. What I mean is in the following code (note: edited to actually use a circular reference in the example):
    @define
    class A:
        a: "List[A]"
    
    @define
     class B(A):
        b: int
    When generating un/structuring code of A, before entering in the cattrs machinery, it should be reworked as:
    @define
    class A:
        a: "List[Union[A, B]]"
    and that would alleviate some of the problem of doing this conceptual union work within the un/structuring functions themselves. At the moment, the attribute a of the class A will always be un/structured as A even if the actual desired instance was B (in case of tagged unions, as mentioned earlier unique field union works). If you look at the unit tests, the circular reference I have put an xfail on the tagged union cases, but I would prefer to have a working solution. Note that at my job, I don't have those circular references yet, but this is a relatively common situation.

@Tinche
Copy link
Member

Tinche commented Jan 9, 2023

Thanks for the update and happy new year! I'm busy finishing up TypedDict support, after that's done I can come back to this and help out before we release the next version.

Thanks for your patience ;)

@matmel
Copy link
Contributor Author

matmel commented Feb 14, 2023

Gentle ping 😃

@Tinche
Copy link
Member

Tinche commented Feb 16, 2023

Gentle ack, this is next on my list, I promise ;)

@Tinche
Copy link
Member

Tinche commented Mar 2, 2023

Alright, I have spare cycles now to help with this. Ok, I see the issue with circular references, will try to work on it a little.

matmel and others added 16 commits April 29, 2023 10:58
A closure factory should return only one inner function. If not, there is a risk of messing up all free variables like cl for instance here. All inner generated functions had the last iterated cl value which obviously did not work well...
Allows to propagate overrides easily to child classes.
* Adapted the code provided by Tin
* New test parameters

Some unstructuring tests with the union strategy fail: infinite recursion error.
Overrides still need to be implemented.
The overrides with union strategy still fail though.

Restructured the code in add_subclasses:
* Skipping the classes without subclasses broke the overrides (without union strategy) and also the unstructuring of unknown class.
* The hook functions are created inline and the closure variables are passed as default argument to workaround the scoping problems encontered before.
@matmel
Copy link
Contributor Author

matmel commented Apr 30, 2023

So I rebased and I completed the documentation. Finally.

@Tinche
Copy link
Member

Tinche commented May 15, 2023

Sweet, a release is long overdue so as soon as I'm finished with TypedDicts I'll come over to get this merged in.

@matmel
Copy link
Contributor Author

matmel commented May 16, 2023

I pushed a commit to fix the linting error. I was using black 22.6.0 instead of 23.1.0. The later removed a line in _unions.py compared to the former.

@Tinche
Copy link
Member

Tinche commented May 21, 2023

Great job @matmel !

Copy link
Member

@Tinche Tinche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might do an editing pass on the docs, but looks good.

@Tinche Tinche merged commit 27e9c0d into python-attrs:main May 22, 2023
@matmel matmel deleted the subclassing branch May 22, 2023 22:58
@AdrianSosic
Copy link
Contributor

Hi guys 👋 It's great that the subclass feature is finally coming in the next release. I've been hoping for it for quite a while, it allows me to get rid of some manual workarounds 👍

I do have one question, though. I'm currently playing around with it and trying to incorporate it into our codebase. In most places, it worked out-of-the-box, but I encountered problems with one of my class hierarchies. When I looked a bit deeper, I suddenly understood what is the problem: the subclass tree is constructed during the call of include_subclasses and not at the time of structuring. This differs from my manual workaround, in which the lookup was done inside the hook itself. The logic was as roughly as follows:

    def structure_base(val: dict, _) -> _T:
        _type = val["type"]
        cls = next((cl for cl in get_subclasses(base) if cl.__name__ == _type), None)
        if cls is None:
            raise ValueError(f"Unknown subclass {_type}.")
        return cattrs.structure_attrs_fromdict(val, cls)

where get_subclasses builds the subclass tree similar to your code.

The fact that the classes are collected in include_subclasses causes a problem to me in that no subclasses can be added later on. Do you see any solution to this problem?

I can imagine at least two use cases where a delayed lookup would be necessary:

  1. (My situation) You have a base class Base defined in base.py and would like to activate the subclass strategy for it (probably in that same file). However, several of its subclasses are defined in different files, e.g. subclass1.py, subclass2.py and so on. In this case, it's not obvious where you would even activate the strategy.
  2. (More generally) You do not own all the subclasses yourself, i.e. some user of your package extends the subclass tree by adding their own subclass implementations.

Maybe I'm even misusing the feature, but then I'd be also happy about getting your input :D

@Tinche
Copy link
Member

Tinche commented Oct 19, 2023

@AdrianSosic I might have a solution for you, but please can you make a separate issue (you can just copy/paste your text into it) so it's easier to find for other folks?

@AdrianSosic
Copy link
Contributor

Sure, will do. After such an enormous cliffhanger, I'm more than curious what you have to say :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants