Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nested_dict_to_tree breaks for Nodes with required args #338

Open
2 tasks done
mat-adamec opened this issue Jan 16, 2025 · 2 comments
Open
2 tasks done

nested_dict_to_tree breaks for Nodes with required args #338

mat-adamec opened this issue Jan 16, 2025 · 2 comments
Labels
question Further information is requested

Comments

@mat-adamec
Copy link

mat-adamec commented Jan 16, 2025

Describe the issue

The nested_dict_to_tree function explicitly builds nodes by calling node_type(node_name...) here.

Names are an optional attribute of Nodes as defined within the bigtree documentation. When one subclasses a bigtree node and adds required arguments, then calling nested_dict_to_tree with the subclass as the node_type breaks down, because this line takes the first argument as the name of the node, rather than taking the argument defined by the name_key.

I would classify this as a bug, as while it produces consistent behavior in bigtree with no augmentations, it harms the extensibility of the package without bringing any clear benefit (at least as far as I can see).

I believe this specific issue could be solved, without introducing additional bugs, by rewriting the line to:

root = node_type(name=node_name, parent=parent_node, **child_dict)

Environment

  • Platform: Windows 10
  • Python version: 3.11
  • bigtree version: 0.19.3

To Reproduce

Though the issue is generic, I offer an example to illustrate and make reproducibility simple.

Consider the following subclass node type.

class FooNode(Node):
    def __init__(
        self,
        bar: str,
        name: str = "",
        **kwargs: Any,
    ):
        super().__init__(name, **kwargs)
        self.bar = bar

With the following nested dictionary:

sample = {"name": "a", "foo": 1, "children": [{"name": "b", "foo": 2}, {"name": "c", "foo": 3, "children": [{"name": "d", "foo": 4}]}]}

Such that we have the following tree structure:

a (foo = 1)
--- b (foo=2)
--- c (foo=3)
      --- d (foo=4)

Calling nested_dict_to_tree on this with node_type set to FooNode will result in an error, as the following occurs:

Line 915 pops the value of the name key, say "a":
node_name = child_dict.pop(name_key)

Line 921 initializes a FooNode with "a" as the first argument, but the first argument of FooNode is bar (since bar is a required argument):
root = node_type(node_name, parent=parent_node, **child_dict)

The actual name for FooNode is thus never set, since name has already been popped off of the dictionary and provided to the incorrect argument in the Node's init signature.

Expected behaviour

I would expect the error to be avoided, since a name is provided, it's just not being provided to the appropriate argument in the function signature. At risk of repetition, the produced tree would look like:

a (foo = 1)
--- b (foo=2)
--- c (foo=3)
      --- d (foo=4)

where foo is an attribute of the node.

Additional context

I am not sure whether the same issue arises in other constructors in the library, as I am primarily using nested_dict_to_tree. If it does, then it would be worth fixing for the sake of proper extensibility (and, if this change is made to nested_dict_to_tree, consistency).

Final Check

  • I will watch this issue to stay updated with any comments or requests for additional information. I understand that timely responses will help in resolving the issue more quickly.

  • I have thoroughly searched the documentation, and I have also checked for existing solutions or similar issues before submitting this issue.

@mat-adamec mat-adamec added the bug Something isn't working label Jan 16, 2025
@kayjan
Copy link
Owner

kayjan commented Jan 18, 2025

Hello! For using custom node type, there is already a test case for every kind of constructor (e.g., dataframe to tree, dict to tree etc.) and this is the test case for nested_dict_to_tree (test_nested_dict_to_tree_custom_node_type) if you want to see how it is used.

The intended usage is different from how you are using it. In Node, name is a mandatory attribute (not optional as you described), so it should be in the constructor of FooNode without the default of empty string, and as the first argument. name is the only mandatory attribute, so that is the only requirement to extend Node class, that the first argument has to be name, and other arguments can be as-is/whatever order you want.

For your suggestion to change the line to root = node_type(name=node_name, parent=parent_node, **child_dict), that will only work if all the classes that extend Node class has their attribute as name, which is a stricter requirement and thus would make it less extensible.

The below code will work, which is your code but reordering the attribute to make name the first argument. I also renamed the argument from name to foo_name which you can try with the current code vs. your suggestion to illustrate the difference.

from typing import Any

from bigtree import Node, nested_dict_to_tree


class FooNode(Node):
    def __init__(
        self,
        foo_name: str,  # changed this
        bar: str,
        **kwargs: Any,
    ):
        super().__init__(foo_name, **kwargs)
        self.bar = bar


sample = {
    "name": "a",
    "bar": "test0",  # modified this to include mandatory 'bar'
    "foo": 1,
    "children": [
        {"name": "b", "bar": "test1", "foo": 2},
        {"name": "c", "bar": "test2", "foo": 3, "children": [{"name": "d", "bar": "test3", "foo": 4}]},
    ],
}

root = nested_dict_to_tree(
    sample, node_type=FooNode
)
root.show(all_attrs=True)
# a [bar=test0, foo=1]
# ├── b [bar=test1, foo=2]
# └── c [bar=test2, foo=3]
#     └── d [bar=test3, foo=4]

@kayjan kayjan added question Further information is requested and removed bug Something isn't working labels Jan 18, 2025
@mat-adamec
Copy link
Author

mat-adamec commented Jan 21, 2025

@kayjan

Thank you for the response and thorough explanation.

However, I still have quite a bit of confusion about what you have said. Specifically,

The intended usage is different from how you are using it. In Node, name is a mandatory attribute (not optional as you described), so it should be in the constructor of FooNode without the default of empty string, and as the first argument. name is the only mandatory attribute, so that is the only requirement to extend Node class, that the first argument has to be name, and other arguments can be as-is/whatever order you want.

This claim does not appear to be true from my understanding of the code. Here is the function signature for __init__ for Node:

def __init__(self, name: str = "", sep: str = "/", **kwargs: Any):

Name is an optional attribute. The superclass of Node, BaseNode, furthermore has no name attribute at all in its signature.

It is true that every node will have a name, as it has a default parameter assigned to it of the empty string (and errors if it somehow goes through with no name), but that doesn't change the fact that name is, indeed, an optional attribute in the signature.

For your suggestion to change the line to root = node_type(name=node_name, parent=parent_node, **child_dict), that will only work if all the classes that extend Node class has their attribute as name, which is a stricter requirement and thus would make it less extensible.

This is already a requirement (again, from my understanding). Node raises an error if it has no self.node_name attribute, and the self.node_name attribute is a property which calls self.name. Thus, all Node subclasses must have a name attribute. If one isn't supplied, then name is defaulted to the empty string. But the present implementation of the line sends the name to whatever the first required arg of the __init__ is, which is absolutely not guaranteed to be name, since name seems to be optional.

I understand that your example works because it functionally makes the name attribute the first argument. But since this isn't a pattern implemented by Node, where name does not seem to be a required argument, I would assume extensions should follow the established pattern and continue to have name as optional (if it isn't required by any other programmatic constraints). The introduction of any required arguments, then, breaks the line by shoving the required argument to the name of the Node. The test case works for the same reason; it implements name as the first required arg, even though the base class Node doesn't.

It seems that either Node should be rewritten so as to have name as a required input:

Node(name: str...)

instead of

Node(name: str = "")

or I believe the issue I've pointed out is a legitimate bug.

Of course, there's the final possibility that I am missing something, in which case I would appreciate clarification. I greatly appreciate all the work you have done in building this library!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants