-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT Function to visualize skops files #317
FEAT Function to visualize skops files #317
Conversation
Resolves skops-dev#301 This is not finished, just a basis for discussion.
Don't add it to parent nodes.
Okay, I made an update to only mark nodes as
|
Code should be much more straightforward now. Moreover, it is refactored so that the sink function now gets an iterator of all nodes, not just of a single node. This is better because printing of nodes can now happen in context. Just as an example, a node could now be printed differently if it is the first node of its level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is great, and better than what I originally had. I personally like this version the best: #317 (comment), but that's because I'm colorblind-ish and the colors in the last version bother me a bit, but it'd be nice to have that as an argument on how to format things.
skops/io/_visualize.py
Outdated
return should_print | ||
|
||
|
||
def _check_node_and_children_safe(node: Node, trusted: bool | Sequence[str]) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be okay to move this to nodes and cache them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, there is already node.is_safe()
>__<
skops/io/_visualize.py
Outdated
|
||
# use singledispatch so that we can register specialized visualization functions | ||
@singledispatch | ||
def format_node(node: Node) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also don't mind moving these to nodes.
@skops-dev/maintainers This is now ready for review As an example, here is the top part of the visualization of the following estimator: pipeline = Pipeline([
("features", FeatureUnion([
("scaler", StandardScaler()),
("scaled-poly", Pipeline([
("polys", FeatureUnion([
("poly1", PolynomialFeatures()),
("poly2", PolynomialFeatures(degree=3, include_bias=False))
])),
("square-root", FunctionTransformer(unsafe_function)),
("scale", MinMaxScaler()),
])),
])),
("clf", LogisticRegression(random_state=0, solver="liblinear")),
]).fit([[0, 1], [2, 3], [4, 5]], [0, 1, 2])
file = sio.dumps(pipeline)
sio.visualize_tree(file) Lots of stuff can be customized, e.g. if colors should be used (and which), how to mark untrusted types, etc. The visualization requires |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty nice. We do need to add rich
to setup.py
with a key though, somehow in extras. I would have liked to keep your own implementation of sink
, so that if rich
is not installed, or a parameter is passed in a certain way, we just use that.
skops/io/_visualize.py
Outdated
This function visits all nodes of the object tree and determines: | ||
|
||
- level: how nested the node is | ||
- key: the key of the node, e.g. the key of a dict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I understand 'key' here
The key to the current node. If "key_types" is encountered, it is | ||
skipped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be skipped? can it not be unsafe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is checked now
skops/io/_visualize.py
Outdated
# key_types is not helpful, as it is artificially added by skops to | ||
# circumvent the fact that json only allows keys to be strings. It is not | ||
# useful to the user and adds a lot of noise, thus skip key_types. | ||
# TODO: check that no funny business is going on in key types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could always hide them, unless they're unsafe maybe?
skops/io/_visualize.py
Outdated
) | ||
|
||
|
||
def visualize_tree( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters that it's a tree. What about just visualize
? Or inspect
(although there's the inspect
module)?
@adrinjalali I experimented a bit and could create a custom implementation that does not rely on |
Perfect.
That's fair, but I'd still add |
- rename visualize_tree => visualize - more explanatory comments - when encountering 'key_types', check type and safety - possibility to visualize without rich - make rich an extra install - more documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed your points, please review again.
The key to the current node. If "key_types" is encountered, it is | ||
skipped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is checked now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome!
Resolves #301
This is not finished, just a basis for discussion.
Here is the output for a simple
MinMaxScaler(feature_range=(-555, 123))
:And below is the output for a bigger
Pipeline
discussed here, but with the optionshow="untrusted"
, i.e. only showing untrusted types:I.e. all the parent nodes are considered untrusted if a child is untrusted. This is now achieved by leveraging
node.get_unsafe_set()
.Before going too deep with this, I would like to discuss what other users and @skops-dev/maintainers think about this. Some open points that come to my mind:
json-type(123)
for instance.numpy.ufunc
, not what ufunc is actually used). I wonder if we should move this to theNode
classes?*Why are some children not visited? Because they can't. E.g. for ndarray, the children are
io.BytesIO
, not nodes.