Short summary about the issue/question:
TorchModuleGraph builds its index of the nodes in the jit.trace output based on the names of the layers.
When the model reuses a layer (such as relu) at different places, jit.trace creates two nodes for the same layer; these two nodes have the same name but different inputs and outputs. So the layer name is not a globally unique identifier for a node.
How to reproduce it:
For example, when I trace resnet18 using jit.trace and print the traced details, we can find there are two nodes called layer1.0.relu. This is caused by reusing the same relu layer in the code, which is common:

```
%input.8 : Float(1, 64, 56, 56) = aten::relu_(%input.7), scope: __module.layer1/__module.layer1.0/__module.layer1.0.relu # /home/core/anaconda3/envs/znx/lib/python3.6/site-packages/torch/nn/functional.py:912:0
%input.11 : Float(1, 64, 56, 56) = aten::relu_(%input.10), scope: __module.layer1/__module.layer1.0/__module.layer1.0.relu # /home/core/anaconda3/envs/znx/lib/python3.6/site-packages/torch/nn/functional.py:912:0
```
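A minimal repro sketch (assuming torchvision is installed and a PyTorch 1.x build like the one in this report, where `scopeName()` is populated after tracing): it traces resnet18 and counts graph nodes per scope name, so a reused layer such as the in-place relu shows up with a count greater than one.

```python
from collections import Counter

import torch
import torchvision

model = torchvision.models.resnet18().eval()
traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224))

# Count how many trace nodes share the same scope name; reused modules
# (e.g. the relu inside BasicBlock) appear more than once.
scope_counts = Counter(
    node.scopeName() for node in traced.graph.nodes() if node.scopeName()
)
for scope, count in scope_counts.items():
    if count > 1:
        print(count, scope)
```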
When I traverse to the "layer1.1.relu" at the bottom of the picture, because it has the same name as the "layer1.1.relu" at the top, calling find_successor on the bottom "layer1.1.relu" also returns "layer1.1.conv2", which is actually a successor of the top "layer1.1.relu".
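For reference, a sketch of one possible fix (not NNI's actual implementation): build successor edges keyed on the jit node handles themselves rather than on layer names, so the two relu nodes keep separate successor sets.

```python
# Sketch: successor map keyed on torch._C.Node handles. Two trace nodes
# that share a layer name are still distinct dict keys here.
def build_successors(graph):
    producer = {}  # value debugName -> the node that produces it
    for node in graph.nodes():
        for out in node.outputs():
            producer[out.debugName()] = node

    successors = {}  # producing node -> list of consuming nodes
    for node in graph.nodes():
        for inp in node.inputs():
            src = producer.get(inp.debugName())
            if src is not None:
                successors.setdefault(src, []).append(node)
    return successors
```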
nni Environment:
nni version: the latest
nni mode(local|pai|remote):
OS:
python version: 3.6
is conda or virtualenv used?: yes
is running in docker?: No
Found another bug: we cannot merge the nodes based only on the scope name. For example, there are many nodes whose scope name is empty; the following code tries to merge them into several NodeGroups.
```python
for tname, nodes in func_to_nodes.items():
    print('###', tname)
    print(len(nodes))
    used = set()
    # extract non prim:: nodes
    non_prim_nodes = list()
    for node in nodes:
        if not node.kind().startswith('prim::'):
            non_prim_nodes.append(node)
    # for each non prim node, expand it
    for node in non_prim_nodes:
        node_group = self._expand_non_prim_node(node, nodes, input_to_node, output_to_node)
        used.update(node_group.node_cpps)
        nodes_py.nodes_op.append(node_group)
        # get shape info for view (aten::view) func
        if node_group.op_type in ['aten::view', 'aten::flatten']:
            node_group.auxiliary = self._extract_shape_info(node)
    print(len(set(nodes) - used))
    print(set(nodes) - used)
```
However, most of the 'prim::' nodes actually belong to the module nodes, so there are quite a few prim nodes that are not merged into the graph.
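One possible direction (a sketch only; `input_to_group` is a hypothetical mapping, and treating `node_cpps` as a set is an assumption based on the `used.update(...)` call above): attach each leftover prim:: node to the NodeGroup that consumes its output, instead of relying on the empty scope name.

```python
# Hypothetical sketch: fold leftover prim:: nodes into the NodeGroup that
# consumes their outputs. `input_to_group` maps a value's debugName() to
# the NodeGroup using it, built while expanding the non-prim nodes above.
def attach_prim_nodes(prim_nodes, input_to_group):
    unattached = []
    for node in prim_nodes:
        group = None
        for out in node.outputs():
            group = input_to_group.get(out.debugName())
            if group is not None:
                break
        if group is not None:
            group.node_cpps.add(node)  # merge the prim node into that group
        else:
            unattached.append(node)    # e.g. graph-level prim::Constant
    return unattached
```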