This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

TorchModuleGraph: graph construct error when the model has shared layers #2485

Closed
zheng-ningxin opened this issue May 25, 2020 · 2 comments · Fixed by #2524
@zheng-ningxin
Contributor

Short summary about the issue/question:
TorchModuleGraph builds its index of the nodes in the jit.trace graph based on the names of the layers.
When the model reuses a layer (such as a relu) in different places, jit.trace creates two nodes for the same layer; these two nodes have the same name but different inputs and outputs. So the layer name is not a globally unique identifier for a node.

How to reproduce it:
For example, if I trace resnet18 with jit.trace and print the traced graph, we can find two nodes called layer1.0.relu. This is caused by reusing the same relu layer in the code, which is common.
```
%input.8 : Float(1, 64, 56, 56) = aten::relu_(%input.7), scope: __module.layer1/__module.layer1.0/__module.layer1.0.relu # /home/core/anaconda3/envs/znx/lib/python3.6/site-packages/torch/nn/functional.py:912:0
%input.11 : Float(1, 64, 56, 56) = aten::relu_(%input.10), scope: __module.layer1/__module.layer1.0/__module.layer1.0.relu # /home/core/anaconda3/envs/znx/lib/python3.6/site-packages/torch/nn/functional.py:912:0
```

[image: picture of the traced graph]

When I traverse to the "layer1.1.relu" at the bottom of the picture, it has the same name as the "layer1.1.relu" at the top, so calling find_successor to find the next nodes of the bottom "layer1.1.relu" also returns "layer1.1.conv2", which is actually a successor of the top "layer1.1.relu".
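The conflation can be illustrated without torch at all. Below is a minimal plain-Python sketch (the node tuples and helper functions are hypothetical, not NNI's actual data structures) showing that a successor lookup keyed by layer name alone mixes up the two uses of the shared relu, while keying the lookup by the node object itself gives the right answer:

```python
# Plain-Python sketch of the indexing bug; node names mimic the scope
# names jit.trace produces for a block that reuses one relu module.
from collections import defaultdict

# Each node: (name, input values, output values). Both relu nodes share
# a name because the model reuses the same relu module.
nodes = [
    ("layer1.1.conv1", ["x"], ["a"]),
    ("layer1.1.relu",  ["a"], ["b"]),    # first use of the shared relu
    ("layer1.1.conv2", ["b"], ["c"]),
    ("layer1.1.relu",  ["c"], ["out"]),  # second use, same name
]

by_name = defaultdict(list)
for name, ins, outs in nodes:
    by_name[name].append((ins, outs))

def find_successors_by_name(name):
    # Buggy lookup: merges the outputs of every node that shares `name`.
    outs = {o for _, node_outs in by_name[name] for o in node_outs}
    return [n for n, ins, _ in nodes if set(ins) & outs]

def find_successors_of(node):
    # Correct lookup: keyed by the node itself, not its name.
    _, _, outs = node
    return [n for n, ins, _ in nodes if set(ins) & set(outs)]

# The bottom relu has no successors at all, but the name-based lookup
# reports layer1.1.conv2, which really follows the top relu.
print(find_successors_by_name("layer1.1.relu"))  # ['layer1.1.conv2']
print(find_successors_of(nodes[3]))              # []
```

Any per-node unique key (for example, the layer name combined with the node's outputs) avoids the conflation.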

nni Environment:

  • nni version: the latest
  • nni mode(local|pai|remote):
  • OS:
  • python version: 3.6
  • is conda or virtualenv used?: yes
  • is running in docker?: No

zheng-ningxin commented Jun 2, 2020

Found another bug: we cannot merge the nodes based only on the scope name. For example, there are many nodes whose scope name is empty; the following code tries to merge them into several NodeGroups.
```python
for tname, nodes in func_to_nodes.items():
    print('###', tname)
    print(len(nodes))
    used = set()
    # extract non prim:: nodes
    non_prim_nodes = list()
    for node in nodes:
        if not node.kind().startswith('prim::'):
            non_prim_nodes.append(node)
    # for each non prim node, expand it
    for node in non_prim_nodes:
        node_group = self._expand_non_prim_node(node, nodes, input_to_node, output_to_node)
        used.update(node_group.node_cpps)
        nodes_py.nodes_op.append(node_group)
        # get shape info for view (aten::view) func
        if node_group.op_type in ['aten::view', 'aten::flatten']:
            node_group.auxiliary = self._extract_shape_info(node)
    print(len(set(nodes) - used))
    print(set(nodes) - used)
```
However, most of the prim:: nodes actually belong to the module nodes, so there are quite a few prim:: nodes that are not merged into the graph.
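This second problem can be sketched the same way. In the toy example below (the node tuples are illustrative, not NNI's real structures), every prim:: node reports an empty scope, so grouping by scope name alone drops them all into one anonymous bucket instead of the module group that actually produced them:

```python
# Plain-Python sketch: grouping trace nodes by scope name strands the
# prim:: nodes, whose scope is often empty, in a single "" bucket.
from collections import defaultdict

# (kind, scope name, outputs) tuples mimicking jit.trace nodes.
trace_nodes = [
    ("aten::conv2d",        "__module.layer1.0.conv1", ["a"]),
    ("prim::ListConstruct", "",                        ["sizes"]),
    ("aten::view",          "__module.fc",             ["flat"]),
    ("prim::GetAttr",       "",                        ["w"]),
]

scope_to_nodes = defaultdict(list)
for kind, scope, outs in trace_nodes:
    scope_to_nodes[scope].append(kind)

# Both prim:: nodes land under the empty scope, detached from the
# modules they belong to.
print(scope_to_nodes[""])  # ['prim::ListConstruct', 'prim::GetAttr']
```

To attach such nodes to the right group, the merge has to follow the graph's input/output edges rather than rely on the scope name alone.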

@zheng-ningxin
Contributor Author

#2524

@QuanluZhang QuanluZhang linked a pull request Jun 10, 2020 that will close this issue
@chicm-ms chicm-ms mentioned this issue Jul 1, 2020