This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Gluon hybridize is not perfect. #16752

Open
JONGGON opened this issue Nov 7, 2019 · 11 comments
Labels
Bug Gluon Python v1.x Targeting v1.x branch

Comments

@JONGGON

JONGGON commented Nov 7, 2019

Description

I am implementing YOLOv3.

Without hybridize, no problem occurs.
With hybridize active, the error below is raised.

My HybridBlock returns more than one output.
When a block returns multiple outputs,
the backward operation does not seem to find the right address.

Error Message

Traceback (most recent call last):
File "/home/jg/Downloads/GLUON-Detector/Yolov3_Detector/main.py", line 154, in
plot_class_thresh=plot_class_thresh)
File "/home/jg/Downloads/GLUON-Detector/Yolov3_Detector/train.py", line 353, in run
autograd.backward(total_loss)
File "/home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/autograd.py", line 267, in backward
ctypes.c_void_p(0)))
File "/home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/base.py", line 253, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator node_0_backward: [01:34:41] src/imperative/./imperative_utils.h:753: Check failed: g.GetAttr<size_t>("storage_type_num_unknown_nodes") == 0U (6 vs. 0) :
Stack trace:
[bt] (0) /home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x4b09db) [0x7f8c50b709db]
[bt] (1) /home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x263d4d7) [0x7f8c52cfd4d7]
[bt] (2) /home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x263e52f) [0x7f8c52cfe52f]
[bt] (3) /home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2618805) [0x7f8c52cd8805]
[bt] (4) /home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2619730) [0x7f8c52cd9730]
[bt] (5) /home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x261d2a5) [0x7f8c52cdd2a5]
[bt] (6) /home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x263d5ef) [0x7f8c52cfd5ef]
[bt] (7) /home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::Imperative::Backward(std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, bool, bool, bool)+0x326c) [0x7f8c52d3361c]
[bt] (8) /home/jg/anaconda3/envs/mxnetcuda/lib/python3.6/site-packages/mxnet/libmxnet.so(MXAutogradBackwardEx+0x573) [0x7f8c52c23043]

@JONGGON JONGGON added the Bug label Nov 7, 2019
@sxjscience
Member

sxjscience commented Nov 7, 2019

@JONGGON There are a few known inconsistent behaviors of HybridBlock: #16279, #16140

There are some tips to avoid these issues:

  1. You should avoid a.reshape(shape=...) and use F.reshape(a, shape=...).
  2. You need to avoid index slicing, e.g., a[:, 1], in the HybridBlock.

@samskalicky
Contributor

@zachgk assign @szha

@JONGGON
Author

JONGGON commented Nov 12, 2019

@samskalicky
Contributor

@JONGGON did any of the suggestions from @sxjscience help?

@JONGGON
Author

JONGGON commented Nov 12, 2019

@samskalicky Hi sam

I don't think this problem can be solved by the methods @sxjscience suggests.

For now, I do not train in hybridize mode.
I hybridize only right before calling net.export, to extract the JSON and params files.

@JONGGON
Author

JONGGON commented Nov 12, 2019

The error message is below:

File "C:/Users/JG/Desktop/GLUON-Detector/YoloV3_Detector/main.py", line 163, in
plot_class_thresh=plot_class_thresh)
File "C:\Users\JG\Desktop\GLUON-Detector\YoloV3_Detector\train.py", line 364, in run
autograd.backward(scaled_loss)
File "C:\ProgramData\Anaconda3\lib\site-packages\mxnet\autograd.py", line 267, in backward
ctypes.c_void_p(0)))
File "C:\ProgramData\Anaconda3\lib\site-packages\mxnet\base.py", line 253, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator node_0_backward: [04:24:50] c:\jenkins\workspace\mxnet-tag\mxnet\src\imperative./imperative_utils.h:725: Check failed: g.GetAttr<size_t>("storage_type_num_unknown_nodes") == 0U (9 vs. 0) :

In my opinion, when a HybridBlock has multiple outputs,
the backward operation does not seem to find the correct location.

@leezu
Contributor

leezu commented Nov 13, 2019

@JONGGON can you post a minimal example to reproduce the error? Thank you!

@JONGGON
Author

JONGGON commented Nov 13, 2019

@leezu
OK, I will take the time to create a minimal example as soon as possible.

@JONGGON
Author

JONGGON commented Nov 13, 2019

@leezu
I did as you suggested, but I still get the same error I mentioned above.

@JONGGON
Author

JONGGON commented Nov 17, 2019

@samskalicky @leezu Hi guys!!!

It is finally solved.
anchor = F.identity(anchor)
offset = F.identity(offset)
stride = F.identity(stride)
When building the model, copy each of these once through F.identity.

How did I find out?
In a slightly unusual way: I found it while exporting the trained model to ONNX and importing the exported ONNX file.

HybridBlock doesn't seem to be perfect.

@szha szha added the v1.x Targeting v1.x branch label Jul 31, 2020

6 participants