
Hi gradcheck failed #1

Open
mks0601 opened this issue Dec 9, 2017 · 16 comments

mks0601 commented Dec 9, 2017

Hi, thanks for sharing your implementation.

I want to use a RoIAlign layer in my PyTorch code, and I found yours.
To verify the implementation, I ran test.py, and the gradcheck failed.
Did you check the code?

longcw (Owner) commented Dec 10, 2017

I also noticed this problem: there is a gap between the numerical grad and the analytical grad.
But the outputs and grads of the PyTorch version and the TensorFlow version are almost the same.

 numerical:(
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.2012  0.0000  0.0000  0.0000
 0.6258  0.1490  0.0000  0.0000
 0.0000  0.6855  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0373  0.0000  0.1788  0.0000
 0.1341  0.0298  0.5662  0.1192
 0.0000  0.1490  0.0000  0.5960
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0596  0.0000
 0.0000  0.0000  0.2086  0.0596
 0.0000  0.0000  0.0000  0.2384
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
[torch.FloatTensor of size 25x4]
,)
analytical:(
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.2111  0.0000  0.0000  0.0000
 0.6141  0.1408  0.0000  0.0000
 0.0000  0.6844  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0447  0.0000  0.1893  0.0000
 0.1300  0.0298  0.5507  0.1263
 0.0000  0.1449  0.0000  0.6138
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0665  0.0000
 0.0000  0.0000  0.1934  0.0444
 0.0000  0.0000  0.0000  0.2156
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
[torch.FloatTensor of size 25x4]
,)

mks0601 (Author) commented Dec 10, 2017 via email

longcw (Owner) commented Dec 10, 2017

I am not working on Mask RCNN.
BTW, I found that this layer passes gradcheck if I set eps=1e-3 (eps is the perturbation used for the finite differences).

gradcheck(roi_align, (image_torch, boxes, box_index), eps=1e-3)

output (max_val, min_error, max_error, mean_error):

('forward:', 0.87139809, 0.0, 7.0184469e-06, 5.5792748e-07)
('backward:', 1.0228419, 0.0, 1.7911196e-05, 9.7078487e-09)
test ok
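
For context, here is a minimal sketch (my own helper, not the repo's test code) of the central-difference estimate that gradcheck compares against the analytical gradient; eps is the h in the difference quotient:

import torch

def numerical_grad(f, x, eps):
    # central difference: df/dx_i ~ (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
    grad = torch.zeros_like(x)
    flat, gflat = x.view(-1), grad.view(-1)  # views share storage with x, grad
    for i in range(flat.numel()):
        orig = float(flat[i])
        flat[i] = orig + eps
        f_plus = float(f(x).sum())
        flat[i] = orig - eps
        f_minus = float(f(x).sum())
        flat[i] = orig  # restore before moving to the next coordinate
        gflat[i] = (f_plus - f_minus) / (2 * eps)
    return grad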

mks0601 (Author) commented Dec 10, 2017 via email

longcw (Owner) commented Dec 10, 2017

@mks0601 Try to modify the random input image:

# image_data = np.random.randn(batch_size, depth, im_height, im_width).astype(np.float32)
# =>
image_data = np.random.rand(batch_size, depth, im_height, im_width).astype(np.float32)

mks0601 (Author) commented Dec 10, 2017 via email

longcw (Owner) commented Dec 10, 2017

I don't think this is a problem with the implementation; it's a problem with how we use gradcheck.
Changing randn to rand actually decreases the max value of the inputs. The layer always passes the check when eps > max(inputs)/500, whatever the input is.

I don't know the root cause. You can read the gradcheck function and the layer's source code if you want to figure it out.
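
One plausible contributor, sketched below with made-up numbers rather than the repo's test: in float32, a perturbation that is tiny relative to the input magnitude is partly lost to rounding before it ever reaches the op, which makes the finite-difference quotient noisy; a larger eps such as 1e-3 stays well above the rounding noise:

import numpy as np

x = np.float32(1.0)  # stand-in for an input on the order of max(inputs)
for eps in (np.float32(1e-6), np.float32(1e-3)):
    # the perturbation that actually survives float32 rounding
    effective = (x + eps) - x
    print(eps, effective, abs(effective - eps) / eps)

# with eps=1e-6 the surviving perturbation is off by a few percent,
# while with eps=1e-3 the relative error is negligible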

mks0601 (Author) commented Dec 10, 2017 via email

longcw (Owner) commented Dec 10, 2017

What you want is crop_and_resize.

import numpy as np
import torch
from torch.autograd import Variable

from roi_align.roi_align import RoIAlign


def to_variable(arr, requires_grad=False, is_cuda=True):
    tensor = torch.from_numpy(arr)
    if is_cuda:
        tensor = tensor.cuda()
    var = Variable(tensor, requires_grad=requires_grad)
    return var


# inputs
is_cuda = False
image_data = np.tile(np.arange(7, dtype=np.float32), 7).reshape(7, 7)
image_data = image_data[np.newaxis, np.newaxis]
boxes_data = np.asarray([[0, 0, 2, 2]], dtype=np.float32)
box_index_data = np.asarray([0], dtype=np.int32)

image_torch = to_variable(image_data, requires_grad=True, is_cuda=is_cuda)
boxes = to_variable(boxes_data, requires_grad=False, is_cuda=is_cuda)
box_index = to_variable(box_index_data, requires_grad=False, is_cuda=is_cuda)

# setting transform_fpcoor to False gives plain crop_and_resize
roi_align = RoIAlign(3, 3, transform_fpcoor=False)
print(roi_align(image_torch, boxes, box_index))

output:

Variable containing:
(0 ,0 ,.,.) = 
  0  1  2
  0  1  2
  0  1  2
[torch.FloatTensor of size 1x1x3x3]

If you use the RoIAlign of this implementation instead:

# input
...
boxes_data = np.asarray([[0, 0, 3, 3]], dtype=np.float32)
...
roi_align = RoIAlign(3, 3, transform_fpcoor=True)
print(roi_align(image_torch, boxes, box_index))

output:

Variable containing:
(0 ,0 ,.,.) = 
  0  1  2
  0  1  2
  0  1  2
[torch.FloatTensor of size 1x1x3x3]

You can read more about RoIAlign here:
https://github.com/ppwwyyxx/tensorpack/blob/6d5ba6a970710eaaa14b89d24aace179eb8ee1af/examples/FasterRCNN/NOTES.md
https://github.com/ppwwyyxx/tensorpack/blob/6d5ba6a970710eaaa14b89d24aace179eb8ee1af/examples/FasterRCNN/model.py#L316

mks0601 (Author) commented Dec 10, 2017

Oh, I misunderstood your code. Thanks for clarifying.
However, the links you provided are hard for me to understand :(

Can you help me understand them? I read the NOTES.md, but I cannot see why crop_and_resize differs from roi_align other than the input form (normalized vs. unnormalized coordinates?). I also cannot understand the code.

If I just set boxes_data to [xmin, ymin, xmax+1, ymax+1] and set transform_fpcoor=True, it seems to work well so far.
Can you confirm that setting boxes_data to [xmin, ymin, xmax+1, ymax+1] is correct? Does fpcoor stand for feature plane coordinates?

longcw (Owner) commented Dec 11, 2017

crop_and_resize: bilinear sampling assumes that floating-point coordinate (0.0, 0.0) coincides with pixel (0, 0):

[image: crop_and_resize]

RoIAlign: first split the RoI into crop_size grids of equal size, then bilinear-sample the value for each grid:

[image: roi_align]

To use crop_and_resize for RoIAlign, we shift the grids by -0.5:

[image: roi_align_shifted]

In your case, the crop is

Variable containing:
(0 ,0 ,.,.) = 
  0.0000  0.0000  0.0000
  0.0000  0.5000  1.1667
  0.0000  0.5000  1.1667
[torch.FloatTensor of size 1x1x3x3]

if you set bbox=[0, 0, 2, 2]:

[image: roi_align_2]
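
To make the grids concrete, here is a small numpy sketch (helper names are mine, not the repo's API) of the 1-D sample coordinates in the two modes; plugging in the boxes above reproduces the crops, and coordinates that fall outside the image sample the zero padding, which is where the zero row and column come from:

import numpy as np

def crop_and_resize_coords(x0, x1, crop_size):
    # endpoints included: first sample lands on x0, last on x1
    return x0 + (x1 - x0) * np.arange(crop_size) / (crop_size - 1)

def roi_align_coords(x0, x1, crop_size):
    # equal bins over [x0, x1], sample each bin center, then shift by -0.5
    spacing = (x1 - x0) / crop_size
    return x0 + spacing * (np.arange(crop_size) + 0.5) - 0.5

print(crop_and_resize_coords(0, 2, 3))  # [0. 1. 2.] -> the 0,1,2 crop
print(roi_align_coords(0, 3, 3))        # [0. 1. 2.] -> same crop from box [0,0,3,3]
print(roi_align_coords(0, 2, 3))        # [-0.1667 0.5 1.1667] -> the 0.5/1.1667 crop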

mks0601 (Author) commented Dec 11, 2017

Great help. Thank you.
So the difference arises from dividing the RoI into grids.
Then I think just using [xmin, ymin, xmax+1, ymax+1], where each value is a float coordinate, outputs the desired values.
Is that right?

mks0601 (Author) commented Dec 14, 2017

Sorry, but I still cannot understand the code. Say the bounding-box coordinates of the RoI are (xmin, ymin, xmax, ymax); then what exactly is the input to your roi_align module?

spacing_w is a function of x1 - x0, not x1 - x0 + 1, so I think xmax and ymax should be incremented. Also, I cannot understand why we have to subtract 0.5.

what if just

x0, y0, x1, y1 = tf.split(boxes, 4, axis=1)

nx0 = x0 / tf.to_float(image_shape[1] - 1)
ny0 = y0 / tf.to_float(image_shape[0] - 1)

nx1 = x1 / tf.to_float(image_shape[1] - 1)
ny1 = y1 / tf.to_float(image_shape[0] - 1)

return tf.concat([ny0, nx0, ny1, nx1], axis=1)

and transform_fpcoor = False?

tensorboy commented

Hi @mks0601, you can see how to use it for Mask R-CNN here: https://github.com/tensorboy/Pytorch_Mask_RCNN :)

fitsumreda commented

@tensorboy I couldn't access the link. Could you share a working one?

pachiko commented Jun 21, 2019

(quoting longcw's crop_and_resize vs. RoIAlign explanation above)

The pictures are missing... Would be great if you could reupload them :)
