Update balanced_positive_negative_sampler.py #93
Conversation
Use the CUDA version of torch.randperm to avoid copying from GPU to CPU and a fatal bug in the multi-threaded CPU version
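For context, a minimal runnable sketch of the kind of change being proposed (the names `positive` and `num_pos` are illustrative, not the exact code in balanced_positive_negative_sampler.py): passing `device=` to torch.randperm keeps the permutation on the GPU instead of building it on the CPU and copying it over.

```python
import torch

# Illustrative tensors; in the sampler, `positive` would be the indices of
# positive matches and `num_pos` the number of positives to sample.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
positive = torch.arange(128, device=device)
num_pos = 32

# Before: the permutation is built on the CPU, so the selected indices have
# to be copied to the GPU before they can index `positive`.
perm_cpu = torch.randperm(positive.numel())[:num_pos].to(device)

# After: the permutation is created directly on the tensor's device, which
# avoids the CPU round trip and the multi-threaded CPU randperm code path.
perm_gpu = torch.randperm(positive.numel(), device=positive.device)[:num_pos]

sampled_before = positive[perm_cpu]
sampled_after = positive[perm_gpu]
```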
Hi, thanks for the PR! The reason why I didn't use a device-specific randperm was that it was not available when I first wrote this. About using DataParallel instead of DistributedDataParallel: I believe it might end up being a bit harder to implement, and also slower (at least if we use native DataParallel, which is bottlenecked by Python). It might also be worth noting that for small tensors (< 30000 elements), randperm on CUDA offloads the computation to the CPU, see https://github.com/pytorch/pytorch/blob/b818d31a3e8bbe8fd13c520de145d84b0641cf70/aten/src/ATen/native/cuda/TensorFactories.cu#L65. But thanks for the PR, it looks good to me!
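As a rough, hypothetical way to see that in practice (not from the original thread), a micro-benchmark sketch comparing CPU and CUDA randperm; for small n the CUDA call may internally fall back to a CPU computation, so the two timings can end up close:

```python
import time
import torch

def time_randperm(n, device, iters=200):
    torch.randperm(n, device=device)  # warm-up (CUDA context, allocator)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        torch.randperm(n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.time() - start) / iters

if torch.cuda.is_available():
    for n in (1000, 30000, 1000000):
        print(n, time_randperm(n, "cpu"), time_randperm(n, "cuda"))
```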
Thanks!
The fatal bug I encountered is just like pytorch/pytorch#1868. If you need to reproduce the error, you can use my branch https://github.com/Jiayuan-Gu/maskrcnn-benchmark/tree/dev. With quick_schedules/res50_faster_rcnn_fpn, you should hit the error before roughly iteration 2000. For the full res50_faster_rcnn_fpn schedule, it raises the error near iteration 580. However, using the CUDA wrapper of randperm avoids this bug.
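For reference, a standalone sketch of the kind of race being described: several Python threads calling the CPU torch.randperm concurrently. Whether it actually segfaults depends on the PyTorch build, since the missing THGenerator mutex was later fixed.

```python
import threading
import torch

def worker():
    # Before the THGenerator mutex fix, concurrent CPU randperm calls could
    # race inside the shared random number generator and crash the process.
    for _ in range(100000):
        torch.randperm(1000)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```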
@Jiayuan-Gu Hi, thanks for the link, it is very interesting!
I tested it on 2 1080Ti GPUs with a total batch size of 4. Same accuracy when training res50-FPN for 1 epoch on coco2017train.
What is your memory usage? You have to load the annotations 8 times if you use 8 GPUs.
Thanks for those numbers! I don't have the exact numbers at hand, but since we load images on the fly, and we can control the number of data loading threads per process, we can control the amount of memory that is used. I understand that it is easier to have a single file with the proposals for a single dataset. But given that the current PyTorch data loading model is to load images / data on the fly using multiple threads, I'd say it might be better suited to split the single file containing the proposals into multiple files, one per image. This way, increasing the number of processes doesn't significantly increase the amount of memory used, and we can amortize the loading cost by using more data loading processes. Thoughts?
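A hypothetical sketch of that per-image layout (the class name PerImageProposalDataset and the `<image_id>.pt` file convention are made up for illustration): each data-loading worker reads only the proposal file of the image it is processing, so memory no longer scales with the number of processes.

```python
import os
import torch
from torch.utils.data import Dataset

class PerImageProposalDataset(Dataset):
    """Hypothetical layout: one proposal file per image, e.g. proposals/123.pt,
    loaded lazily in __getitem__ instead of one big file held by every process."""

    def __init__(self, image_ids, proposal_dir):
        self.image_ids = image_ids
        self.proposal_dir = proposal_dir

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, idx):
        image_id = self.image_ids[idx]
        # Only the proposals for this image are read, so memory usage stays
        # roughly constant as the number of processes / GPUs grows, and the
        # loading cost is amortized across data-loading workers.
        proposals = torch.load(os.path.join(self.proposal_dir, "%s.pt" % image_id))
        return image_id, proposals
```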
Summary: When we added `randperm_cpu` and `THTensor_(randperm)` we forgot to lock the `THGenerator` mutex before calling `THRandom_random`, which causes segfault error mentioned in facebookresearch/maskrcnn-benchmark#93 (comment). This PR fixes the bug. Closes pytorch/pytorch#1868. Pull Request resolved: pytorch/pytorch#13832 Differential Revision: D13025453 Pulled By: yf225 fbshipit-source-id: 6e363a35c72b4862412eaea6516a154126634c9d
Use the CUDA version of torch.randperm to avoid copying from GPU to CPU and a fatal bug in the multi-threaded CPU version
Avoid an unnecessary and unexpected copy from GPU to CPU.
I am trying to implement a multi-threaded data parallel module instead of the distributed modules. I have modified the data structure and have almost finished it. I found a fatal bug: when you use the CPU version of torch.randperm with multiple threads, it raises a Segmentation Fault during the forward pass around a certain iteration.
I heard this repo was created for a benchmark performance contest, so I wonder whether you care about flexibility apart from speed. Distributed learning has to load the whole dataset multiple times. I guess that is why you do not use fixed proposals like Detectron: proposal files need a lot of memory, which your implementation would find hard to support.