Speeding up the estimation procedure for higher cuda/torch version #38

Open
JustinYuu opened this issue Aug 19, 2021 · 3 comments

@JustinYuu

JustinYuu commented Aug 19, 2021

Hi,
I would like to share a strategy for speeding up optical flow estimation, which is slow on PyTorch versions newer than 1.1. Since we want to predict dense optical flow for long 15 fps or 30 fps videos, the time cost is a major concern for GPU utilization and training speed.

After profiling the entire training procedure of PWC-Net, I found that the time bottleneck comes mainly from two parts: the correlation function and the ContextNetwork.

For the correlation function: the dockerfile targets torch 1.1.0 + cuda 9.0, so the extension cannot be built for cuda 10.1 directly. A simple fix is to change the 'cuda path' in l.21 of setup.py to point at the current cuda version. We have tested that cuda 10.1, 10.2, and 11.0 all work. Note that higher cuda versions require higher gcc and g++ versions. In our tests, gcc/g++ 7.3.0/7.3.1 works for almost all combinations of cuda and torch versions, while gcc/g++ 4.9 only works for PyTorch 1.1.0 + torchvision 0.3.0.

If you update gcc/g++ directly from 4.9 to 7.3, an additional error may be thrown: "ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found". In this case, first check whether a suitable libstdc++ already exists on the machine with the following command: strings /data/anaconda3/bin/../lib/./libstdc++.so.6 | grep GLIBCXX_3.4.20. If it prints a match, add that directory to the environment variables as follows: export LD_LIBRARY_PATH=/data/anaconda3/lib:$LD_LIBRARY_PATH. After these steps, the cuda-accelerated correlation function is available for almost any cuda/torch/torchvision version, and it is around 2x~5x faster than the Python version in correlation_native.py.
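The libstdc++ check described above can also be scripted, which may be handy when several candidate directories have to be tried. The following is only a minimal sketch: the helper names `has_glibcxx` and `find_compatible_libstdcxx` are hypothetical, and it does a plain byte search for the version tag, which is roughly what the `strings ... | grep` pipeline does.

```python
from pathlib import Path


def has_glibcxx(lib_path, version="GLIBCXX_3.4.20"):
    """Return True if the file at lib_path contains the given version tag.

    This is a plain byte search, similar in spirit to `strings | grep`.
    """
    data = Path(lib_path).read_bytes()
    return version.encode("ascii") in data


def find_compatible_libstdcxx(candidate_dirs, version="GLIBCXX_3.4.20"):
    """Return the first directory whose libstdc++.so.6 provides `version`.

    Returns None if no candidate matches. The directory names passed in
    are up to the caller (e.g. an Anaconda lib directory).
    """
    for directory in candidate_dirs:
        lib = Path(directory) / "libstdc++.so.6"
        if lib.is_file() and has_glibcxx(lib, version):
            return str(lib.parent)
    return None
```

If a directory is returned, it is the one to prepend to `LD_LIBRARY_PATH` before importing the compiled correlation module.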

The most time-consuming part, however, is the ContextNetwork, which is counter-intuitive and hard to find. We do not know whether this problem is caused by a PyTorch version conflict, but we did find a way to fix it. Specifically, if you run the pwclite network with pytorch>1.1.0, then regardless of whether you use correlation_cuda or correlation_native, the total optical flow estimation time is around 5x~10x higher than with PyTorch 1.1. We measured the time cost of each part of pwclite.py and traced the slowdown to the ContextNetwork class. This class is extremely simple, since it only contains a sequence of conv calls, so we tried modifying the conv function in l.12 of pwclite.py. If we set the bias parameter to False, the speed is as fast as in the original version, while setting bias to True leads to slower estimation. Strangely, although this function is used in many classes, such as FeatureExtractor and FlowEstimatorDense, the time cost only changes for the ContextNetwork class and not for the others. In any case, simply changing bias to False brings the speed back to normal. We do not know whether this change affects accuracy, so we decided to change the bias parameter only in the ContextNetwork class, which can be implemented by adding a control argument to the conv function. This way only the seven convolution layers in the ContextNetwork class are changed, which we hope will not hurt performance badly.
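A rough sketch of the control argument described above is given here. It is not copied from pwclite.py: the `conv` signature, the layer widths, and the dilation rates are assumptions based on a typical PWC-Net context network, and only the added `bias` argument and the count of seven convolution layers come from this post.

```python
import torch
import torch.nn as nn


def conv(in_planes, out_planes, kernel_size=3, stride=1, dilation=1,
         isReLU=True, bias=True):
    """Conv2d (+ optional LeakyReLU) with a switchable bias term.

    `bias` is the added control argument; it defaults to True so that
    every other caller keeps its original behaviour.
    """
    layers = [
        nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size,
                  stride=stride, dilation=dilation,
                  padding=((kernel_size - 1) * dilation) // 2,
                  bias=bias)
    ]
    if isReLU:
        layers.append(nn.LeakyReLU(0.1, inplace=True))
    return nn.Sequential(*layers)


class ContextNetwork(nn.Module):
    """Seven dilated convolutions; widths and dilations are assumed values."""

    def __init__(self, ch_in, bias=False):
        super().__init__()
        self.convs = nn.Sequential(
            conv(ch_in, 128, 3, 1, 1, bias=bias),
            conv(128, 128, 3, 1, 2, bias=bias),
            conv(128, 128, 3, 1, 4, bias=bias),
            conv(128, 96, 3, 1, 8, bias=bias),
            conv(96, 64, 3, 1, 16, bias=bias),
            conv(64, 32, 3, 1, 1, bias=bias),
            conv(32, 2, isReLU=False, bias=bias),
        )

    def forward(self, x):
        return self.convs(x)
```

Because the default stays `bias=True`, classes such as FeatureExtractor and FlowEstimatorDense are untouched, and only ContextNetwork opts out of the bias term.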

After making these two changes, optical flow estimation is 5x~20x faster. I am aware that simply using the dockerfile and its fixed environment is an easier way to reproduce the results, but we hope our experience helps more researchers extend this nice work to other environments.

@JustinYuu JustinYuu changed the title Speeding up the estimation procedure Speeding up the estimation procedure for higher cuda/torch version Aug 19, 2021
@wrainbow0705

Hi, I ran into this issue too. Have you tried setting bias to False? Did it decrease the performance?

@JustinYuu
Author

Performance does decrease if you use the pre-trained weights directly for optical flow estimation, but after training the modified network for a few epochs, the estimated flow becomes acceptable for downstream applications.
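For reference, one way to load existing weights into a network whose bias terms were removed is non-strict loading, since the checkpoint then contains bias entries the modified model no longer has. This is only a sketch under that assumption: the helper name `load_ignoring_removed_bias` is hypothetical, and the thread does not say how the weights were actually loaded.

```python
import torch.nn as nn


def load_ignoring_removed_bias(model, state_dict):
    """Load weights non-strictly and report which keys did not line up.

    With bias removed from the model, the checkpoint's bias tensors show
    up as unexpected keys and are skipped instead of raising an error.
    """
    result = model.load_state_dict(state_dict, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)


# Toy illustration with a single layer instead of the full network.
pretrained = nn.Conv2d(3, 4, kernel_size=3, bias=True)
modified = nn.Conv2d(3, 4, kernel_size=3, bias=False)
missing, unexpected = load_ignoring_removed_bias(modified, pretrained.state_dict())
```

The skipped bias tensors are exactly the parameters the modified network has to compensate for, which fits the observation that a few epochs of further training are needed.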

@wrainbow0705

Thanks! It helps me a lot.
