Hi,
I would like to share an acceleration strategy that addresses slow optical flow estimation with PyTorch versions newer than 1.1. Since we want to predict dense optical flow for long 15 fps or 30 fps videos, the time cost is a big concern for both GPU utilization and training speed.
After analyzing the entire training procedure of PWC-Net, I found that the time bottleneck mainly comes from two parts: the correlation function and the ContextNetwork.
For the correlation function, the Dockerfile targets torch 1.1.0 + CUDA 9.0, so the CUDA extension cannot be built for CUDA 10.1 directly. A simple fix is to change the 'cuda path' in l.21 of setup.py to the CUDA version you have installed. We have tested that 10.1, 10.2, and 11.0 all work. Note that higher CUDA versions require higher gcc and g++ versions. In our tests, gcc/g++ 7.3.0/7.3.1 works for almost all CUDA and torch versions, while gcc/g++ 4.9 only works for PyTorch 1.1.0 + torchvision 0.3.0. If you update gcc/g++ directly from 4.9 to 7.3, an additional error may be thrown: "ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found". In this case, first check whether a suitable libstdc++ exists elsewhere on the machine with the following command: strings /data/anaconda3/bin/../lib/./libstdc++.so.6 | grep GLIBCXX_3.4.20. If the command prints a match, add that directory to the environment variables: export LD_LIBRARY_PATH=/data/anaconda3/lib:$LD_LIBRARY_PATH. After these steps, the CUDA-accelerated correlation function is available for almost any CUDA/torch/torchvision combination, and it is around 2x~5x faster than the Python version, correlation_native.py.
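In case the strings/grep check needs to be scripted, for example across several candidate libstdc++ files, here is a minimal Python sketch of the same check. The helper names glibcxx_versions and has_glibcxx are made up for illustration and are not part of the repository; the path in the usage note is just the one from my machine.

```python
import re
from pathlib import Path


def glibcxx_versions(lib_path):
    """Return the GLIBCXX_x.y.z version tags found in a libstdc++ binary.

    Rough equivalent of `strings <lib> | grep GLIBCXX`: the file is read as
    raw bytes and searched for version tags.
    """
    data = Path(lib_path).read_bytes()
    tags = {m.decode() for m in re.findall(rb"GLIBCXX_\d+(?:\.\d+)*", data)}
    # Sort numerically so that 3.4.9 comes before 3.4.20.
    return sorted(tags, key=lambda t: tuple(int(p) for p in t.split("_")[1].split(".")))


def has_glibcxx(lib_path, version="GLIBCXX_3.4.20"):
    """True if the given libstdc++ file provides the requested version tag."""
    return version in glibcxx_versions(lib_path)
```

Usage would look like has_glibcxx("/data/anaconda3/lib/libstdc++.so.6"); if it returns True, that library's directory is the one to prepend to LD_LIBRARY_PATH.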
The most time-consuming part, however, is the ContextNetwork, which is counter-intuitive and was hard to find. We do not know whether this problem comes from a PyTorch version conflict, but we did find a way to fix it. Specifically, if you run the pwclite network with PyTorch > 1.1.0, then regardless of whether you use correlation_cuda or correlation_native, the total optical flow estimation time is around 5x~10x higher than with PyTorch 1.1. We timed each part of pwclite.py and traced the slowdown to the ContextNetwork class. This class is extremely simple, since it only contains a sequence of conv functions, so we tried modifying the conv function in l.12 of pwclite.py. If we set the bias parameter to False, the speed is as fast as with the original version, while setting bias to True leads to slower estimation. Strangely, although this function is used in many classes, such as FeatureExtractor and FlowEstimatorDense, the time cost only changes for the ContextNetwork class and stays the same for all the others. In any case, simply setting bias to False brings the speed back to normal. Since we do not know whether this change affects performance, we decided to change the bias parameter only in the ContextNetwork class, which can be implemented by adding a control argument to the function conv. This way only the seven convolution layers in the ContextNetwork class are changed, which we hope will not hurt performance too much.
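To make the change concrete, here is a minimal sketch of what a conv helper with an extra bias control argument could look like. This is not a copy of pwclite.py: the argument names, the LeakyReLU slope, and the channel widths and dilations are my assumptions for illustration. Only the idea (a bias switch that defaults to True and is turned off for the seven ContextNetwork convolutions) comes from the description above.

```python
import torch
import torch.nn as nn


def conv(in_planes, out_planes, kernel_size=3, stride=1, dilation=1,
         is_relu=True, bias=True):
    """Conv2d followed by an optional LeakyReLU.

    `bias` is the added control argument; it defaults to True so that every
    caller other than ContextNetwork keeps its original behaviour.
    """
    layers = [
        nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size,
                  stride=stride, dilation=dilation,
                  padding=((kernel_size - 1) * dilation) // 2, bias=bias)
    ]
    if is_relu:
        layers.append(nn.LeakyReLU(0.1, inplace=True))
    return nn.Sequential(*layers)


class ContextNetwork(nn.Module):
    """Seven stacked convolutions; bias is disabled only in this class."""

    def __init__(self, ch_in, bias=False):
        super().__init__()
        # Channel widths and dilations below are illustrative assumptions.
        self.convs = nn.Sequential(
            conv(ch_in, 128, 3, 1, 1, bias=bias),
            conv(128, 128, 3, 1, 2, bias=bias),
            conv(128, 128, 3, 1, 4, bias=bias),
            conv(128, 96, 3, 1, 8, bias=bias),
            conv(96, 64, 3, 1, 16, bias=bias),
            conv(64, 32, 3, 1, 1, bias=bias),
            conv(32, 2, is_relu=False, bias=bias),
        )

    def forward(self, x):
        return self.convs(x)
```

Because bias defaults to True in conv, classes such as FeatureExtractor and FlowEstimatorDense do not need to be touched.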
After these two changes, optical flow estimation is 5x~20x faster. I am aware that simply using the Dockerfile and the fixed environment is the simpler way to reproduce the results, yet we hope our experience will help more researchers extend this nice work to more environments.
JustinYuu changed the title from "Speeding up the estimation procedure" to "Speeding up the estimation procedure for higher cuda/torch version" on Aug 19, 2021
Hi, I ran into this issue too. Have you tried setting bias to False? Did it decrease the performance?
Performance does decrease if you directly use the pre-trained weights for optical flow estimation, but after training the modified network for a few epochs, it becomes acceptable for downstream applications.
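One practical note on reusing the pre-trained weights: a checkpoint saved with bias=True contains bias tensors that the modified layers no longer have, so a strict load_state_dict raises an error about unexpected keys. A minimal sketch of a non-strict load is shown here, using a single Conv2d as a stand-in for the real network. The helper name load_without_bias is hypothetical, and I am assuming the checkpoint is a plain state dict.

```python
import torch
import torch.nn as nn


def load_without_bias(model, state_dict):
    """Load a checkpoint non-strictly and report which keys did not match.

    With strict=False, PyTorch skips entries the model has no parameter for
    (here: the dropped biases) instead of raising an error.
    """
    result = model.load_state_dict(state_dict, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)


# Stand-in demo: "pretrained" layer with bias, modified layer without it.
pretrained = nn.Conv2d(3, 4, kernel_size=3, bias=True)
modified = nn.Conv2d(3, 4, kernel_size=3, bias=False)
missing, unexpected = load_without_bias(modified, pretrained.state_dict())
```

The convolution weights are copied over and only the bias entries are skipped, which is why a few epochs of fine-tuning are still needed afterwards.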