[Bug] Repeated inference with dynamic shape leads to out of memory error #8233
Comments
Is this specific to MaskRCNN? What happens if the target is CPU?
That's a good point, I didn't think to check memory on CPU targets. Using the llvm target, I also see memory usage increase with each inference. After about 300 inferences, the Python process consumes ~25% of my 128 GB of physical RAM. I noticed that the rate of increase seems to slow down, but it varies a lot depending on the input. I've also seen this happen with FasterRCNN.
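The growth can be watched with a loop along these lines (a rough sketch only, not the actual test code; `vm` and `images` are placeholder names, and psutil is assumed to be available):

```python
import os

import psutil
import tvm

# `vm` is assumed to be a Relay VirtualMachine built for the llvm target, and
# `images` a list of preprocessed inputs (placeholders, not the real script).
proc = psutil.Process(os.getpid())
dev = tvm.cpu(0)
for i, img in enumerate(images):
    vm.invoke("main", tvm.nd.array(img, dev))
    rss_gb = proc.memory_info().rss / 1e9
    print(f"iteration {i}: process RSS = {rss_gb:.2f} GB")
```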
Hi @dvhg @masahi, on my T4 GPU with 16 GB of GPU memory and using the pooled allocator, I run out of memory on the 31st iteration. It looks like the pooled allocator may be allocating too much memory or doing something weird?
cc @zhiics
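For context, the allocator is chosen when the VM is created via `memory_cfg`; a minimal sketch of switching between the two (where `vm_exec` is a placeholder for a compiled Relay VM executable):

```python
import tvm
from tvm.runtime.vm import VirtualMachine

dev = tvm.cuda(0)
# `vm_exec` stands for an executable produced by relay.vm.compile(...).
vm_naive = VirtualMachine(vm_exec, dev, memory_cfg="naive")    # allocate/free per request
vm_pooled = VirtualMachine(vm_exec, dev, memory_cfg="pooled")  # reuse a memory pool
```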
I've just hit this problem when evaluating PyTorch MaskRCNN on the COCO dataset. I want to take a look at this issue.
I'm trying to run PyTorch MaskRCNN on GPU and have been running into GPU memory issues. I get errors when running repeated inferences using different inputs. There's some variety in the error messages but this is the most common:
When looking at GPU memory usage (using nvidia-smi), I see that memory usage increases over time until the test crashes once it nears the maximum. I'm running this on Ubuntu 18.04 and a T4 GPU with 16 GB of GPU memory.

Following the form of the unit test from test_tensorrt.py, the following script should reproduce the problem I'm seeing (using the COCO dataset). It differs from the unit test in 2 ways:

@masahi, I heard you've been working on PyTorch MaskRCNN. Have you seen this issue in your testing, or is there a problem in my script? Thank you!