
Run locally on multiple GPUs #29

Open
maximotus opened this issue Jan 9, 2024 · 3 comments

Comments

@maximotus

Hello,

great work! What are the minimal adaptations I need to apply to the code so that I can run the narrator on multiple GPUs locally?
nn.DataParallel is not optimal since I would need to adapt the model classes.

Cheers,
Max

@zhaoyue-zephyrus
Contributor

Hi @maximotus

Could you make the question clearer? If you are referring to running "inference", then you don't need parallelism at all. If you mean running a "training" job, we use torch.nn.parallel.DistributedDataParallel.
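
For training, a minimal DistributedDataParallel setup looks roughly like this (sketch only, with a placeholder model and loop, not our actual training code):

```python
# Minimal DistributedDataParallel sketch (placeholder model/data).
# Launch with: torchrun --nproc_per_node=4 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 512).cuda(local_rank)  # stand-in for the real model
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):  # stand-in training loop
        x = torch.randn(8, 512, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```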

@maximotus
Author

maximotus commented Jan 11, 2024

Hi @zhaoyue-zephyrus,

Sure. I was trying to run your demo script.
My goal is to produce captions for short video clips of 1 second.
python demo_narrator.py --video-path "../path/to/my/video"
If I do so without the --cuda flag, it works, but it needs about 70 seconds of inference time per clip with nucleus sampling (k=10) on my device.
So I wanted to speed this up using GPU(s).

But if I pass the --cuda flag, so the command becomes python demo_narrator.py --cuda --video-path "../path/to/my/video", my 10 GB GPU is not enough and I get a RuntimeError: CUDA error: out of memory.

However, I thought enabling parallelism could solve this since I have 4 GPUs with 10 GB each available.
But I could not manage to make this work with your code easily.

So now I am wondering how I can run the inference on more than one GPU so that I do not get a RuntimeError: CUDA error: out of memory.

I tried wrapping your model with torch.nn.DataParallel after line 57. In this case, I can observe that the model weights are being distributed across 2 GPUs, but when it comes to the specific function calls in line 74 and line 75, it fails, since models wrapped with torch.nn.DataParallel can then only call the default forward method (compare https://discuss.pytorch.org/t/dataparallel-model-with-custom-functions/75053/10).

So I was thinking about adapting your code for this (e.g., routing the custom methods like encode_image and generate through the default forward method, with a case selection inside forward).
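
Roughly, the adaptation I have in mind looks like this (just a sketch, assuming the model exposes encode_image and generate as in the demo script):

```python
import torch
import torch.nn as nn

class ParallelWrapper(nn.Module):
    """Routes custom methods through forward() so nn.DataParallel can scatter the inputs."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, *args, mode="encode_image", **kwargs):
        # nn.DataParallel only replicates forward(), so dispatch on a keyword here.
        if mode == "encode_image":
            return self.model.encode_image(*args, **kwargs)
        if mode == "generate":
            return self.model.generate(*args, **kwargs)
        raise ValueError(f"unknown mode: {mode}")

# Hypothetical usage, mirroring the demo script:
# model = nn.DataParallel(ParallelWrapper(model), device_ids=[0, 1, 2, 3]).cuda()
# image_features = model(frames, mode="encode_image")
# tokens = model(image_features, mode="generate")
```

Though I realize DataParallel still keeps a full copy of the model on every GPU, so this alone would probably not fix the out-of-memory error.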

However, I thought it would be good to ask you about this issue first, since I may have overlooked a simpler solution.

Cheers,
Max

@Anirudh257

Hi @maximotus, did you figure this out?

You will need to use at least a 20 GB GPU.

If not, I think you will need model parallelism. Look into https://www.deepspeed.ai/tutorials/pipeline/ for pipeline-based model parallelism.
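
To illustrate the idea (plain PyTorch, not the DeepSpeed API, and not the actual LaViLa model): model parallelism means placing different parts of the model on different GPUs and moving activations between them, e.g.

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    """Toy two-stage model split across cuda:0 and cuda:1."""

    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(512, 2048), nn.GELU()).to("cuda:0")
        self.stage1 = nn.Linear(2048, 512).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        x = self.stage1(x.to("cuda:1"))  # hand activations off to the second GPU
        return x

model = TwoGPUModel()
out = model(torch.randn(4, 512))
print(out.device)  # cuda:1
```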
