
Docs: question about how Multi-GPU is intended to be used #11

Closed
GenevieveBuckley opened this issue Mar 22, 2022 · 4 comments

@GenevieveBuckley

I have a multi-GPU workstation, but have found that only one GPU is being used when I select both "Use GPU" and "Multi GPU" for 3D inference.

My question is: is there anything specific the user needs to do to run multi-GPU jobs? Other than load a large 3D image into napari, then start empanada-napari 3D inference with "Use GPU" & "Multi GPU" selected (and possibly also give it a zarr directory)?

Torch is able to access multiple GPUs within my environment:

In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: True
In [3]: torch.cuda.device_count()
Out[3]: 2

But the NVIDIA GPU utilization graph shows only one active.

The only information I can find in the docs just says that this is an experimental feature:

Multi GPU: If the workstation is equipped with more than 1 GPU, inference can be distributed across them. This feature is considered experimental and may break.
That's fine. But it also makes user error a stronger possibility: given the lack of detail in the docs, I might be doing something silly without realizing it.

I'm not very familiar with the code in multigpu.py. I could look at that a bit more to try to work it out. I could also double-check that torch is working well with some less complex multi-GPU examples.

Thanks for any advice you might have here.

@GenevieveBuckley
Author

Update: I've confirmed that torch with multi-GPU is configured and running correctly by running this tutorial notebook locally: https://colab.research.google.com/github/pytorch/tutorials/blob/gh-pages/_downloads/63ecfdf27b96977f3a89015f60065a2b/data_parallel_tutorial.ipynb
No errors, everything looks good.

@conradry
Contributor

I think this is the line to blame.

if multigpu and (not hasattr(widget, 'engine') or widget.last_config != model_config):

It won't run multi-GPU inference if 3D inference has been run previously without the option checked. Can you try running with multi-GPU first thing after launching the 3D inference widget? That worked for me.
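The failure mode can be sketched in plain Python (a simplified mock for illustration, not empanada's actual widget code): once a prior single-GPU run has attached an `engine` to the widget with the same model config, the guard above evaluates to False and the multi-GPU engine is never created.

```python
class Widget:
    """Minimal stand-in for the 3D inference widget's state."""
    pass

def should_create_multigpu_engine(widget, multigpu, model_config):
    # Mirrors the guard: only (re)create the engine when none exists
    # yet or the model config has changed since the last run.
    return multigpu and (not hasattr(widget, 'engine')
                         or widget.last_config != model_config)

widget = Widget()

# Fresh widget: the multi-GPU engine would be created.
print(should_create_multigpu_engine(widget, True, 'config_a'))  # True

# Simulate a prior single-GPU run that already set up an engine
# with the same config.
widget.engine = 'single_gpu_engine'
widget.last_config = 'config_a'

# Multi-GPU now requested, but the guard is False, so the existing
# single-GPU engine is silently reused.
print(should_create_multigpu_engine(widget, True, 'config_a'))  # False
```

This matches the observed behavior: checking "Multi GPU" as the first action after launching the widget works, because no stale engine exists yet.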

I've successfully run inference on a system with 4 GPUs before, but performance can be unstable. Using multi-GPU inference typically only makes sense for large volumes because there can be a significant initialization overhead associated with creating the process group.
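As a rough back-of-envelope (illustrative numbers only, not measured from empanada): distributing inference only pays off once the per-volume inference time dwarfs the one-time process-group setup cost.

```python
def multigpu_worthwhile(single_gpu_seconds, n_gpus, init_overhead_seconds):
    """Estimate whether distributing inference pays off.

    Assumes near-ideal scaling across GPUs; real speedups are lower,
    so this is an optimistic bound.
    """
    multi_gpu_seconds = init_overhead_seconds + single_gpu_seconds / n_gpus
    return multi_gpu_seconds < single_gpu_seconds

# Small volume: 30 s of inference against a hypothetical 60 s of
# process-group setup -- not worth distributing.
print(multigpu_worthwhile(30, 2, 60))    # False

# Large volume: 30 min of inference amortizes the same setup cost.
print(multigpu_worthwhile(1800, 2, 60))  # True
```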

@GenevieveBuckley
Author

OK, I'll try that next week when I next have time booked on that machine.

Using multi-GPU inference typically only makes sense for large volumes

How large is large, to you?

@conradry
Contributor

Multi-GPU is much more stable now. See the note in https://empanada.readthedocs.io/en/latest/plugin/best-practice.html#inference-best-practices for usage advice.
