-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use nvidia-ml-py and psutil to replace original command line call #20
Conversation
Use the python binding of NVML is more flexible and maintainable for future expansion. But the Not Supported error is remained to be considered in next commit.
Here comes many refactors in this commit. The processes information is now integrated in the GPUStat class, viewed as a property of GPU state. The GPUCollection now contains a list of GPUStat, rather than a dict indexed by GPU-UUID. But the UUID keyword is still remained for future usage.
This commit is a support for PR #17. - Fixes minor typos and improve code styles. - Correct the usage of colors: wrong colors introduced from d9cc46d are fixed (e.g. bold_white, bold_black); and the case of --no-color is also set properly. - Fix different behavior at python3 (including integer division, bytes to unicode decoding for name and uuid). - `cmdline` names are simplified to their basename, as were in the previous versions (psutil returns full path of executable). - Bring back the `new_query()` method.
All the pynvml/psutil objects interacting with gpustat are replaced by mock; so we can unittest the gpustat behavior even in the environment where nvidia cards are not present. Also fixed minor bugs, e.g. improve 'not supported' handling.
LGTM |
As we have changed the gpustat package to depend on non-builtin packages (psutils, etc.), it is no more possible to import gpustat from setup.py without having those libraries installed. This commit is a part of #20.
Thank you so much @Stonesjtu !! |
try: | ||
temperature = N.nvmlDeviceGetTemperature(handle, N.NVML_TEMPERATURE_GPU) | ||
except N.NVMLError: | ||
memory = None # Not supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a typo. Maybe temperature = None
should be in except?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, you are correct. Fixed in 895e1f8. Thanks!
This is a support and follow-up for #17 by @Stonesjtu, in which all
nvidia-smi
andps
command line calls are replaced bynvidia-ml-py
andpsutil
, respectively. On top of #17, I have added unit tests and fixed several bugs (or different behavior) introduced in the PR.Changes
nvidia-smi
call bynvidia-ml-py
andpsutil
.nvidia-smi daemon
); moreover, we can obtain more detailed information on GPUs and processes.