
No such file or directory: '/proc/30094/stat' #95

Closed
zhudd-hub opened this issue Jan 12, 2021 · 14 comments


zhudd-hub commented Jan 12, 2021

I reinstalled gpustat via pip, but it still raises an error when I use gpustat:

root$ gpustat --debug
Error on querying NVIDIA devices. Use --debug flag for details
Traceback (most recent call last):
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/gpustat/__main__.py", line 19, in print_gpustat
    gpu_stats = GPUStatCollection.new_query()
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/gpustat/core.py", line 396, in new_query
    gpu_info = get_gpu_info(handle)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/gpustat/core.py", line 365, in get_gpu_info
    process = get_process_info(nv_process)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/gpustat/core.py", line 294, in get_process_info
    ps_process = psutil.Process(pid=nv_process.pid)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/__init__.py", line 339, in __init__
    self._init(pid)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/__init__.py", line 366, in _init
    self.create_time()
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/__init__.py", line 697, in create_time
    self._create_time = self._proc.create_time()
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_pslinux.py", line 1459, in wrapper
    return fun(self, *args, **kwargs)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_pslinux.py", line 1641, in create_time
    values = self._parse_stat_file()
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_common.py", line 340, in wrapper
    return fun(self)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_pslinux.py", line 1498, in _parse_stat_file
    with open_binary("%s/%s/stat" % (self._procfs_path, self.pid)) as f:
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_pslinux.py", line 205, in open_binary
    return open(fname, "rb", **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/30094/stat'

So, what's wrong with my GPU? Please help me.

@zhudd-hub (Author)

@ashwin @wookayin I hope for your advice.

@wookayin (Owner)

I updated your original post to have stacktraces wrapped with code blocks.

Does it happen only once or constantly? Can you provide the output of ps aux | grep 30094 when the error happened? What is the process with PID 30094?

zhudd-hub (Author) commented Jan 12, 2021

It happens constantly. I ran ps aux | grep 30094 and got the following output:

$ ps aux | grep 30094
zhudd    30564  0.0  0.0  15980   932 pts/58   S+   22:33   0:00 grep --color=auto 30094

It seems there is no such process.
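The absence of the process can also be confirmed directly from Python without spawning ps. A minimal sketch, assuming a Linux procfs (the `pid_exists` helper below is hypothetical, written for illustration):

```python
import os

def pid_exists(pid: int) -> bool:
    """Check whether a process exists on Linux by probing procfs --
    the same /proc/<pid>/stat file that psutil ultimately opens."""
    return os.path.exists("/proc/%d/stat" % pid)

# The current interpreter's own PID must exist; the stale PID from the
# traceback above (30094) would return False on the reporter's machine.
print(pid_exists(os.getpid()))
```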

wookayin (Owner) commented Jan 12, 2021

So there is no process with PID 30094 on your machine. I think NVML is still reporting that the process is utilizing the GPU (which is no longer valid). Can you confirm this by showing me the output of nvidia-smi or nvidia-smi --query-compute-apps=gpu_name,pid,name,used_memory --format=csv?

zhudd-hub (Author) commented Jan 12, 2021

Thank you for your reply. You are right; when I run nvidia-smi, I get this output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 60%   76C    P2   193W / 250W |   4335MiB / 11178MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 56%   71C    P2   194W / 250W |   4980MiB / 11178MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 54%   70C    P2   192W / 250W |   4335MiB / 11178MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 63%   80C    P2   185W / 250W |   4335MiB / 11178MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 46%   61C    P2   180W / 250W |   4335MiB / 11178MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 58%   74C    P2   185W / 250W |   4335MiB / 11178MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:85:00.0 Off |                  N/A |
| 52%   68C    P2   186W / 250W |   4335MiB / 11178MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  Off  | 00000000:86:00.0 Off |                  N/A |
| 53%   69C    P2   180W / 250W |   4335MiB / 11178MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   8  GeForce GTX 108...  Off  | 00000000:89:00.0 Off |                  N/A |
| 51%   65C    P2   181W / 250W |   4980MiB / 11178MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   9  GeForce GTX 108...  Off  | 00000000:8A:00.0 Off |                  N/A |
| 52%   67C    P2   187W / 250W |   4335MiB / 11178MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     39861      C   python                              0MiB |
|    1   N/A  N/A     39861      C   python                            645MiB |
|    2   N/A  N/A     39861      C   python                              0MiB |
|    3   N/A  N/A     39861      C   python                              0MiB |
|    4   N/A  N/A     39861      C   python                              0MiB |
|    5   N/A  N/A     39861      C   python                              0MiB |
|    6   N/A  N/A     39861      C   python                              0MiB |
|    7   N/A  N/A     39861      C   python                              0MiB |
|    8   N/A  N/A     39861      C   python                            645MiB |
|    9   N/A  N/A     39861      C   python                              0MiB |
+-----------------------------------------------------------------------------+

I can't find the process that is using almost 50% of each GPU, and I can't kill it; I have no sudo access. Could the GPUs have been taken up by something else, such as a Docker container?

@wookayin (Owner)

Thanks. So it is a bug similar to #12, but we were already catching psutil.NoSuchProcess. For some reason psutil throws the wrong exception (it was supposed to throw NoSuchProcess).

I think this is a bug in psutil: see giampaolo/psutil#1447 (fixed in 5.6.0). Can you confirm the version of psutil you are using?

$ python
>>> import psutil
>>> psutil.__version__
5.7.0
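On affected psutil versions, the mistranslation can be guarded against by catching both exception types. A minimal sketch (the stale PID 30094 is taken from the traceback above; on a machine where that PID is alive the first branch runs instead):

```python
import psutil

stale_pid = 30094  # PID reported by NVML but no longer alive on the reporter's machine

try:
    name = psutil.Process(pid=stale_pid).name()
    print("process %d is %s" % (stale_pid, name))
except (psutil.NoSuchProcess, FileNotFoundError) as e:
    # Buggy psutil versions leak the raw FileNotFoundError instead of
    # translating it into psutil.NoSuchProcess, so catch both defensively.
    print("process %d is gone (%s)" % (stale_pid, type(e).__name__))
```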

zhudd-hub (Author) commented Jan 13, 2021

Thanks for your advice. I confirmed that my psutil version was 5.4.7; I uninstalled it and installed the newest psutil (5.8.0). What's more, I also reinstalled gpustat via pip just now. I'm sorry to report that it still shows the same error when I use gpustat:

Error on querying NVIDIA devices. Use --debug flag for details
Traceback (most recent call last):
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_common.py", line 447, in wrapper
    ret = self._cache[fun]
AttributeError: _cache

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/gpustat/__main__.py", line 19, in print_gpustat
    gpu_stats = GPUStatCollection.new_query()
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/gpustat/core.py", line 396, in new_query
    gpu_info = get_gpu_info(handle)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/gpustat/core.py", line 365, in get_gpu_info
    process = get_process_info(nv_process)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/gpustat/core.py", line 294, in get_process_info
    ps_process = psutil.Process(pid=nv_process.pid)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/__init__.py", line 326, in __init__
    self._init(pid)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/__init__.py", line 354, in _init
    self.create_time()
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/__init__.py", line 710, in create_time
    self._create_time = self._proc.create_time()
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_pslinux.py", line 1576, in wrapper
    return fun(self, *args, **kwargs)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_pslinux.py", line 1788, in create_time
    ctime = float(self._parse_stat_file()['create_time'])
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_pslinux.py", line 1576, in wrapper
    return fun(self, *args, **kwargs)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_common.py", line 450, in wrapper
    return fun(self)
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_pslinux.py", line 1618, in _parse_stat_file
    with open_binary("%s/%s/stat" % (self._procfs_path, self.pid)) as f:
  File "/home/zhudd/anaconda3/lib/python3.7/site-packages/psutil/_common.py", line 711, in open_binary
    return open(fname, "rb", **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/30094/stat'


wookayin (Owner) commented Jan 13, 2021

Hmm, the bug in psutil is still present. I will add a workaround to ignore untranslated exceptions thrown from psutil.

@wookayin wookayin added the bug label Jan 13, 2021
@zhudd-hub (Author)

Great job, I will continue to follow your work...

wookayin added a commit that referenced this issue Jan 13, 2021
It appears that there is a bug of psutil where FileNotFoundError
exceptions are not properly translated to NoSuchProcess when the
requested process actually does not exist. As a workaround, we ignore
FileNotFoundError exceptions thrown while querying process information
as well as NoSuchProcess.

We also require psutil to have a minimum version of 5.6.0+ to ensure that the bug (hopefully and probably) does not happen.
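The workaround described in the commit message might look roughly like the following sketch (a hypothetical helper, not the exact gpustat code; the real fix lives in gpustat's get_process_info):

```python
import psutil

def safe_process_name(pid):
    """Return the process name for pid, or None if the process is gone.

    FileNotFoundError is caught alongside psutil.NoSuchProcess because
    affected psutil versions fail to translate the former into the
    latter (giampaolo/psutil#1447)."""
    try:
        return psutil.Process(pid=pid).name()
    except (psutil.NoSuchProcess, FileNotFoundError):
        return None
```

With a guard like this, a caller can skip (or render as unknown) any NVML-reported process whose PID no longer resolves, instead of crashing.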
@wookayin (Owner)

I pushed a fix for this. Please try the master version: pip install -I git+https://github.com/wookayin/gpustat.git@master and let me know if it works for you.

zhudd-hub (Author) commented Jan 13, 2021

Yes, there are no more errors when I use gpustat. However, a new issue has come up:

fishtank                Wed Jan 13 12:10:38 2021  455.32.00
[0] GeForce GTX 1080 Ti | 75°C, 100 % |  4335 / 11178 MB | jfx(?M)
[1] GeForce GTX 1080 Ti | 71°C, 100 % |  4980 / 11178 MB | jfx(645M)
[2] GeForce GTX 1080 Ti | 70°C, 100 % |  4335 / 11178 MB | jfx(?M)
[3] GeForce GTX 1080 Ti | 80°C, 100 % |  4335 / 11178 MB | jfx(?M)
[4] GeForce GTX 1080 Ti | 61°C, 100 % |  4335 / 11178 MB | jfx(?M)
[5] GeForce GTX 1080 Ti | 73°C, 100 % |  4335 / 11178 MB | jfx(?M)
[6] GeForce GTX 1080 Ti | 67°C, 100 % |  4335 / 11178 MB | jfx(?M)
[7] GeForce GTX 1080 Ti | 69°C, 100 % |  4335 / 11178 MB | jfx(?M)
[8] GeForce GTX 1080 Ti | 66°C, 100 % |  4980 / 11178 MB | jfx(645M)
[9] GeForce GTX 1080 Ti | 67°C, 100 % |  4335 / 11178 MB | jfx(?M)

There are ?M entries in the result. What's more, I used the gpustat options you recommended in the README.md:

 gpustat -cp
fishtank                Wed Jan 13 12:27:00 2021  455.32.00
[0] GeForce GTX 1080 Ti | 80°C, 100 % |  4335 / 11178 MB | python/39861(?M)
[1] GeForce GTX 1080 Ti | 75°C, 100 % |  4980 / 11178 MB | python/39861(645M)
[2] GeForce GTX 1080 Ti | 74°C, 100 % |  4335 / 11178 MB | python/39861(?M)
[3] GeForce GTX 1080 Ti | 83°C, 100 % |  4335 / 11178 MB | python/39861(?M)
[4] GeForce GTX 1080 Ti | 64°C, 100 % |  4335 / 11178 MB | python/39861(?M)
[5] GeForce GTX 1080 Ti | 78°C, 100 % |  4335 / 11178 MB | python/39861(?M)
[6] GeForce GTX 1080 Ti | 72°C, 100 % |  4335 / 11178 MB | python/39861(?M)
[7] GeForce GTX 1080 Ti | 73°C, 100 % |  4335 / 11178 MB | python/39861(?M)
[8] GeForce GTX 1080 Ti | 70°C, 100 % |  4980 / 11178 MB | python/39861(645M)
[9] GeForce GTX 1080 Ti | 72°C, 100 % |  4335 / 11178 MB | python/39861(?M)

Does it mean that process 39861 is using almost half of each GPU's resources? Why does it show (?M)?

wookayin (Owner) commented Jan 13, 2021

@zhudd-hub Yours is not a normal situation -- as you can see in your nvidia-smi output, the memory usage cannot be reported (0M). So the NVML library reports this as erroneous or unknown. It often happens that GPU memory is not freed properly and does not show up in the process list (rebooting or other fixes required).
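The ?M entries correspond to processes for which NVML reports no usable memory figure. The rendering can be sketched with a hypothetical helper (gpustat's actual formatting code may differ; the assumption here is that an unreportable value arrives as None):

```python
def format_proc_mem(used_bytes):
    """Render a per-process GPU memory figure as in the listing above:
    a value in MiB, or '?M' when NVML could not report one."""
    if used_bytes is None:  # NVML returned N/A / unknown
        return "?M"
    return "%dM" % (used_bytes // (1024 * 1024))

print(format_proc_mem(645 * 1024 * 1024))  # -> 645M
print(format_proc_mem(None))               # -> ?M
```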

@wookayin wookayin added this to the 1.0 milestone Jan 13, 2021
wookayin (Owner) commented Jan 14, 2021

Closing as fixed (b52fe9a).
