nvidia-container-cli: initialization error on Ubuntu22.04LTS #250

Open
5 tasks done
Lonitch opened this issue Jul 11, 2022 · 14 comments

Lonitch commented Jul 11, 2022

Hi there,

I want to build containers that can run GUI applications. My Dockerfile and docker-compose.yml work well in WSL2, but I ran into problems when running the same setup on Ubuntu 22.04 LTS. My Dockerfile looks like the following:

FROM osrf/ros:melodic-desktop-full

SHELL ["/bin/bash", "-c"]

# Minimal setup
RUN echo "source /opt/ros/melodic/setup.bash" >> ~/.bashrc
RUN source ~/.bashrc
# Extra pkg installation after this!

And docker-compose.yml looks like this:

services:
  melodic:
    build: .
    image: melodic
    command: roslaunch gazebo_ros empty_world.launch &&
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
    environment:
      - DISPLAY=${DISPLAY}
      - NVIDIA_DRIVER_CAPABILITIES=all
      - NVIDIA_VISIBLE_DEVICES=all
      - QT_X11_NO_MITSHM=1
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - ${PWD}/.Xauthority:/root/.Xauthority:rw
    network_mode: "host"

When I run docker compose up, the following error pops up:

Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
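
The hook's message means that the process running nvidia-container-cli could not open libnvidia-ml.so.1 in the environment where the Docker daemon executes it. A minimal sketch for checking whether the dynamic loader can resolve that library on a given machine, assuming Python 3 is available there; `can_load` is a hypothetical helper name, not part of any NVIDIA tool:

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the shared library `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # The soname below is the one named in the error message.
    print("libnvidia-ml.so.1 loadable:", can_load("libnvidia-ml.so.1"))
```

If this reports the library as loadable on the host while the container still fails, it may be worth checking whether the daemon handling the request actually runs on the host or somewhere else, such as inside a VM.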

--------Steps I've taken so far--------
My theory was that something was wrong with the NVIDIA runtime, so I added

runtime: nvidia

before command in the docker-compose.yml above. When I ran docker compose up again, I got the following error:

Error response from daemon: Unknown runtime specified nvidia
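
"Unknown runtime specified nvidia" suggests that the daemon the client is connected to has no runtime registered under that name. A minimal sketch for inspecting what the daemon reports, assuming the docker CLI is on PATH; it relies on `docker info --format '{{json .Runtimes}}'`, and both helper names are hypothetical:

```python
import json
import subprocess


def has_runtime(runtimes_json: str, name: str) -> bool:
    """Check whether `name` is a key in the JSON object of registered runtimes."""
    # A daemon that reports no runtimes may emit `null`; treat that as empty.
    runtimes = json.loads(runtimes_json) or {}
    return name in runtimes


def daemon_runtimes_json() -> str:
    """Ask the currently selected daemon for its runtimes as a JSON string."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `has_runtime(daemon_runtimes_json(), "nvidia")` returns a boolean for whichever daemon the active Docker context points at, which may differ from the system dockerd service.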

Next, I followed the steps listed here to add the runtime using

sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime

which produced the following output:

INFO[2022-07-11T10:30:18.583217896-05:00] Starting up                                  
INFO[2022-07-11T10:30:18.584301607-05:00] detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf 
INFO[2022-07-11T10:30:18.585538257-05:00] parsed scheme: "unix"                         module=grpc
INFO[2022-07-11T10:30:18.585571148-05:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2022-07-11T10:30:18.585618515-05:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}  module=grpc
INFO[2022-07-11T10:30:18.585635177-05:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2022-07-11T10:30:18.586921837-05:00] parsed scheme: "unix"                         module=grpc
INFO[2022-07-11T10:30:18.586945960-05:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2022-07-11T10:30:18.586973034-05:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}  module=grpc
INFO[2022-07-11T10:30:18.586984481-05:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2022-07-11T10:30:18.595208030-05:00] [graphdriver] using prior storage driver: overlay2 
failed to start daemon: error while opening volume store metadata database: timeout

I also tried adding the runtime with a systemd drop-in file, but the error persisted even after I rebooted the machine.
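
Besides the --add-runtime flag and a systemd drop-in, the runtime can also be registered through the daemon configuration file, commonly /etc/docker/daemon.json on a native Docker Engine install. A minimal sketch that merges the entry into an existing configuration, assuming the binary path from the dockerd command above; `with_nvidia_runtime` is a hypothetical helper, and the result would still need to be written to the file and the daemon restarted:

```python
import copy
import json


def with_nvidia_runtime(config: dict,
                        path: str = "/usr/bin/nvidia-container-runtime") -> dict:
    """Return a copy of a daemon.json-style dict with an `nvidia` runtime entry."""
    merged = copy.deepcopy(config)
    # Keep any runtimes that are already configured and add/overwrite `nvidia`.
    runtimes = merged.setdefault("runtimes", {})
    runtimes["nvidia"] = {"path": path, "runtimeArgs": []}
    return merged


if __name__ == "__main__":
    existing = {"log-driver": "json-file"}  # illustrative existing settings
    print(json.dumps(with_nvidia_runtime(existing), indent=4))
```

The "error while opening volume store metadata database: timeout" line in the dockerd output above is often seen when a second dockerd is started while another instance is already running and holding that database, so stopping the running service before starting dockerd by hand may be worth trying. That is an assumption about this setup, not something the log confirms.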

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
I0711 15:35:59.889166 508754 nvc.c:376] initializing library context (version=1.10.0, build=395fd41701117121f1fd04ada01e1d7e006a37ae)
I0711 15:35:59.889215 508754 nvc.c:350] using root /
I0711 15:35:59.889219 508754 nvc.c:351] using ldcache /etc/ld.so.cache
I0711 15:35:59.889222 508754 nvc.c:352] using unprivileged user 1000:1000
I0711 15:35:59.889243 508754 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0711 15:35:59.889443 508754 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W0711 15:35:59.889929 508754 nvc.c:258] failed to detect NVIDIA devices
W0711 15:35:59.890282 508755 nvc.c:273] failed to set inheritable capabilities
W0711 15:35:59.890372 508755 nvc.c:274] skipping kernel modules load due to failure
I0711 15:35:59.890815 508756 rpc.c:71] starting driver rpc service
I0711 15:35:59.902386 508757 rpc.c:71] starting nvcgo rpc service
I0711 15:35:59.906779 508754 nvc_info.c:766] requesting driver information with ''
I0711 15:35:59.908232 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.515.48.07
I0711 15:35:59.908279 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.48.07
I0711 15:35:59.908509 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.48.07
I0711 15:35:59.908752 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.515.48.07
I0711 15:35:59.908945 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.515.48.07
I0711 15:35:59.909177 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.515.48.07
I0711 15:35:59.909421 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.48.07
I0711 15:35:59.909449 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.515.48.07
I0711 15:35:59.909680 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.48.07
I0711 15:35:59.909709 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.48.07
I0711 15:35:59.909736 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.48.07
I0711 15:35:59.909963 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.515.48.07
I0711 15:35:59.910180 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.515.48.07
I0711 15:35:59.910220 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.48.07
I0711 15:35:59.910443 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.515.48.07
I0711 15:35:59.910478 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.515.48.07
I0711 15:35:59.910721 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.515.48.07
I0711 15:35:59.910969 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.515.48.07
I0711 15:35:59.911125 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.515.48.07
I0711 15:35:59.911228 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.515.48.07
I0711 15:35:59.911446 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.515.48.07
I0711 15:35:59.911613 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.515.48.07
I0711 15:35:59.911643 508754 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.515.48.07
I0711 15:35:59.911793 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.515.48.07
I0711 15:35:59.912020 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.515.48.07
I0711 15:35:59.912203 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.515.48.07
I0711 15:35:59.912440 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.515.48.07
I0711 15:35:59.912658 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.515.48.07
I0711 15:35:59.912889 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.515.48.07
I0711 15:35:59.913119 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.515.48.07
I0711 15:35:59.913340 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.515.48.07
I0711 15:35:59.913560 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.515.48.07
I0711 15:35:59.913798 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.515.48.07
I0711 15:35:59.914030 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.515.48.07
I0711 15:35:59.914254 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.515.48.07
I0711 15:35:59.914481 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.515.48.07
I0711 15:35:59.914726 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libcuda.so.515.48.07
I0711 15:35:59.914974 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.515.48.07
I0711 15:35:59.915200 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.515.48.07
I0711 15:35:59.915387 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.515.48.07
I0711 15:35:59.915626 508754 nvc_info.c:173] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.515.48.07
W0711 15:35:59.915641 508754 nvc_info.c:399] missing library libnvidia-nscq.so
W0711 15:35:59.915646 508754 nvc_info.c:399] missing library libcudadebugger.so
W0711 15:35:59.915650 508754 nvc_info.c:399] missing library libnvidia-fatbinaryloader.so
W0711 15:35:59.915655 508754 nvc_info.c:399] missing library libnvidia-pkcs11.so
W0711 15:35:59.915662 508754 nvc_info.c:399] missing library libvdpau_nvidia.so
W0711 15:35:59.915666 508754 nvc_info.c:399] missing library libnvidia-ifr.so
W0711 15:35:59.915671 508754 nvc_info.c:399] missing library libnvidia-cbl.so
W0711 15:35:59.915676 508754 nvc_info.c:403] missing compat32 library libnvidia-cfg.so
W0711 15:35:59.915680 508754 nvc_info.c:403] missing compat32 library libnvidia-nscq.so
W0711 15:35:59.915684 508754 nvc_info.c:403] missing compat32 library libcudadebugger.so
W0711 15:35:59.915690 508754 nvc_info.c:403] missing compat32 library libnvidia-fatbinaryloader.so
W0711 15:35:59.915693 508754 nvc_info.c:403] missing compat32 library libnvidia-allocator.so
W0711 15:35:59.915699 508754 nvc_info.c:403] missing compat32 library libnvidia-pkcs11.so
W0711 15:35:59.915706 508754 nvc_info.c:403] missing compat32 library libnvidia-ngx.so
W0711 15:35:59.915709 508754 nvc_info.c:403] missing compat32 library libvdpau_nvidia.so
W0711 15:35:59.915718 508754 nvc_info.c:403] missing compat32 library libnvidia-ifr.so
W0711 15:35:59.915723 508754 nvc_info.c:403] missing compat32 library libnvidia-rtcore.so
W0711 15:35:59.915727 508754 nvc_info.c:403] missing compat32 library libnvoptix.so
W0711 15:35:59.915733 508754 nvc_info.c:403] missing compat32 library libnvidia-cbl.so
I0711 15:35:59.915936 508754 nvc_info.c:299] selecting /usr/bin/nvidia-smi
I0711 15:35:59.915958 508754 nvc_info.c:299] selecting /usr/bin/nvidia-debugdump
I0711 15:35:59.915970 508754 nvc_info.c:299] selecting /usr/bin/nvidia-persistenced
I0711 15:35:59.916000 508754 nvc_info.c:299] selecting /usr/bin/nvidia-cuda-mps-control
I0711 15:35:59.916017 508754 nvc_info.c:299] selecting /usr/bin/nvidia-cuda-mps-server
W0711 15:35:59.916079 508754 nvc_info.c:425] missing binary nv-fabricmanager
I0711 15:35:59.916534 508754 nvc_info.c:343] listing firmware path /usr/lib/firmware/nvidia/515.48.07/gsp.bin
I0711 15:35:59.916559 508754 nvc_info.c:529] listing device /dev/nvidiactl
I0711 15:35:59.916564 508754 nvc_info.c:529] listing device /dev/nvidia-uvm
I0711 15:35:59.916568 508754 nvc_info.c:529] listing device /dev/nvidia-uvm-tools
I0711 15:35:59.916571 508754 nvc_info.c:529] listing device /dev/nvidia-modeset
I0711 15:35:59.916597 508754 nvc_info.c:343] listing ipc path /run/nvidia-persistenced/socket
W0711 15:35:59.916620 508754 nvc_info.c:349] missing ipc path /var/run/nvidia-fabricmanager/socket
W0711 15:35:59.916638 508754 nvc_info.c:349] missing ipc path /tmp/nvidia-mps
I0711 15:35:59.916642 508754 nvc_info.c:822] requesting device information with ''
I0711 15:35:59.922969 508754 nvc_info.c:713] listing device /dev/nvidia0 (GPU-fae27ba8-419c-98fd-a0bf-2727d9f9b612 at 00000000:17:00.0)
I0711 15:35:59.928540 508754 nvc_info.c:713] listing device /dev/nvidia1 (GPU-96aa5232-2bc1-4326-b17e-a4b633788cc0 at 00000000:73:00.0)
NVRM version:   515.48.07
CUDA version:   11.7

Device Index:   0
Device Minor:   0
Model:          NVIDIA RTX A6000
Brand:          NvidiaRTX
GPU UUID:       GPU-fae27ba8-419c-98fd-a0bf-2727d9f9b612
Bus Location:   00000000:17:00.0
Architecture:   8.6

Device Index:   1
Device Minor:   1
Model:          NVIDIA RTX A6000
Brand:          NvidiaRTX
GPU UUID:       GPU-96aa5232-2bc1-4326-b17e-a4b633788cc0
Bus Location:   00000000:73:00.0
Architecture:   8.6
I0711 15:35:59.928579 508754 nvc.c:434] shutting down library context
I0711 15:35:59.928619 508757 rpc.c:95] terminating nvcgo rpc service
I0711 15:35:59.929120 508754 rpc.c:135] nvcgo rpc service terminated successfully
I0711 15:35:59.932333 508756 rpc.c:95] terminating driver rpc service
I0711 15:35:59.932434 508754 rpc.c:135] driver rpc service terminated successfully
  • Driver information from nvidia-smi -a
==============NVSMI LOG==============

Timestamp                                 : Mon Jul 11 10:38:04 2022
Driver Version                            : 515.48.07
CUDA Version                              : 11.7

Attached GPUs                             : 2
GPU 00000000:17:00.0
    Product Name                          : NVIDIA RTX A6000
    Product Brand                         : NVIDIA RTX
    Product Architecture                  : Ampere
    Display Mode                          : Disabled
    Display Active                        : Disabled
    Persistence Mode                      : Disabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1320922039617
    GPU UUID                              : GPU-fae27ba8-419c-98fd-a0bf-2727d9f9b612
    Minor Number                          : 0
    VBIOS Version                         : 94.02.5C.00.07
    MultiGPU Board                        : No
    Board ID                              : 0x1700
    GPU Part Number                       : 900-5G133-0100-001
    Module ID                             : 0
    Inforom Version
        Image Version                     : G133.0500.00.05
        OEM Object                        : 2.0
        ECC Object                        : 6.16
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x17
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x223010DE
        Bus Id                            : 00000000:17:00.0
        Sub System Id                     : 0x14591028
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 2
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 812000 KB/s
        Rx Throughput                     : 0 KB/s
    Fan Speed                             : 30 %
    Performance State                     : P5
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 49140 MiB
        Reserved                          : 454 MiB
        Used                              : 504 MiB
        Free                              : 48180 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 6 MiB
        Free                              : 250 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : Disabled
        Pending                           : Disabled
    ECC Errors
        Volatile
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
        Aggregate
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows
        Correctable Error                 : 0
        Uncorrectable Error               : 0
        Pending                           : No
        Remapping Failure Occurred        : No
        Bank Remap Availability Histogram
            Max                           : 192 bank(s)
            High                          : 0 bank(s)
            Partial                       : 0 bank(s)
            Low                           : 0 bank(s)
            None                          : 0 bank(s)
    Temperature
        GPU Current Temp                  : 39 C
        GPU Shutdown Temp                 : 98 C
        GPU Slowdown Temp                 : 95 C
        GPU Max Operating Temp            : 93 C
        GPU Target Temperature            : 84 C
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 22.24 W
        Power Limit                       : 300.00 W
        Default Power Limit               : 300.00 W
        Enforced Power Limit              : 300.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 300.00 W
    Clocks
        Graphics                          : 450 MHz
        SM                                : 450 MHz
        Memory                            : 810 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Default Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Max Clocks
        Graphics                          : 2100 MHz
        SM                                : 2100 MHz
        Memory                            : 8001 MHz
        Video                             : 1950 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 750.000 mV
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2243
            Type                          : G
            Name                          : /usr/lib/xorg/Xorg
            Used GPU Memory               : 198 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2590
            Type                          : G
            Name                          : /usr/libexec/gnome-remote-desktop-daemon
            Used GPU Memory               : 4 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2713
            Type                          : G
            Name                          : /usr/bin/gnome-shell
            Used GPU Memory               : 78 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 3094990
            Type                          : G
            Name                          : /opt/docker-desktop/Docker Desktop --type=gpu-process --enable-crashpad --enable-crash-reporter=bb2e72bd-deee-4039-8f1a-387044ef5ff0,no_channel --user-data-dir=/home/fit/.config/Docker Desktop --gpu-preferences=UAAAAAAAAAAgAAAIAAAAAAAAAAAAAAAAAABgAAAAAAAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAABgAAAAAAAAAGAAAAAAAAAAIAAAAAAAAAAgAAAAAAAAACAAAAAAAAAA= --shared-files --field-trial-handle=0,9431087310652734068,13867430110768028059,131072 --disable-features=PlzServiceWorker,SpareRendererForSitePerProcess
            Used GPU Memory               : 21 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 3156761
            Type                          : G
            Name                          : /usr/share/code/code --type=gpu-process --disable-color-correct-rendering --enable-crashpad --crashpad-handler-pid=3156633 --enable-crash-reporter=3cb89e58-246f-4914-bfb9-c5be6ca52941,no_channel --user-data-dir=/home/fit/.config/Code --gpu-preferences=WAAAAAAAAAAgAAAIAAAAAAAAAAAAAAAAAABgAAAAAAA4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAABAAAAAAAAAAgAAAAAAAAACAAAAAAAAAAIAAAAAAAAAA== --shared-files --field-trial-handle=0,i,15765629358387694078,15697432999206173879,131072 --disable-features=SpareRendererForSitePerProcess
            Used GPU Memory               : 31 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 3350821
            Type                          : G
            Name                          : /snap/firefox/1551/usr/lib/firefox/firefox
            Used GPU Memory               : 166 MiB

GPU 00000000:73:00.0
    Product Name                          : NVIDIA RTX A6000
    Product Brand                         : NVIDIA RTX
    Product Architecture                  : Ampere
    Display Mode                          : Enabled
    Display Active                        : Enabled
    Persistence Mode                      : Disabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1320922039629
    GPU UUID                              : GPU-96aa5232-2bc1-4326-b17e-a4b633788cc0
    Minor Number                          : 1
    VBIOS Version                         : 94.02.5C.00.07
    MultiGPU Board                        : No
    Board ID                              : 0x7300
    GPU Part Number                       : 900-5G133-0100-001
    Module ID                             : 0
    Inforom Version
        Image Version                     : G133.0500.00.05
        OEM Object                        : 2.0
        ECC Object                        : 6.16
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x73
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x223010DE
        Bus Id                            : 00000000:73:00.0
        Sub System Id                     : 0x14591028
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 1
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 1576000 KB/s
    Fan Speed                             : 30 %
    Performance State                     : P8
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 49140 MiB
        Reserved                          : 457 MiB
        Used                              : 106 MiB
        Free                              : 48576 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 5 MiB
        Free                              : 251 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 10 %
        Memory                            : 13 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : Disabled
        Pending                           : Disabled
    ECC Errors
        Volatile
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
        Aggregate
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows
        Correctable Error                 : 0
        Uncorrectable Error               : 0
        Pending                           : No
        Remapping Failure Occurred        : No
        Bank Remap Availability Histogram
            Max                           : 192 bank(s)
            High                          : 0 bank(s)
            Partial                       : 0 bank(s)
            Low                           : 0 bank(s)
            None                          : 0 bank(s)
    Temperature
        GPU Current Temp                  : 40 C
        GPU Shutdown Temp                 : 98 C
        GPU Slowdown Temp                 : 95 C
        GPU Max Operating Temp            : 93 C
        GPU Target Temperature            : 84 C
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 25.95 W
        Power Limit                       : 300.00 W
        Default Power Limit               : 300.00 W
        Enforced Power Limit              : 300.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 300.00 W
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 405 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Default Applications Clocks
        Graphics                          : 1800 MHz
        Memory                            : 8001 MHz
    Max Clocks
        Graphics                          : 2100 MHz
        SM                                : 2100 MHz
        Memory                            : 8001 MHz
        Video                             : 1950 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 737.500 mV
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2243
            Type                          : G
            Name                          : /usr/lib/xorg/Xorg
            Used GPU Memory               : 105 MiB
  • Docker version from docker version
Client: Docker Engine - Community
 Cloud integration: v1.0.24
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:02:46 2022
 OS/Arch:           linux/amd64
 Context:           desktop-linux
 Experimental:      true

Server: Docker Desktop 4.10.1 (82475)
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:23 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.2
  GitCommit:        v1.1.2-0-ga916309
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
||/ Name                                       Version                    Architecture Description
+++-==========================================-==========================-============-====================================>
un  libgldispatch0-nvidia                      <none>                     <none>       (no description available)
ii  libnvidia-cfg1-515:amd64                   515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                         <none>                     <none>       (no description available)
un  libnvidia-common                           <none>                     <none>       (no description available)
ii  libnvidia-common-515                       515.48.07-0ubuntu0.22.04.2 all          Shared files used by the NVIDIA libraries
un  libnvidia-compute                          <none>                     <none>       (no description available)
ii  libnvidia-compute-515:amd64                515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA libcompute package
ii  libnvidia-compute-515:i386                 515.48.07-0ubuntu0.22.04.2 i386         NVIDIA libcompute package
ii  libnvidia-container-tools                  1.10.0-1                   amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                 1.10.0-1                   amd64        NVIDIA container runtime library
un  libnvidia-decode                           <none>                     <none>       (no description available)
ii  libnvidia-decode-515:amd64                 515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-515:i386                  515.48.07-0ubuntu0.22.04.2 i386         NVIDIA Video Decoding runtime libraries
ii  libnvidia-egl-wayland1:amd64               1:1.1.9-1.1                amd64        Wayland EGL External Platform library -- shared library
un  libnvidia-encode                           <none>                     <none>       (no description available)
ii  libnvidia-encode-515:amd64                 515.48.07-0ubuntu0.22.04.2 amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-515:i386                  515.48.07-0ubuntu0.22.04.2 i386         NVENC Video Encoding runtime library
un  libnvidia-encode1                          <none>                     <none>       (no description available)
un  libnvidia-extra                            <none>                     <none>       (no description available)
ii  libnvidia-extra-515:amd64                  515.48.07-0ubuntu0.22.04.2 amd64        Extra libraries for the NVIDIA driver
un  libnvidia-fbc1                             <none>                     <none>       (no description available)
ii  libnvidia-fbc1-515:amd64                   515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-515:i386                    515.48.07-0ubuntu0.22.04.2 i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                               <none>                     <none>       (no description available)
un  libnvidia-gl-390                           <none>                     <none>       (no description available)
un  libnvidia-gl-410                           <none>                     <none>       (no description available)
ii  libnvidia-gl-515:amd64                     515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-515:i386                      515.48.07-0ubuntu0.22.04.2 i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-legacy-390xx-egl-wayland1        <none>                     <none>       (no description available)
un  libnvidia-ml1                              <none>                     <none>       (no description available)
ii  linux-modules-nvidia-515-5.15.0-40-generic 5.15.0-40.43+1             amd64        Linux kernel nvidia modules for version 5.15.0-40
ii  linux-modules-nvidia-515-generic-hwe-22.04 5.15.0-40.43+1             amd64        Extra drivers for nvidia-515 for the generic-hwe-22.04 flavour
ii  linux-objects-nvidia-515-5.15.0-40-generic 5.15.0-40.43+1             amd64        Linux kernel nvidia modules for version 5.15.0-40 (objects)
ii  linux-signatures-nvidia-5.15.0-40-generic  5.15.0-40.43+1             amd64        Linux kernel signatures for nvidia modules for version 5.15.0-4>
un  nvidia-384                                 <none>                     <none>       (no description available)
un  nvidia-390                                 <none>                     <none>       (no description available)
un  nvidia-common                              <none>                     <none>       (no description available)
un  nvidia-compute-utils                       <none>                     <none>       (no description available)
ii  nvidia-compute-utils-515                   515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA compute utilities
un  nvidia-container-runtime                   <none>                     <none>       (no description available)
un  nvidia-container-runtime-hook              <none>                     <none>       (no description available)
ii  nvidia-container-toolkit                   1.10.0-1                   amd64        NVIDIA container runtime hook
un  nvidia-dkms-515                            <none>                     <none>       (no description available)
un  nvidia-docker                              <none>                     <none>       (no description available)
ii  nvidia-docker2                             2.11.0-1                   all          nvidia-docker CLI wrapper
ii  nvidia-driver-515                          515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA driver metapackage
un  nvidia-driver-binary                       <none>                     <none>       (no description available)
un  nvidia-egl-wayland-common                  <none>                     <none>       (no description available)
un  nvidia-kernel-common                       <none>                     <none>       (no description available)
ii  nvidia-kernel-common-515                   515.48.07-0ubuntu0.22.04.2 amd64        Shared files used with the kernel module
un  nvidia-kernel-source                       <none>                     <none>       (no description available)
ii  nvidia-kernel-source-515                   515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA kernel source package
un  nvidia-libopencl1-dev                      <none>                     <none>       (no description available)
un  nvidia-opencl-icd                          <none>                     <none>       (no description available)
un  nvidia-persistenced                        <none>                     <none>       (no description available)
un  nvidia-prebuilt-kernel                     <none>                     <none>       (no description available)
ii  nvidia-prime                               0.8.17.1                   all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                            510.47.03-0ubuntu1         amd64        Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                     <none>                     <none>       (no description available)
un  nvidia-smi                                 <none>                     <none>       (no description available)
un  nvidia-utils                               <none>                     <none>       (no description available)
ii  nvidia-utils-515                           515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA driver support binaries
ii  xserver-xorg-video-nvidia-515              515.48.07-0ubuntu0.22.04.2 amd64        NVIDIA binary Xorg driver
  • NVIDIA container library version from nvidia-container-cli -V
cli-version: 1.10.0
lib-version: 1.10.0
build date: 2022-06-13T10:39+00:00
build revision: 395fd41701117121f1fd04ada01e1d7e006a37ae
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

Any comments on this info? Thank you!

@elezar
Member

elezar commented Jul 12, 2022

Question: How are docker and docker compose installed? We have seen strange behaviour when these are installed using snaps.

Looking at the error message it seems as if the NVIDIA Container CLI cannot load the NVML library libnvidia-ml.so. This could occur if docker compose modifies the library search paths or ldcache in some way.
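
One way to test this hypothesis is to look for libnvidia-ml.so.1 in the dynamic linker cache on the host. The sketch below is only illustrative: the helper name has_library is hypothetical, and it does a plain substring match on the output of ldconfig -p rather than reproducing how nvidia-container-cli actually resolves the library.

```shell
#!/usr/bin/env bash
# has_library NAME: read an `ldconfig -p` listing on stdin and succeed
# if any line mentions NAME (fixed-string match, no regex).
has_library() {
    grep -qF -- "$1"
}

# Intended use on the host (not executed here; ldconfig may live in /sbin):
#   ldconfig -p | has_library libnvidia-ml.so.1 && echo found || echo missing
```

If the library is reported as missing on the host, the driver installation is the first thing to check. If it is found on the host but the hook still fails, the daemon may be running in an environment that does not see the host's libraries.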

@Lonitch
Author

Lonitch commented Jul 12, 2022

> Question: How are docker and docker compose installed? We have seen strange behaviour when these are installed using snaps.
>
> Looking at the error message it seems as if the NVIDIA Container CLI cannot load the NVML library libnvidia-ml.so. This could occur if docker compose modifies the library search paths or ldcache in some way.

Thanks for your reply. I installed Docker Engine by following the steps in the Docker docs. After that, I installed Docker Desktop by following the instructions here.

I didn't use snaps.

@elezar
Member

elezar commented Jul 14, 2022

Sorry @Lonitch, I thought I had asked, but does running the container without docker compose work as expected:

docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all ubuntu:22.04 nvidia-smi

@Lonitch
Author

Lonitch commented Jul 14, 2022

@elezar Thanks for your reply. Unfortunately, it does not work. It still gives: docker: Error response from daemon: Unknown runtime specified nvidia.

@Lonitch
Author

Lonitch commented Jul 16, 2022

I just reinstalled Ubuntu 20.04 LTS on the machine, and the same error still shows up. I wonder if it has something to do with my dual graphics cards (2 NVIDIA RTX).

@diegoavillegasg

@Lonitch have you tried installing the drivers with the following steps?
Step 1
sudo ubuntu-drivers autoinstall
Step 2
ubuntu-drivers devices
Step 3
Install the recommended option based on the previous terminal output
for example:
sudo apt install nvidia-driver-515
Step 4
sudo reboot

This worked for me.

@elezar
Member

elezar commented Sep 14, 2022

@Lonitch since docker complains with:

docker: Error response from daemon: Unknown runtime specified nvidia.

What are the contents of your /etc/docker/daemon.json file, or how are you instructing docker-compose to use the NVIDIA runtime?

Note that this configuration is independent of the driver or the GPUs that you have installed.
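
Besides reading daemon.json, the daemon itself can be asked which runtimes it has registered. A minimal sketch, assuming docker info --format '{{json .Runtimes}}' prints the runtimes as a JSON object keyed by name; has_runtime is a hypothetical helper, and it uses a crude text match on a key rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# has_runtime NAME: read JSON text on stdin and succeed if NAME appears
# as an object key ("NAME": ...). This is a text match, not a JSON parse.
has_runtime() {
    grep -q "\"$1\"[[:space:]]*:"
}

# Intended use (not executed here):
#   docker info --format '{{json .Runtimes}}' | has_runtime nvidia \
#       && echo "nvidia runtime registered" \
#       || echo "nvidia runtime unknown to this daemon"
```

If the runtime is not listed, the "Unknown runtime specified nvidia" error is expected regardless of which driver is installed.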

@ThatCooperLewis

ThatCooperLewis commented Dec 17, 2022

@elezar Maybe I can give some context: I'm seeing the same issue on both Ubuntu and Arch. The cause is rooted somewhere between Docker Desktop and the NVIDIA runtime.

On fresh installs of Arch and Ubuntu (20.04 and 22.04 LTS), following the documentation for Docker/Docker Desktop/CUDA/NCT consistently results in the error originally posted here, but only when using the desktop-linux context of Docker. (Using the default context with sudo, or the rootless context, both work fine.)

Steps to reproduce:

  1. Install the prerequisites for Docker on Linux
  2. Install Docker Desktop
  3. Nvidia CUDA Pre-Installation as well as CUDA installation as mentioned in that doc
  4. CUDA Post-Installation
  5. NCT Install
  6. Ensuring the runtime was installed and configured. I'm guessing @Lonitch didn't add the runtime, as I no longer got Unknown runtime specified after this:
    sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
  7. Fixing the runtime config.toml as mentioned here

At this point, there are two docker contexts installed.

$ docker context ls -q
default
desktop-linux

GPU-related images only run successfully after docker context use default; with desktop-linux they fail. Here is my daemon.json:

$ cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
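
To narrow this down, it may help to ask the daemon behind each context which runtimes it has registered. The sketch below is an illustration only: report_nvidia_runtime is a hypothetical helper, it assumes docker --context NAME info --format '{{json .Runtimes}}' works for every listed context, and it matches the "nvidia" key as plain text.

```shell
#!/usr/bin/env bash
# For every docker context, print whether the daemon behind it lists
# an "nvidia" runtime. Errors from unreachable contexts are silenced
# and reported as "not found".
report_nvidia_runtime() {
    local ctx
    for ctx in $(docker context ls -q); do
        if docker --context "$ctx" info --format '{{json .Runtimes}}' 2>/dev/null \
            | grep -q '"nvidia"[[:space:]]*:'; then
            echo "$ctx: nvidia runtime registered"
        else
            echo "$ctx: nvidia runtime not found"
        fi
    done
}

# Intended use (not executed here):
#   report_nvidia_runtime
```

If default reports the runtime and desktop-linux does not, that would suggest the two contexts talk to different daemons with different configuration, which would be consistent with the behaviour described above.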

I'm at my wit's end with this problem. I have probably spent 40 hours over the past week trying to solve this specific issue, including countless full re-installations of multiple distros. If you'd like more info, I can provide it. I'm considering opening a new ticket, since I have exact steps to reproduce and the problem clearly extends beyond the scope of the original post.

@eugene-yao-zocdoc

eugene-yao-zocdoc commented Apr 11, 2023

Also having this issue. I can confirm that it seems to be an issue with Docker Desktop and not with the Docker CE install.

@allisontw

allisontw commented Jun 6, 2023

Encountered the same issue for days on Ubuntu 18.04.6 LTS when running a docker container with GPUs, e.g.
docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

and the error shows
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

The GPU is an NVIDIA GeForce RTX 3090 with persistence mode enabled.

Referred to some suggestions from

, but still have no positive results. Was wondering if you have any other insights?
Any hint would be appreciated.

@elezar
Member

elezar commented Jun 8, 2023

@allisontw are you also using Docker Desktop? This is not currently supported on Linux.

@allisontw

Thanks @elezar for your reply and hint!

In my case, Docker Desktop does not seem to be installed. Instead, the Docker Engine packages/configuration were somehow broken: I discovered that the versions of the docker server and the docker client were inconsistent.

I'm not sure my assumption is correct, but in the past docker worked correctly with NVIDIA GPUs when the server and client versions were the same, so I removed the docker-related packages completely, installed them again, and the issue was solved.

Hope this solution helps others as well.
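
The client/server comparison described above can be scripted. A minimal sketch: versions_match is a hypothetical helper that only tests the two strings for equality, and the commented usage relies on docker version --format with the .Client.Version and .Server.Version fields. Whether a mismatch actually causes the GPU error is this comment's assumption, not something the check proves.

```shell
#!/usr/bin/env bash
# versions_match CLIENT SERVER: succeed only if both strings are
# non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Intended use (not executed here):
#   client="$(docker version --format '{{.Client.Version}}')"
#   server="$(docker version --format '{{.Server.Version}}')"
#   versions_match "$client" "$server" \
#       && echo "client and server agree: $client" \
#       || echo "mismatch: client=$client server=$server"
```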

@raymond-lau-lyf

> Thanks @elezar for your reply and hint!
>
> In my case, Docker Desktop does not seem to be installed. Instead, the Docker Engine packages/configuration were somehow broken: I discovered that the versions of the docker server and the docker client were inconsistent.
>
> I'm not sure my assumption is correct, but in the past docker worked correctly with NVIDIA GPUs when the server and client versions were the same, so I removed the docker-related packages completely, installed them again, and the issue was solved.
>
> Hope this solution helps others as well.

Thanks. I reinstalled docker completely, and it works.

@timpara

timpara commented Jun 29, 2023

>> Thanks @elezar for your reply and hint!
>> In my case, Docker Desktop does not seem to be installed. Instead, the Docker Engine packages/configuration were somehow broken: I discovered that the versions of the docker server and the docker client were inconsistent.
>> I'm not sure my assumption is correct, but in the past docker worked correctly with NVIDIA GPUs when the server and client versions were the same, so I removed the docker-related packages completely, installed them again, and the issue was solved.
>> Hope this solution helps others as well.
>
> Thanks. I reinstalled docker completely, and it works.

Could you maybe elaborate a bit about the exact steps you took? Thanks!

@elezar elezar transferred this issue from NVIDIA/nvidia-docker Jan 22, 2024