HIP installation on Nvidia platform #3521

Open
ABHINAVONGOLU opened this issue Jun 13, 2024 · 20 comments

Comments

@ABHINAVONGOLU

apt-get install hip-runtime-nvidia hip-dev
When I run this command in my terminal, I get the following error message. Can someone help me fix this issue?

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package hip-runtime-nvidia
E: Unable to locate package hip-dev
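
A quick way to see whether apt knows about a package at all is to look at the "Candidate:" line of `apt-cache policy`. The sketch below is only an illustration: `candidate_version` and `check_package` are hypothetical helper names, and the parsing assumes the usual `apt-cache policy` layout. An empty result or "(none)" suggests that no configured repository provides the package.

```shell
# Print the "Candidate:" version from `apt-cache policy <pkg>` output read on stdin.
# Prints nothing when the package is unknown or the candidate is "(none)".
candidate_version() {
    awk '$1 == "Candidate:" && $2 != "(none)" { print $2; exit }'
}

# Hypothetical helper: report whether apt currently sees an installable version.
check_package() {
    pkg=$1
    version=$(apt-cache policy "$pkg" 2>/dev/null | candidate_version)
    if [ -n "$version" ]; then
        echo "$pkg: candidate $version"
    else
        echo "$pkg: no candidate (the repository providing it is probably not configured)"
    fi
}

# Usage (on the affected machine):
#   check_package hip-runtime-nvidia
#   check_package hip-dev
```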

@logic-finder

Hi, these error messages look exactly the same as the ones I saw a few days ago. It might help to refer to issue #3519. Please see the "additional information" section and the comment by harkgill-amd right below it. Thank you.

@ABHINAVONGOLU
Author

$ sudo amdgpu-install --usecase=hip,hiplibsdk
Get:1 file:/var/cuda-repo-ubuntu2204-12-5-local InRelease [1,572 B]
Get:1 file:/var/cuda-repo-ubuntu2204-12-5-local InRelease [1,572 B]
Hit:2 https://developer.download.nvidia.com/hpc-sdk/ubuntu/amd64 InRelease
Hit:3 https://dl.google.com/linux/chrome/deb stable InRelease
Hit:4 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:5 https://packages.microsoft.com/repos/code stable InRelease
Hit:6 http://in.archive.ubuntu.com/ubuntu jammy InRelease
Hit:7 http://in.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:8 https://ppa.launchpadcontent.net/touchegg/stable/ubuntu jammy InRelease
Hit:9 http://in.archive.ubuntu.com/ubuntu jammy-backports InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package amdgpu-dkms is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Unable to locate package rocm-hip-runtime
E: Unable to locate package rocm-hip-sdk
E: Package 'amdgpu-dkms' has no installation candidate

I am still getting this error.

@harkgill-amd

Hi @ABHINAVONGOLU, as @logic-finder mentioned, we do currently have an investigation ongoing to fix the installation issues for HIP on NVIDIA platforms. Thanks!

@ABHINAVONGOLU
Author

ABHINAVONGOLU commented Jun 14, 2024

Please post here when the fix is complete, @harkgill-amd.

@ABHINAVONGOLU
Author

@logic-finder, @harkgill-amd, can someone confirm whether it is working now? Please.

@harkgill-amd

Hi @ABHINAVONGOLU, this issue is still being investigated. I will provide updates as soon as I receive them.

@avickars

This is still broken with HIP 6.2.

@harkgill-amd

@avickars, are you attempting to install on Ubuntu 24.04 or on 22.04? What error are you running into?

@avickars

@avickars, are you attempting to install on Ubuntu 24.04 or on 22.04? What error are you running into?

I am trying to install it on 24.04, and I am getting the same error as originally reported in this GitHub issue.

@harkgill-amd

Could you try adding the radeon repo through

wget https://repo.radeon.com/amdgpu-install/6.2/ubuntu/noble/amdgpu-install_6.2.60200-1_all.deb
sudo apt install ./amdgpu-install_6.2.60200-1_all.deb
sudo apt update

and then running apt-get install hip-runtime-nvidia hip-dev

The repo should have the missing packages that are causing the error. However, the documentation needs to be updated to reflect this, and I will begin the process of getting it updated.
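
Since the installer URL embeds the Ubuntu codename, here is a minimal sketch of picking it from the running system rather than typing it by hand. It only covers the two 6.2 URLs quoted in this thread (noble for 24.04, jammy for 22.04); `amdgpu_install_url` and `os_codename` are hypothetical helper names, and reading `VERSION_CODENAME` from `/etc/os-release` is an assumption about the system.

```shell
# Build the amdgpu-install package URL for a given Ubuntu codename.
# Only the two 6.2 URLs quoted in this thread are covered.
amdgpu_install_url() {
    codename=$1
    case "$codename" in
        noble|jammy)
            echo "https://repo.radeon.com/amdgpu-install/6.2/ubuntu/${codename}/amdgpu-install_6.2.60200-1_all.deb"
            ;;
        *)
            echo "unsupported codename in this sketch: $codename" >&2
            return 1
            ;;
    esac
}

# Read VERSION_CODENAME from an os-release style file (default: /etc/os-release).
os_codename() {
    sed -n 's/^VERSION_CODENAME=//p' "${1:-/etc/os-release}"
}

# Usage:
#   url=$(amdgpu_install_url "$(os_codename)") && wget "$url"
```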

@avickars

apt-get install hip-runtime-nvidia

Thanks, but after that it doesn't work correctly. Below is the output of my hipconfig:

HIP version: 6.2.41133-dd7f95766

==hipconfig
HIP_PATH           :/opt/rocm-6.2.0
ROCM_PATH          :/opt/rocm-6.2.0
HIP_COMPILER       :clang
HIP_PLATFORM       :amd
HIP_RUNTIME        :rocclr
CPP_CONFIG         : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-6.2.0/include -I/include

==hip-clang
HIP_CLANG_PATH     :/opt/rocm-6.2.0/lib/llvm/bin
AMD clang version 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.2.0 24292 26466ce804ac523b398608f17388eb6d605a3f09)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.2.0/lib/llvm/bin
Configuration file: /opt/rocm-6.2.0/lib/llvm/bin/clang++.cfg
AMD LLVM version 18.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver4

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :
 -O3
hip-clang-ldflags :
--driver-mode=g++ -O3 --hip-link

== Environment Variables
PATH =/opt/rocm/bin:/usr/local/cuda/bin:/home/aidan/miniconda3/bin:/home/aidan/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
LD_LIBRARY_PATH=/opt/rocm/lib:

== Linux Kernel
Hostname      :
the-dark-knight
Linux the-dark-knight 6.8.0-40-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Fri Jul  5 10:34:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble

It thinks I have an AMD GPU. By contrast, when I compiled HIP 6.1 from source, I got:

HIP version  : 6.1.40093-bd86f1708

== hipconfig
HIP_PATH     : /opt/rocm-nvidia-6.1
ROCM_PATH    : /opt/rocm
HIP_COMPILER : nvcc
HIP_PLATFORM : amd
HIP_RUNTIME  : cuda
Use of uninitialized value $CPP_CONFIG in print at .//hipconfig.pl line 154.
CPP_CONFIG   : 

Unexpected HIP_COMPILER: nvcc

=== Environment Variables
PATH=/opt/rocm/bin:/usr/local/cuda/bin:/home/aidan/miniconda3/bin:/home/aidan/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
LD_LIBRARY_PATH=/opt/rocm/lib:

== Linux Kernel
Hostname     : the-dark-knight
Linux the-dark-knight 6.8.0-40-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Fri Jul  5 10:34:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04 LTS
Release:	24.04
Codename:	noble

@harkgill-amd

Adding the environment variable HIP_PLATFORM='nvidia' should resolve this discrepancy. Please give it a try and let me know.
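
Note that `export` only affects the current shell session and the processes started from it. The thread does not say whether a fresh shell was involved, so this is only a sketch of one way to make the setting survive new sessions. `persist_hip_platform` is a hypothetical helper, and the choice of `~/.bashrc` as the default file is an assumption.

```shell
# Append `export HIP_PLATFORM=nvidia` to a shell startup file unless it is already there.
# The target file is an assumption; pass whichever file your shell reads at startup.
persist_hip_platform() {
    rc_file=${1:-"$HOME/.bashrc"}
    line='export HIP_PLATFORM=nvidia'
    if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Usage:
#   persist_hip_platform            # writes to ~/.bashrc
#   export HIP_PLATFORM=nvidia      # also set it in the current session
```

Calling the helper twice leaves a single line in the file, so it is safe to re-run.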

@eljrte

eljrte commented Aug 29, 2024

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
hip-runtime-nvidia : Depends: cuda (>= 7.5) but it is not installable
rocprofiler-register : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
Depends: libstdc++6 (>= 13.1) but 12.3.0-1ubuntu1~22.04 is to be installed
E: Unable to correct problems, you have held broken packages.

I ran into the error above.
My environment is Ubuntu 22.04 + RTX 4080 + nvcc 12.1 + gcc/g++ 11.4.
I would really appreciate any tips.

@eljrte

eljrte commented Aug 29, 2024

By the way, I have followed the replies above in this issue.
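
The unmet-dependency lines above boil down to version comparisons: the package wants libc6 >= 2.38 and libstdc++6 >= 13.1, while the system offers 2.35 and 12.3.0. A small sketch of that comparison follows. `version_ge` is a hypothetical helper built on GNU `sort -V`, which approximates dpkg's ordering but is not identical to it, and only the upstream parts of the version strings are used here.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Uses GNU `sort -V`, which approximates (but is not identical to) dpkg's ordering.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Versions taken from the error message above (upstream part only).
if version_ge "2.35" "2.38"; then
    echo "libc6 requirement satisfied"
else
    echo "libc6 2.35 is older than the required 2.38"
fi
```

On a Debian-based system, `dpkg --compare-versions 2.35-0ubuntu3.8 ge 2.38` performs the authoritative comparison, including the Debian revision.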

@avickars

avickars commented Sep 2, 2024

Adding the environment variable HIP_PLATFORM='nvidia' should resolve this discrepancy. Please give it a try and let me know.

@harkgill-amd I tried it today and it didn't work; the results were the same as before. To be precise, I executed the following:

export HIP_PLATFORM='nvidia'
wget https://repo.radeon.com/amdgpu-install/6.2/ubuntu/noble/amdgpu-install_6.2.60200-1_all.deb
sudo apt install ./amdgpu-install_6.2.60200-1_all.deb
sudo apt update
sudo apt-get install hip-runtime-nvidia hip-dev

Again, my hipconfig looks like:

HIP version: 6.2.41133-dd7f95766

==hipconfig
HIP_PATH           :/opt/rocm-6.2.0
ROCM_PATH          :/opt/rocm-6.2.0
HIP_COMPILER       :clang
HIP_PLATFORM       :amd
HIP_RUNTIME        :rocclr
CPP_CONFIG         : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-6.2.0/include -I/include

==hip-clang
HIP_CLANG_PATH     :/opt/rocm-6.2.0/lib/llvm/bin
AMD clang version 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.2.0 24292 26466ce804ac523b398608f17388eb6d605a3f09)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.2.0/lib/llvm/bin
Configuration file: /opt/rocm-6.2.0/lib/llvm/bin/clang++.cfg
AMD LLVM version 18.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver4

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :
 -O3
hip-clang-ldflags :
--driver-mode=g++ -O3 --hip-link

== Environment Variables
PATH =/opt/rocm/bin:/usr/local/cuda/bin:/home/aidan/Projects/pipeline/env/bin:/home/aidan/miniconda3/condabin:/home/aidan/.vscode-server/cli/servers/Stable-fee1edb8d6d72a0ddff41e5f71a671c23ed924b9/server/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH=/opt/rocm/lib/:

== Linux Kernel
Hostname      :
the-dark-knight
Linux the-dark-knight 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug  2 20:41:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble

Are you able to reproduce this?

@harkgill-amd

@avickars, it looks like the environment variable is being overridden by the HIP installation. Can you try setting it again after sudo apt-get install hip-runtime-nvidia hip-dev? For reference, here is what I see on my end:

  1. Output of hipconfig post-install. (We have identified why the compiler is being set to amd and will be addressing this issue)
==hipconfig
HIP_PATH           :/opt/rocm-6.2.0
ROCM_PATH          :/opt/rocm-6.2.0
HIP_COMPILER       :clang
HIP_PLATFORM       :amd
HIP_RUNTIME        :rocclr
CPP_CONFIG         : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-6.2.0/include -I/include

==hip-clang
HIP_CLANG_PATH     :/opt/rocm-6.2.0/lib/llvm/bin
AMD clang version 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.2.0 24292 26466ce804ac523b398608f17388eb6d605a3f09)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.2.0/lib/llvm/bin
Configuration file: /opt/rocm-6.2.0/lib/llvm/bin/clang++.cfg
AMD LLVM version 18.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: alderlake

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :
 -O3
hip-clang-ldflags :
--driver-mode=g++ -O3 --hip-link

== Environment Variables
PATH =/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

== Linux Kernel
Hostname      :
rocm
Linux rocm 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug  2 20:41:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble
  2. Set the environment variable: export HIP_PLATFORM='nvidia'
  3. hipconfig is updated to use the nvcc compiler, and the HIP_PLATFORM variable is listed under == Environment Variables
==hipconfig
HIP_PATH           :/opt/rocm-6.2.0
ROCM_PATH          :/opt/rocm-6.2.0
HIP_COMPILER       :nvcc
HIP_PLATFORM       :nvidia
HIP_RUNTIME        :cuda
CPP_CONFIG         : -D__HIP_PLATFORM_NVCC__= -D__HIP_PLATFORM_NVIDIA__= -I/opt/rocm-6.2.0/include -I/usr/local/cuda/include

== nvcc
CUDA_PATH          :/usr/local/cuda
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:10:22_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

== Environment Variables
PATH =/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
HIP_PLATFORM=nvidia

== Linux Kernel
Hostname      :
rocm
Linux rocm 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug  2 20:41:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble
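
To check the result without reading the whole report, the HIP_PLATFORM line can be pulled out of the hipconfig output. This is a sketch that assumes the report format shown in this thread ("HIP_PLATFORM" followed by a colon and the value); `hip_platform_from_config` is a hypothetical helper name.

```shell
# Extract the HIP_PLATFORM value from hipconfig output read on stdin.
# Matches only the "HIP_PLATFORM   :value" report line, not the
# "HIP_PLATFORM=..." line in the environment-variables section.
hip_platform_from_config() {
    awk -F: '/^HIP_PLATFORM[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Usage (hipconfig path as used elsewhere in this thread):
#   /opt/rocm/bin/hipconfig --full | hip_platform_from_config
```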

@harkgill-amd

@eljrte, the error you are seeing is likely due to the installation of the 24.04 amdgpu-install .deb rather than the 22.04 version. There is also an error regarding a CUDA dependency, so let's try a clean install with the following steps:

  1. Remove the previous amdgpu-install
sudo amdgpu-install --uninstall
sudo apt purge amdgpu-install
sudo apt autoremove
  2. Install the 22.04 (jammy) amdgpu-install
wget https://repo.radeon.com/amdgpu-install/6.2/ubuntu/jammy/amdgpu-install_6.2.60200-1_all.deb
sudo apt install ./amdgpu-install_6.2.60200-1_all.deb
sudo apt update
  3. Install the CUDA Toolkit for Ubuntu 22.04 following the commands here.

  4. Install the hip-runtime-nvidia and hip-dev packages with sudo apt-get install hip-runtime-nvidia hip-dev.

  5. Set the HIP_PLATFORM environment variable with export HIP_PLATFORM='nvidia'.

  6. Run /opt/rocm/bin/hipconfig --full and confirm that HIP is installed and set to the nvcc compiler.
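
Because the mismatch between the installer's target release and the running release appears to be what triggered the libc6 and libstdc++6 errors, a small guard before downloading may help. The sketch below assumes the URL layout used in this thread (`.../ubuntu/<codename>/...`); both function names are hypothetical.

```shell
# Pull the Ubuntu codename out of an amdgpu-install URL of the form used in this thread:
#   https://repo.radeon.com/amdgpu-install/<ver>/ubuntu/<codename>/<file>.deb
codename_from_installer_url() {
    printf '%s\n' "$1" | sed -n 's#.*/ubuntu/\([^/]*\)/.*#\1#p'
}

# Succeed only when the installer targets the given OS codename.
installer_matches_os() {
    url=$1
    os=$2
    [ "$(codename_from_installer_url "$url")" = "$os" ]
}

# Usage:
#   installer_matches_os "$url" "$(lsb_release -cs)" || echo "wrong installer for this release"
```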

@eddy16112

Is it possible to install hip-runtime-nvidia without installing CUDA and the CUDA driver?

@harkgill-amd

@eddy16112, the cuda packages are currently a dependency for hip-runtime-nvidia and are required for installation.

@eddy16112

@eddy16112, the cuda packages are currently a dependency for hip-runtime-nvidia and are required for installation.

Most of the NVIDIA backend of HIP consists of header files, so I do not see why CUDA (especially the CUDA driver) is a dependency. Installing a CUDA driver inside a CUDA container could break the container.
