document support for using NVIDIA GPUs #138

Merged 5 commits on Dec 21, 2023
162 changes: 157 additions & 5 deletions docs/gpu.md
@@ -1,9 +1,161 @@
# GPU support

!!! warning
**Support for GPU software in EESSI is a work in progress.**

More information on the actions that must be performed to ensure that GPU software included in EESSI
can use the GPU in your system is available below.

[Please open a support issue](support.md) if you need help or have questions regarding GPU support.

!!! tip "Make sure the `${EESSI_VERSION}` version placeholder is defined!"
In this page, we use `${EESSI_VERSION}` as a placeholder for the version of the EESSI repository,
for example:
```{ .bash .copy }
/cvmfs/software.eessi.io/versions/${EESSI_VERSION}
```

Before inspecting paths, or executing any of the specified commands, you should define `$EESSI_VERSION` first,
for example with:
```{ .bash .copy }
export EESSI_VERSION=2023.06
```

## Support for using NVIDIA GPUs {: #nvidia }

EESSI supports running CUDA-enabled software. All CUDA-enabled modules are marked with the `(gpu)` feature,
which is visible in the output produced by `module avail`.

### NVIDIA GPU drivers {: #nvidia_drivers }

For CUDA-enabled software to run, it needs to be able to find the **NVIDIA GPU drivers** of the host system.
The challenge here is that the NVIDIA GPU drivers are not _always_ in a standard system location, and that we
cannot install the GPU drivers in EESSI (since they are too closely tied to the client OS and GPU hardware).

### Compiling CUDA software {: #cuda_sdk }

There is an additional requirement if you want to compile CUDA-enabled software using a CUDA installation included in EESSI: this requires a *full* CUDA SDK. However, the [CUDA SDK End User License Agreement (EULA)](https://docs.nvidia.com/cuda/eula/index.html) does not allow for full redistribution, so in EESSI we are (currently) only allowed to redistribute the files needed to *run* CUDA software.

!!! note "Full CUDA SDK only needed to *compile* CUDA software"
Without a full CUDA SDK on the host system, you will still be able to *run* CUDA-enabled software from the EESSI stack,
you just won't be able to *compile* additional CUDA software.

Below, we describe how to make sure that the EESSI software stack can find your NVIDIA GPU drivers and (optionally) full installations of the CUDA SDK.

### `host_injections` variant symlink {: #host_injections }

In the EESSI repository, a special directory has been prepared where system administrators can install files that can be picked up by
software installations included in EESSI. This allows administrators to influence the behaviour (and capabilities) of the EESSI software stack.

This special directory is located in `/cvmfs/software.eessi.io/host_injections`, and it is a *CernVM-FS Variant Symlink*:
a symbolic link for which the target can be controlled by the CernVM-FS client configuration (for more info, see ['Variant Symlinks' in the official CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#variant-symlinks)).

!!! info "Default target for `host_injections` variant symlink"

Unless otherwise configured in the CernVM-FS client configuration for the EESSI repository, the `host_injections` symlink points to `/opt/eessi` on the client system:
```
$ ls -l /cvmfs/software.eessi.io/host_injections
lrwxrwxrwx 1 cvmfs cvmfs 10 Oct 3 13:51 /cvmfs/software.eessi.io/host_injections -> /opt/eessi
```
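To check where the variant symlink currently points on a given client, a small helper around `readlink` can be used. This is a minimal sketch: the function name `host_injections_target` is hypothetical and not something shipped with EESSI, and the default path is the one shown above.

```bash
# Print the target of the host_injections variant symlink.
# With no argument, the documented default location is inspected;
# pass another path to inspect a different link.
host_injections_target() {
    local link="${1:-/cvmfs/software.eessi.io/host_injections}"
    if [ ! -L "$link" ]; then
        echo "not a symbolic link: $link" >&2
        return 1
    fi
    # readlink prints the target stored in the link, without resolving it further
    readlink "$link"
}
```

Calling `host_injections_target` without arguments on a client with the default configuration should print the same target as the `ls -l` output above.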

As an example, let's imagine that we want to use an architecture-specific location on a shared filesystem as the target for the symlink. This has the advantage that changes made under `host_injections` affect all nodes that share that CernVM-FS configuration. To configure this, add the following line to the CernVM-FS client configuration file:

```{ .ini .copy }
EESSI_HOST_INJECTIONS=/shared_fs/path
```

!!! note "Don't forget to reload the CernVM-FS configuration"
After making a change to a CernVM-FS configuration file, you also need to reload the configuration:
```{ .bash .copy }
sudo cvmfs_config reload
```
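Adding the setting by hand works fine, but when configuration is managed by scripts it helps to make the change idempotent. The sketch below assumes a hypothetical helper named `set_host_injections_target`; the configuration file to edit is passed as an argument because its location depends on your setup (`/etc/cvmfs/default.local` is a common choice, but that is an assumption here, not something this page prescribes).

```bash
# Set (or update) EESSI_HOST_INJECTIONS in a CernVM-FS client configuration file.
# Usage: set_host_injections_target <config file> <target directory>
set_host_injections_target() {
    local conf_file="$1"
    local target="$2"
    local tmp_file
    touch "$conf_file" || return 1
    tmp_file="$(mktemp)" || return 1
    # Keep every line except a previous EESSI_HOST_INJECTIONS setting,
    # so that re-running this function never produces duplicate entries.
    grep -v '^EESSI_HOST_INJECTIONS=' "$conf_file" > "$tmp_file" || true
    echo "EESSI_HOST_INJECTIONS=$target" >> "$tmp_file"
    # Write back in place so that ownership and permissions of the file are kept.
    cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
}
```

After calling it (typically as root), the configuration still has to be reloaded with `sudo cvmfs_config reload` as described above.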

All CUDA-enabled software in EESSI expects the CUDA drivers to be available in a specific subdirectory of this `host_injections` directory.
In addition, installations of the CUDA SDK included in EESSI are stripped down to the files that we are allowed to redistribute;
all other files are replaced by symbolic links that point to another specific subdirectory of `host_injections`. For example:
```
$ ls -l /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/CUDA/12.1.1/bin/nvcc
lrwxrwxrwx 1 cvmfs cvmfs 109 Dec 21 14:49 /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/CUDA/12.1.1/bin/nvcc -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/CUDA/12.1.1/bin/nvcc
```

If the corresponding full installation of the CUDA SDK is available there, the CUDA installation included in EESSI can be used to build CUDA software.
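A quick way to tell whether the full CUDA SDK is in place is to test whether the `nvcc` symlink resolves to an existing file. The following is a minimal sketch; `cuda_sdk_available` is a hypothetical helper name, and the CUDA installation directory is passed in rather than hard-coded because it depends on the EESSI version and CPU target.

```bash
# Succeed only if <cuda_root>/bin/nvcc resolves to an existing file.
# The -e test follows symbolic links, so a link whose target is missing
# (no full CUDA SDK under host_injections) makes the check fail.
cuda_sdk_available() {
    local cuda_root="$1"
    [ -e "$cuda_root/bin/nvcc" ]
}
```

For the example above, `cuda_root` would be the `.../software/CUDA/12.1.1` directory under `/cvmfs/software.eessi.io/versions/2023.06`.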


### Using NVIDIA GPUs via a native EESSI installation {: #nvidia_eessi_native }

Here, we describe the steps to enable GPU support when you have a [native EESSI installation](getting_access/native_installation.md) on your system.

!!! warning "Required permissions"
To enable GPU support for EESSI on your system, you will typically need to have system administration rights, since you need write permissions on the target directory of the `host_injections` symlink.

#### Exposing NVIDIA GPU drivers

To install the symlinks to your GPU drivers in `host_injections`, run the `link_nvidia_host_libraries.sh` script that is included in EESSI:

```{ .bash .copy }
/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
```

This script uses `ldconfig` on your host system to locate your GPU drivers, and creates symbolic links to them in the correct location under the `host_injections` directory. It also stores the CUDA version supported by the driver that the symlinks were created for.

!!! tip "Re-run `link_nvidia_host_libraries.sh` after NVIDIA GPU driver update"
You should re-run this script every time you update the NVIDIA GPU drivers on the host system.

Note that it is safe to re-run the script even if no driver updates were done: the script should detect that the current version of the drivers was already symlinked.
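To get a feel for the kind of lookup involved, the sketch below filters `ldconfig -p` output for libraries whose name starts with a given prefix. It is an illustration only and not the code of `link_nvidia_host_libraries.sh`; the function name `find_cached_libs` is hypothetical, and it assumes the usual `name (flags) => /path` layout of `ldconfig -p` lines.

```bash
# Read `ldconfig -p` output on stdin and print the path of every cached
# library whose name starts with the given prefix.
# Usage: ldconfig -p | find_cached_libs libcuda.so
find_cached_libs() {
    local prefix="$1"
    # A library line looks like:
    #   libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
    # so the name is the first field and the path is the last one.
    awk -v prefix="$prefix" \
        'index($1, prefix) == 1 && $(NF-1) == "=>" { print $NF }'
}
```

If this prints nothing for `libcuda.so` on your host, the driver libraries are probably not registered with the dynamic linker cache.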

#### Installing full CUDA SDK (optional)

To install a full CUDA SDK under `host_injections`, use the `install_cuda_host_injections.sh` script that is included in EESSI:

```{ .bash .copy }
/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
```

For example, to install CUDA 12.1.1 in the directory that the [`host_injections` variant symlink](#host_injections) points to,
using `/tmp/$USER/EESSI` as the directory to store temporary files:
```
/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh --cuda-version 12.1.1 --temp-dir /tmp/$USER/EESSI --accept-cuda-eula
```
You should choose the CUDA version you wish to install according to what CUDA versions are included in EESSI;
see the output of `module avail CUDA/` after [setting up your environment for using
EESSI](using_eessi/setting_up_environment.md).

You can run `/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh --help` to see all available options.

!!! tip

This script uses EasyBuild to install the CUDA SDK. For this to work, two requirements need to be satisfied:

* `module load EasyBuild` should work (or the `eb` command is already available in the environment);
* The version of EasyBuild being used should provide the requested version of the CUDA easyconfig file
(in the example case above, that's `CUDA-12.1.1.eb`).

You can rely on the EasyBuild installation that is included in EESSI for this.

Alternatively, you may load an EasyBuild module manually _before_ running the `install_cuda_host_injections.sh`
script to make an `eb` command available.


### Using NVIDIA GPUs via EESSI in a container {: #nvidia_eessi_container }

We focus here on the [Apptainer](https://apptainer.org/)/[Singularity](https://sylabs.io/singularity) use case,
and have only tested the [`--nv` option](https://apptainer.org/docs/user/latest/gpu.html#nvidia-gpus-cuda-standard)
to enable access to GPUs from within the container.

If you are using the [EESSI container](getting_access/eessi_container.md) to access the EESSI software,
the procedure for enabling GPU support is slightly different and will be documented here eventually.

#### Exposing NVIDIA GPU drivers

When running a container with `apptainer` or `singularity` it is _not_ necessary to run the `link_nvidia_host_libraries.sh`
script, since both of these tools use `$LD_LIBRARY_PATH` internally to make the host GPU drivers available
in the container.

The only scenario where running the script would be required is if `$LD_LIBRARY_PATH` is modified or undefined.

### Testing the GPU support {: #gpu_cuda_testing }

For now, [please open a support issue](support.md) if you need help or have questions regarding GPU support.
The quickest way to test if software installations included in EESSI can access and use your GPU is to run the
`deviceQuery` executable that is part of the `CUDA-Samples` module:
```
module load CUDA-Samples
deviceQuery
```
If both are successful, you should see information about your GPU printed to your terminal.
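For scripted checks (for example in a node health check), the output of `deviceQuery` can be inspected instead of read by eye. This sketch assumes that `deviceQuery` ends its output with a `Result = PASS` line on success, as the CUDA samples conventionally do; `device_query_passed` is a hypothetical helper name.

```bash
# Read deviceQuery output on stdin; succeed only if it reports a passing result.
# Usage (after `module load CUDA-Samples`):
#   deviceQuery | device_query_passed && echo "GPU is usable from EESSI"
device_query_passed() {
    grep -q 'Result = PASS'
}
```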
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -50,6 +50,7 @@ nav:
- adding_software/deploying_software.md
# Todo: write on how to contribute to the EESSI test suite
# - Contributing software tests to the EESSI test suite:
- GPU support: gpu.md
- Getting support: support.md
- Meetings:
- Overview: meetings.md