From f9eaa3e94129bdb907d4c372ed402064ea675d89 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 4 Jun 2024 12:50:48 +0200 Subject: [PATCH 01/12] IGeneral xplaination of host_injections --- docs/site_specific_config/host_injections.md | 43 ++++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 docs/site_specific_config/host_injections.md diff --git a/docs/site_specific_config/host_injections.md b/docs/site_specific_config/host_injections.md new file mode 100644 index 000000000..89fff6ea7 --- /dev/null +++ b/docs/site_specific_config/host_injections.md @@ -0,0 +1,43 @@ +# How does it work? + +## The `host_injections` variant symlink + +While the EESSI software stack aims to work 'out of the box', one cannot avoid some degree of site-specific configuration for certain functionality and/or performance tuning. For example, GPU drivers may be in different locations on different systems. Also, a site might want to tune the OpenMPI installation that comes with EESSI for optimal performance on the local hardware. + +To allow for such tuning, the EESSI repository includes a special directory where system administrations can install files that can be picked up by the software installations included in EESSI. This special directory is located in `/cvmfs/software.eessi.io/host_injections`, and it is a *CernVM-FS Variant Symlink*: +a symbolic link for which the target can be controlled by the CernVM-FS client configuration (for more info, see ['Variant Symlinks' in the official CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#variant-symlinks)). + +!!! 
info "Default target for `host_injections` variant symlink" + + Unless otherwise configured in the CernVM-FS client configuration for the EESSI repository, the `host_injections` symlink points to `/opt/eessi` on the client system: + ``` + $ ls -l /cvmfs/software.eessi.io/host_injections + lrwxrwxrwx 1 cvmfs cvmfs 10 Oct 3 13:51 /cvmfs/software.eessi.io/host_injections -> /opt/eessi + ``` + +The target for this symlink can be controlled by setting the `EESSI_HOST_INJECTIONS` variable in your local CVMFS configuration for EESSI. E.g. +```{bash} +sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/' > /etc/cvmfs/default.local" + +``` + +!!! note "Don't forget to reload the CernVM-FS configuration" + After making a change to a CernVM-FS configuration file, you also need to reload the configuration: + ```{ .bash .copy } + sudo cvmfs_config reload + ``` + +On a heterogenous system, you may want to use different targets for different node types. For example, you might have two types of GPU nodes (`gpu1` and `gpu2`) for which the GPU drivers are _not_ in the same location. Since that location is something we configure under `host_injections`, you'll need seperate `host_injections` directories for each node type. That can easily be achieved by putting e.g. + +```{bash} +sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu1/' > /etc/cvmfs/default.local" + +``` + +in the CVMFS config on the `gpu1` nodes, and + +```{bash} +sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu2/' > /etc/cvmfs/default.local" + +``` +in the CVMFS config on the `gpu2` nodes. 
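The per-node-type setup above could be scripted along the following lines. This is a minimal sketch, not part of EESSI or CernVM-FS: the function names `host_injections_target` and `set_host_injections` are hypothetical, and it assumes the node type (`gpu1`, `gpu2`) is passed in by whatever provisioning tool you use. Unlike a plain `>` redirect, it keeps any other settings already present in the config file.

```bash
#!/usr/bin/env bash
# Sketch only: these helper functions are hypothetical, not shipped with EESSI.
set -euo pipefail

# Build the per-node-type target directory, e.g. <base>/gpu1/
host_injections_target() {
    local base="$1" node_type="$2"
    # Strip a trailing slash from the base so we never emit '//'
    printf '%s/%s/\n' "${base%/}" "${node_type}"
}

# Set EESSI_HOST_INJECTIONS in the given CernVM-FS config file,
# preserving all other settings that are already in that file.
set_host_injections() {
    local config_file="$1" target="$2" tmp
    tmp="$(mktemp)"
    if [ -f "${config_file}" ]; then
        # Drop any earlier EESSI_HOST_INJECTIONS line; grep exits non-zero
        # when no other lines remain, which is fine here
        grep -v '^EESSI_HOST_INJECTIONS=' "${config_file}" > "${tmp}" || true
    fi
    echo "EESSI_HOST_INJECTIONS=${target}" >> "${tmp}"
    mkdir -p "$(dirname "${config_file}")"
    cat "${tmp}" > "${config_file}"
    rm -f "${tmp}"
}
```

On a `gpu1` node this would be called (as root) as `set_host_injections /etc/cvmfs/domain.d/eessi.io.local "$(host_injections_target /shared_fs/path/to/host/injections gpu1)"`, followed by `cvmfs_config reload`.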
From 1b937c6ad53294a46afdd8aff8d5b3d649304711 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 4 Jun 2024 12:55:55 +0200 Subject: [PATCH 02/12] Move the documentation on how to enable accelerator support to the installation chapter --- docs/{ => site_specific_config}/gpu.md | 28 +------------------------- 1 file changed, 1 insertion(+), 27 deletions(-) rename docs/{ => site_specific_config}/gpu.md (80%) diff --git a/docs/gpu.md b/docs/site_specific_config/gpu.md similarity index 80% rename from docs/gpu.md rename to docs/site_specific_config/gpu.md index 16009ba79..92d872507 100644 --- a/docs/gpu.md +++ b/docs/site_specific_config/gpu.md @@ -39,33 +39,7 @@ An additional requirement is necessary if you want to be able to compile CUDA-en Below, we describe how to make sure that the EESSI software stack can find your NVIDIA GPU drivers and (optionally) full installations of the CUDA SDK. -### `host_injections` variant symlink {: #host_injections } - -In the EESSI repository, a special directory has been prepared where system administrators can install files that can be picked up by -software installations included in EESSI. This gives the ability to administrators to influence the behaviour (and capabilities) of the EESSI software stack. - -This special directory is located in `/cvmfs/software.eessi.io/host_injections`, and it is a *CernVM-FS Variant Symlink*: -a symbolic link for which the target can be controlled by the CernVM-FS client configuration (for more info, see ['Variant Symlinks' in the official CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#variant-symlinks)). - -!!! 
info "Default target for `host_injections` variant symlink" - - Unless otherwise configured in the CernVM-FS client configuration for the EESSI repository, the `host_injections` symlink points to `/opt/eessi` on the client system: - ``` - $ ls -l /cvmfs/software.eessi.io/host_injections - lrwxrwxrwx 1 cvmfs cvmfs 10 Oct 3 13:51 /cvmfs/software.eessi.io/host_injections -> /opt/eessi - ``` - -As an example, let's imagine that we want to use a architecture-specific location on a shared filesystem as the target for the symlink. This has the advantage that one can make changes under `host_injections` that affect all nodes which share that CernVM-FS configuration. Configuring this in your CernVM-FS configuration would mean adding the following line in the client configuration file: - -```{ .ini .copy } -EESSI_HOST_INJECTIONS=/shared_fs/path -``` - -!!! note "Don't forget to reload the CernVM-FS configuration" - After making a change to a CernVM-FS configuration file, you also need to reload the configuration: - ```{ .bash .copy } - sudo cvmfs_config reload - ``` +### Configuring CUDA driver location {: #driver_location } All CUDA-enabled software in EESSI expects the CUDA drivers to be available in a specific subdirectory of this `host_injections` directory. 
In addition, installations of the CUDA SDK included EESSI are stripped down to the files that we are allowed to redistribute; From 6e80dc81c57cc1385fd26b691c1a2f7ae6911a01 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 4 Jun 2024 16:36:37 +0200 Subject: [PATCH 03/12] Change the cvmfs config file to one that is specific to the domain, as the variant symlink is specific to the EESSI repositories --- docs/site_specific_config/host_injections.md | 12 ++++++------ mkdocs.yml | 4 ++++ 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/docs/site_specific_config/host_injections.md b/docs/site_specific_config/host_injections.md index 89fff6ea7..9d5b8c3fd 100644 --- a/docs/site_specific_config/host_injections.md +++ b/docs/site_specific_config/host_injections.md @@ -1,10 +1,10 @@ -# How does it work? +# How to configure EESSI ## The `host_injections` variant symlink While the EESSI software stack aims to work 'out of the box', one cannot avoid some degree of site-specific configuration for certain functionality and/or performance tuning. For example, GPU drivers may be in different locations on different systems. Also, a site might want to tune the OpenMPI installation that comes with EESSI for optimal performance on the local hardware. -To allow for such tuning, the EESSI repository includes a special directory where system administrations can install files that can be picked up by the software installations included in EESSI. This special directory is located in `/cvmfs/software.eessi.io/host_injections`, and it is a *CernVM-FS Variant Symlink*: +To allow such site-specific configuration, the EESSI repository includes a special directory where system administrators can install files that can be picked up by the software installations included in EESSI.
This special directory is located in `/cvmfs/software.eessi.io/host_injections`, and it is a *CernVM-FS Variant Symlink*: a symbolic link for which the target can be controlled by the CernVM-FS client configuration (for more info, see ['Variant Symlinks' in the official CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#variant-symlinks)). !!! info "Default target for `host_injections` variant symlink" @@ -17,7 +17,7 @@ a symbolic link for which the target can be controlled by the CernVM-FS client c The target for this symlink can be controlled by setting the `EESSI_HOST_INJECTIONS` variable in your local CVMFS configuration for EESSI. E.g. ```{bash} -sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/' > /etc/cvmfs/default.local" +sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/' > /etc/cvmfs/domain.d/eessi.io.local" ``` @@ -27,17 +27,17 @@ sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/' > sudo cvmfs_config reload ``` -On a heterogenous system, you may want to use different targets for different node types. For example, you might have two types of GPU nodes (`gpu1` and `gpu2`) for which the GPU drivers are _not_ in the same location. Since that location is something we configure under `host_injections`, you'll need seperate `host_injections` directories for each node type. That can easily be achieved by putting e.g. +On a heterogenous system, you may want to use different targets for the variant symlink for different node types. For example, you might have two types of GPU nodes (`gpu1` and `gpu2`) for which the GPU drivers are _not_ in the same location, or not of the same version. Since those are both things we configure under `host_injections`, you'll need seperate `host_injections` directories for each node type. That can easily be achieved by putting e.g. 
```{bash} -sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu1/' > /etc/cvmfs/default.local" +sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu1/' > /etc/cvmfs/domain.d/eessi.io.local" ``` in the CVMFS config on the `gpu1` nodes, and ```{bash} -sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu2/' > /etc/cvmfs/default.local" +sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu2/' > /etc/cvmfs/domain.d/eessi.io.local" ``` in the CVMFS config on the `gpu2` nodes. diff --git a/mkdocs.yml b/mkdocs.yml index 3d5897152..c5cd827ce 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -26,6 +26,10 @@ nav: - Is EESSI already installed?: getting_access/is_eessi_accessible.md - Native: getting_access/native_installation.md - Container: getting_access/eessi_container.md + - Configuring EESSI: + - How to configure EESSI: site_specific_config/host_injections.md + - GPU support: site_specific_config/gpu.md + - Custom LMOD hooks: site_specific_config/lmod_hooks.md - Basic usage: - Set up environment: using_eessi/setting_up_environment.md - Basic commands: using_eessi/basic_commands.md From ac35456ea16418c390f8f55e87ce76cf81f2114e Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 4 Jun 2024 18:11:06 +0200 Subject: [PATCH 04/12] Added documentation on how to make Lmod hooks --- docs/site_specific_config/lmod_hooks.md | 192 ++++++++++++++++++++++++ 1 file changed, 192 insertions(+) create mode 100644 docs/site_specific_config/lmod_hooks.md diff --git a/docs/site_specific_config/lmod_hooks.md b/docs/site_specific_config/lmod_hooks.md new file mode 100644 index 000000000..f24ea001c --- /dev/null +++ b/docs/site_specific_config/lmod_hooks.md @@ -0,0 +1,192 @@ +# Configuring site-specific Lmod hooks +You may want to customize what happens when certain modules are loaded, for example, you may want to set additional environment variables. 
This is possible with [LMOD hooks](https://lmod.readthedocs.io/en/latest/170_hooks.html). A typical example would be when you want to tune the OpenMPI module for your system by setting additional environment variables when an OpenMPI module is loaded. + + +## Location of the hooks +The EESSI software stack provides its own set of hooks in `$LMOD_PACKAGE_PATH/SitePackage.lua`. This `SitePackage.lua` also searches for site-specific hooks in two additonal locations: + +- `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/.lmod/SitePackage.lua` +- `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/$EESSI_OS_TYPE/$EESSI_SOFTWARE_SUBDIR/.lmod/SitePackage.lua` + +The first allows for hooks that need to be executed for that system, irrespective of the CPU architecture. The second allows for hooks specific to a certain architecture. + +## Architecture-independent hooks +Hooks are written in Lua and can use any of the standard Lmod functionality as described in the [Lmod documentation](https://lmod.readthedocs.io/en/latest/170_hooks.html). While there are many types of hooks, you most likely want to specify a load or unload hook. + +First, you typically want to load the necessary Lua packages: +```lua +-- $EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/.lmod/SitePackage.lua + +-- The Strict package checks for the use of undeclared variables: +require("strict") + +-- Load the Lmod Hook package +local hook=require("Hook") +``` + +Next, we define a function that we want to use as a hook. Let's assume that we want to define a `load` hook that sets the environment variable `MY_ENV_VAR` to `1` whenever an `OpenMPI` module is loaded. Unfortunately, registering multiple hooks of the same type (e.g. multiple `load` hooks) is only supported in Lmod 8.7.35+. EESSI version 2023.06 uses Lmod 8.7.30. 
Thus, we define our function without the local keyword, so that we can still ad to it later in an architecture-specific hook (if we wanted to): + +```lua +-- Define a function for the hook +-- Note that we define this without 'local' keyword +-- That way we can still add to this function in an architecture-specific hook +function set_my_env_var_openmpi(t) + local simpleName = string.match(t.modFullName, "(.-)/") + if simpleName == 'OpenMPI' then + setenv('MY_ENV_VAR', '1') + end +end +``` + +for the same reason that multiple hooks cannot be registered, we need to combine this function for our site-specific (architecture-independent) with the function that specifies the EESSI `load` hook + +```lua +-- Registering multiple hook functions, e.g. multiple load hooks is only supported in Lmod 8.7.35+ +-- EESSI version 2023.06 uses lmod 8.7.30. Thus, we first have to combine all functions into a single one, +-- before registering it as a hook +local function combined_load_hook(t) + -- Call the EESSI load hook (if it exists) + -- Note that if you wanted to overwrite the EESSI hooks (not recommended!), you would ommit this + if eessi_load_hook ~= nil then + eessi_load_hook(t) + end + -- Call the site-specific load hook + set_my_env_var_openmpi(t) +end +``` + +before we can finally register this function as an Lmod hook: + +```lua +hook.register("load", combined_load_hook) +``` + +Thus, our complete `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/.lmod/SitePackage.lua` now looks like this (omitting the comments): + +```lua +require("strict") +local hook=require("Hook") + +function set_my_env_var_openmpi(t) + local simpleName = string.match(t.modFullName, "(.-)/") + if simpleName == 'OpenMPI' then + setenv('MY_ENV_VAR', '1') + end +end + +local function combined_load_hook(t) + if eessi_load_hook ~= nil then + eessi_load_hook(t) + end + set_my_env_var_openmpi(t) +end + +hook.register("load", combined_load_hook) +``` + +Note that for future EESSI versions, if they use Lmod 
8.7.35+, this would be simplified to: + +```lua +require("strict") +local hook=require("Hook") + +local function set_my_env_var_openmpi(t) + local simpleName = string.match(t.modFullName, "(.-)/") + if simpleName == 'OpenMPI' then + setenv('MY_ENV_VAR', '1') + end +end + +hook.register("load", set_my_env_var_openmpi, "append") +``` + +## Architecture-dependent hooks +Now, assume that in addition we want to set an environment variable `MY_SECOND_ENV_VAR` to `5`, but only for nodes that have the `zen3` architecture. First, again, you typically want to load the necessary Lua packages: + +```lua +-- $EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua + +-- The Strict package checks for the use of undeclared variables: +require("strict") + +-- Load the Lmod Hook package +local hook=require("Hook") +``` + +Next, we define the function for the hook itself + +```lua +-- Define a function for the hook +-- This time, we can define it as a local function, as there are no hooks more specific than this +local function set_my_second_env_var_openmpi(t) + local simpleName = string.match(t.modFullName, "(.-)/") + if simpleName == 'OpenMPI' then + setenv('MY_SECOND_ENV_VAR', '5') + end +end +``` + +Then, we combine the functions into one + +```lua +local function combined_load_hook(t) + -- Call the EESSI load hook first + if eessi_load_hook ~= nil then + eessi_load_hook(t) + end + -- Then call the architecture-independent load hook (if it exists) + if set_my_env_var_openmpi ~= nil then + set_my_env_var_openmpi(t) + end + -- And finally the architecture-dependent load hook we just defined + set_my_second_env_var_openmpi(t) +end +``` + +before finally registering it as an Lmod hook + +```lua +hook.register("load", combined_load_hook) +``` + +Thus, our full `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua` now looks like this (ommitting the comments): + +```lua +require("strict") +local
hook=require("Hook") + +local function set_my_second_env_var_openmpi(t) + local simpleName = string.match(t.modFullName, "(.-)/") + if simpleName == 'OpenMPI' then + setenv('MY_SECOND_ENV_VAR', '5') + end +end + +local function combined_load_hook(t) + if eessi_load_hook ~= nil then + eessi_load_hook(t) + end + if set_my_env_var_openmpi ~= nil then + set_my_env_var_openmpi(t) + end + set_my_second_env_var_openmpi(t) +end + +hook.register("load", combined_load_hook) +``` + +Again, note that for future EESSI versions, if they use Lmod 8.7.35+, this would simplify to + +```lua +require("strict") +local hook=require("Hook") + +local function set_my_second_env_var_openmpi(t) + local simpleName = string.match(t.modFullName, "(.-)/") + if simpleName == 'OpenMPI' then + setenv('MY_SECOND_ENV_VAR', '5') + end +end + +hook.register("load", set_my_second_env_var_openmpi, "append") +``` From 3ed2e2016d806c23999d1dfb3572dc84441a793e Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 4 Jun 2024 18:15:15 +0200 Subject: [PATCH 05/12] Final polishing --- docs/site_specific_config/lmod_hooks.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/site_specific_config/lmod_hooks.md b/docs/site_specific_config/lmod_hooks.md index f24ea001c..a09a64790 100644 --- a/docs/site_specific_config/lmod_hooks.md +++ b/docs/site_specific_config/lmod_hooks.md @@ -11,7 +11,7 @@ The EESSI software stack provides its own set of hooks in `$LMOD_PACKAGE_PATH/Si The first allows for hooks that need to be executed for that system, irrespective of the CPU architecture. The second allows for hooks specific to a certain architecture. ## Architecture-independent hooks -Hooks are written in Lua and can use any of the standard Lmod functionality as described in the [Lmod documentation](https://lmod.readthedocs.io/en/latest/170_hooks.html). While there are many types of hooks, you most likely want to specify a load or unload hook.
+Hooks are written in Lua and can use any of the standard Lmod functionality as described in the [Lmod documentation](https://lmod.readthedocs.io/en/latest/170_hooks.html). While there are many types of hooks, you most likely want to specify a load or unload hook. Note that the EESSI hooks provide a nice example of what you can do with hooks. Here, as an example, we will define a `load` hook that sets the environment variable `MY_ENV_VAR` to `1` whenever an `OpenMPI` module is loaded. First, you typically want to load the necessary Lua packages: ```lua @@ -24,7 +24,7 @@ require("strict") local hook=require("Hook") ``` -Next, we define a function that we want to use as a hook. Let's assume that we want to define a `load` hook that sets the environment variable `MY_ENV_VAR` to `1` whenever an `OpenMPI` module is loaded. Unfortunately, registering multiple hooks of the same type (e.g. multiple `load` hooks) is only supported in Lmod 8.7.35+. EESSI version 2023.06 uses Lmod 8.7.30. Thus, we define our function without the local keyword, so that we can still ad to it later in an architecture-specific hook (if we wanted to): +Next, we define a function that we want to use as a hook. Unfortunately, registering multiple hooks of the same type (e.g. multiple `load` hooks) is only supported in Lmod 8.7.35+. EESSI version 2023.06 uses Lmod 8.7.30. Thus, we define our function without the local keyword, so that we can still add to it later in an architecture-specific hook (if we wanted to): ```lua -- Define a function for the hook @@ -38,7 +38,7 @@ function set_my_env_var_openmpi(t) end ``` -for the same reason that multiple hooks cannot be registered, we need to combine this function for our site-specific (architecture-independent) with the function that specifies the EESSI `load` hook +For the same reason that multiple hooks cannot be registered, we need to combine this function for our site-specific (architecture-independent) hook with the function that specifies the EESSI `load` hook.
Note that all EESSI hooks will be called `eessi_<hook_type>_hook` by convention. ```lua -- Registering multiple hook functions, e.g. multiple load hooks is only supported in Lmod 8.7.35+ @@ -55,7 +55,7 @@ local function combined_load_hook(t) end ``` -before we can finally register this function as an Lmod hook: +Then, we can finally register this function as an Lmod hook: ```lua hook.register("load", combined_load_hook) From 094adab3c4ccc99431f8838027e154220c921230 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Thu, 6 Jun 2024 11:20:32 +0200 Subject: [PATCH 06/12] Codespell fixes --- docs/site_specific_config/host_injections.md | 2 +- docs/site_specific_config/lmod_hooks.md | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/site_specific_config/host_injections.md b/docs/site_specific_config/host_injections.md index 9d5b8c3fd..4bb897608 100644 --- a/docs/site_specific_config/host_injections.md +++ b/docs/site_specific_config/host_injections.md @@ -27,7 +27,7 @@ sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/' > sudo cvmfs_config reload ``` -On a heterogenous system, you may want to use different targets for the variant symlink for different node types. For example, you might have two types of GPU nodes (`gpu1` and `gpu2`) for which the GPU drivers are _not_ in the same location, or not of the same version. Since those are both things we configure under `host_injections`, you'll need seperate `host_injections` directories for each node type. That can easily be achieved by putting e.g. +On a heterogeneous system, you may want to use different targets for the variant symlink for different node types. For example, you might have two types of GPU nodes (`gpu1` and `gpu2`) for which the GPU drivers are _not_ in the same location, or not of the same version. Since those are both things we configure under `host_injections`, you'll need separate `host_injections` directories for each node type.
That can easily be achieved by putting e.g. ```{bash} sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu1/' > /etc/cvmfs/domain.d/eessi.io.local" diff --git a/docs/site_specific_config/lmod_hooks.md b/docs/site_specific_config/lmod_hooks.md index a09a64790..f4e191e5c 100644 --- a/docs/site_specific_config/lmod_hooks.md +++ b/docs/site_specific_config/lmod_hooks.md @@ -3,7 +3,7 @@ You may want to customize what happens when certain modules are loaded, for exam ## Location of the hooks -The EESSI software stack provides its own set of hooks in `$LMOD_PACKAGE_PATH/SitePackage.lua`. This `SitePackage.lua` also searches for site-specific hooks in two additonal locations: +The EESSI software stack provides its own set of hooks in `$LMOD_PACKAGE_PATH/SitePackage.lua`. This `SitePackage.lua` also searches for site-specific hooks in two additional locations: - `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/.lmod/SitePackage.lua` - `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/$EESSI_OS_TYPE/$EESSI_SOFTWARE_SUBDIR/.lmod/SitePackage.lua` @@ -46,7 +46,7 @@ for the same reason that multiple hooks cannot be registered, we need to combine -- before registering it as a hook local function combined_load_hook(t) -- Call the EESSI load hook (if it exists) - -- Note that if you wanted to overwrite the EESSI hooks (not recommended!), you would ommit this + -- Note that if you wanted to overwrite the EESSI hooks (not recommended!), you would omit this if eessi_load_hook ~= nil then eessi_load_hook(t) end @@ -149,7 +149,7 @@ before finally registering it as an Lmod hook hook.register("load", combined_load_hook) ``` -Thus, our full `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua` now looks like this (ommitting the comments): +Thus, our full `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua` now looks like this (omitting the comments): 
```lua require("strict") From 4298e0f7a9bc02deabc93af70a8c3db87dd63ed3 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Thu, 6 Jun 2024 14:00:02 +0200 Subject: [PATCH 07/12] Fixes for broken refs --- docs/adding_software/opening_pr.md | 2 +- docs/site_specific_config/gpu.md | 2 +- docs/test-suite/ReFrame-configuration-file.md | 12 ++++++------ docs/test-suite/installation-configuration.md | 4 ++-- docs/test-suite/release-notes.md | 8 ++++---- docs/test-suite/usage.md | 4 ++-- 6 files changed, 16 insertions(+), 16 deletions(-) diff --git a/docs/adding_software/opening_pr.md b/docs/adding_software/opening_pr.md index b1352db1d..73ad0fda6 100644 --- a/docs/adding_software/opening_pr.md +++ b/docs/adding_software/opening_pr.md @@ -97,7 +97,7 @@ git push koala example_branch If all goes well, one or more bots :robot: should almost instantly create a comment in your pull request with an overview of how it is configured - you will need this information when providing build instructions. -### Rebuilding software +### Rebuilding software {: #rebuilding_software } We typically do not rebuild software, since (strictly speaking) this breaks reproducibility for anyone using the software. However, there are certain situations in which it is difficult or impossible to avoid. To do a rebuild, you add the software you want to rebuild to a dedicated easystack file in the `rebuilds` directory. Use the following naming convention: `YYYYMMDD-eb----.yml`, where `YYYYMMDD` is the opening date of your PR. E.g. `2024.05.06-eb-4.9.1-CUDA-12.1.1-ship-full-runtime.yml` was added in a PR on the 6th of May 2024 and used to rebuild CUDA-12.1.1 using EasyBuild 4.9.1 to resolve an issue with some runtime libraries missing from the initial CUDA 12.1.1 installation. 
diff --git a/docs/site_specific_config/gpu.md b/docs/site_specific_config/gpu.md index 92d872507..b28e52c92 100644 --- a/docs/site_specific_config/gpu.md +++ b/docs/site_specific_config/gpu.md @@ -82,7 +82,7 @@ To install a full CUDA SDK under `host_injections`, use the `install_cuda_host_i /cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh ``` -For example, to install CUDA 12.1.1 in the directory that the [`host_injections` variant symlink](#host_injections) points to, +For example, to install CUDA 12.1.1 in the directory that the [`host_injections` variant symlink](host_injections.md) points to, using `/tmp/$USER/EESSI` as directory to store temporary files: ``` /cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh --cuda-version 12.1.1 --temp-dir /tmp/$USER/EESSI --accept-cuda-eula diff --git a/docs/test-suite/ReFrame-configuration-file.md b/docs/test-suite/ReFrame-configuration-file.md index 1c7e28f43..5c54839d9 100644 --- a/docs/test-suite/ReFrame-configuration-file.md +++ b/docs/test-suite/ReFrame-configuration-file.md @@ -3,21 +3,21 @@ In order for ReFrame to run tests on your system, it needs to know some properties about your system. For example, it needs to know what kind of job scheduler you have, which partitions the system has, how to submit to those partitions, etc. -All of this has to be described in a *ReFrame configuration file* (see also the [section on `$RFM_CONFIG_FILES` above](#RFM_CONFIG_FILES)). +All of this has to be described in a *ReFrame configuration file* (see also the [section on `$RFM_CONFIG_FILES`](installation-configuration.md#RFM_CONFIG_FILES)). 
This page is organized as follows: -* available ReFrame configuration file +* available ReFrame configuration files * Verifying your ReFrame configuration * How to write a ReFrame configuration file -## Available ReFrame configuration file +## Available ReFrame configuration files There are some available ReFrame configuration files for HPC systems and public cloud in the [config directory](https://github.com/EESSI/test-suite/tree/main/config/) for more inspiration. Below is a simple ReFrame configuration file with minimal changes required for getting you started on using the test suite for a CPU partition. Please check that `stagedir` is set to a path on a (shared) scratch filesystem for storing (temporary) files related to the tests, and `access` is set to a list of arguments that you would normally pass to the scheduler when submitting to this partition (for example '-p cpu' for submitting to a Slurm partition called cpu). -To write a ReFrame configuration file for your system, check the section How to write a ReFrame configuration file. +To write a ReFrame configuration file for your system, check the section [How to write a ReFrame configuration file](#write-reframe-config). ```python @@ -118,7 +118,7 @@ For example, to only show the `launcher` value for the `gpu` partition of the `e reframe --system example:gpu --show-config systems/0/partitions/@gpu/launcher ``` -## How to write a ReFrame configuration file +## How to write a ReFrame configuration file {: #write-reframe-config} The [official ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/configure.html) provides the full description on configuring ReFrame for your site. However, there are some configuration settings that are specifically @@ -184,7 +184,7 @@ The most common configuration items defined at this level are: if not specified otherwise. 
We recommend setting the `$RFM_PREFIX` environment variable rather than specifying `prefix` in your configuration file, so our [common logging configuration](#logging) can pick up on it - (see also [`$RFM_PREFIX`](#RFM_PREFIX)). + (see also [`$RFM_PREFIX`](installation-configuration.md#RFM_PREFIX)). - [`stagedir`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.stagedir): A shared directory that is available on all nodes that will execute ReFrame tests. This is used for storing (temporary) files related to the test. Typically, you want to set this to a path on a (shared) scratch filesystem. Defining this is optional: the default is a '`stage`' directory inside the `prefix` directory. - [`partitions`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions): Details on system partitions, see below. diff --git a/docs/test-suite/installation-configuration.md b/docs/test-suite/installation-configuration.md index 3830f4b4d..131dbccba 100644 --- a/docs/test-suite/installation-configuration.md +++ b/docs/test-suite/installation-configuration.md @@ -106,7 +106,7 @@ export RFM_CONFIG_FILES=$HOME/EESSI-test-suite/config/example.py Alternatively, you can use the `--config-file` (or `-C`) `reframe` option. -See the [section on the ReFrame configuration file](#reframe-config-file) below for more information. +See the [section on the ReFrame configuration file](ReFrame-configuration-file.md#reframe-config-file) for more information. #### Search path for tests (`$RFM_CHECK_SEARCH_PATH`) @@ -145,7 +145,7 @@ This involves: Note that the default is for ReFrame to use the current directory as prefix. We recommend setting a prefix so that logs are not scattered around and nicely appended for each run. 
-If our [common logging configuration](#logging) is used, the regular ReFrame log file will +If our [common logging configuration](ReFrame-configuration-file.md#logging) is used, the regular ReFrame log file will also end up in the location specified by `$RFM_PREFIX`. !!! warning diff --git a/docs/test-suite/release-notes.md b/docs/test-suite/release-notes.md index c2c1b9882..5f5a07d6c 100644 --- a/docs/test-suite/release-notes.md +++ b/docs/test-suite/release-notes.md @@ -34,11 +34,11 @@ It includes: [hooks](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/hooks.py), and [tests](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/tests/), which can be [installed with "`pip install`"](installation-configuration.md#pip-install). -* Tests for [GROMACS](usage.md#gromacs) and [TensorFlow](usage.md#tensorflow) in [`eessi.testsuite.tests.apps`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/tests/apps) +* Tests for [GROMACS](available-tests.md#gromacs) and [TensorFlow](available-tests.md#tensorflow) in [`eessi.testsuite.tests.apps`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/tests/apps) that leverage the functionality provided by `eessi.testsuite.*`. -* Examples of [ReFrame configuration files](installation-configuration.md#reframe-config-file) for various systems in +* Examples of [ReFrame configuration files](ReFrame-configuration-file.md#reframe-config-file) for various systems in the [`config` subdirectory](https://github.com/EESSI/test-suite/tree/main/config). -* A [`common_logging_config()`](installation-configuration.md#logging) function to facilitate the ReFrame logging configuration. -* A set of standard *device types* and *features* that can be used in the [`partitions` section of the ReFrame configuration file](installation-configuration.md#partitions). +* A [`common_logging_config()`](ReFrame-configuration-file.md#logging) function to facilitate the ReFrame logging configuration. 
+* A set of standard *device types* and *features* that can be used in the [`partitions` section of the ReFrame configuration file](ReFrame-configuration-file.md#partitions). * A set of [*tags* (`CI` + `scale`) that can be used to filter checks](usage.md#filter-tag). * [Scripts](https://github.com/EESSI/test-suite/tree/main/scripts) that show how to run the test suite. diff --git a/docs/test-suite/usage.md b/docs/test-suite/usage.md index 2151cb800..b45994a12 100644 --- a/docs/test-suite/usage.md +++ b/docs/test-suite/usage.md @@ -10,7 +10,7 @@ system. To list the tests that are available in the EESSI test suite, use `reframe --list` (or `reframe -L` for short). -If you have properly [configured ReFrame](#Configuring-ReFrame), you should +If you have properly [configured ReFrame](installation-configuration.md), you should see a (potentially long) list of checks in the output: ``` @@ -71,7 +71,7 @@ ReFrame will generate various output and log files: * performance log files for each test, which include performance results for the test runs; We strongly recommend controlling where these files go by using the [common logging configuration that -is provided by the EESSI test suite in your ReFrame configuration file](installation-configuration.md#logging) +is provided by the EESSI test suite in your ReFrame configuration file](ReFrame-configuration-file.md#logging) and setting [`$RFM_PREFIX`](installation-configuration.md#RFM_PREFIX) (avoid using the cmd line option `--prefix`). 
If you do, and if you use [ReFrame v4.3.3 or more newer](installation-configuration.md#requirements), From c0ee9dc8125782bd847b5044942498f3aeecd7d6 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Thu, 6 Jun 2024 14:13:47 +0200 Subject: [PATCH 08/12] Update link --- docs/adding_software/debugging_failed_builds.md | 2 +- mkdocs.yml | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/adding_software/debugging_failed_builds.md b/docs/adding_software/debugging_failed_builds.md index 30a5f9530..2f8193cb4 100644 --- a/docs/adding_software/debugging_failed_builds.md +++ b/docs/adding_software/debugging_failed_builds.md @@ -61,7 +61,7 @@ If you want to install NVIDIA GPU software, make sure to also add the `--nvidia While the above works perfectly well, you might not be able to complete your debugging session in one go. With the above approach, several steps will just be repeated every time you start a debugging session: - Downloading the container -- Installing `CUDA` in your [host injections](../gpu.md#host_injections) directory (only if you use the `EESSI-install-software.sh` script, see below) +- Installing `CUDA` in your [host injections](../site_specific_config/host_injections.md) directory (only if you use the `EESSI-install-software.sh` script, see below) - Installing all dependencies (before you get to the package that actually fails to build) To avoid this, we create two directories. One holds the container & `host_injections`, which are (typically) common between multiple PRs and thus you don't have to redownload the container / reinstall the `host_injections` if you start working on another PR. The other will hold the PR-specific data: a tarball storing the software you'll build in your interactive debugging session. 
The paths we pick here are just example, you can pick any persistent, writeable location for this: diff --git a/mkdocs.yml b/mkdocs.yml index c5cd827ce..015492fdc 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -49,8 +49,6 @@ nav: - Usage: test-suite/usage.md - Available tests: test-suite/available-tests.md - Release notes: test-suite/release-notes.md - - Accelerators support: - - GPUs: gpu.md - Adding software to EESSI: - Overview: adding_software/overview.md - For contributors: From 6c3c9c7ed598f5cc837f9a7d8258ff680a55325d Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Thu, 6 Jun 2024 14:24:27 +0200 Subject: [PATCH 09/12] Fix links --- docs/site_specific_config/gpu.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/site_specific_config/gpu.md b/docs/site_specific_config/gpu.md index b28e52c92..cc19b6734 100644 --- a/docs/site_specific_config/gpu.md +++ b/docs/site_specific_config/gpu.md @@ -3,7 +3,7 @@ More information on the actions that must be performed to ensure that GPU software included in EESSI can use the GPU in your system is available below. -[Please open a support issue](support.md) if you need help or have questions regarding GPU support. +[Please open a support issue](../support.md) if you need help or have questions regarding GPU support. !!! tip "Make sure the `${EESSI_VERSION}` version placeholder is defined!" In this page, we use `${EESSI_VERSION}` as a placeholder for the version of the EESSI repository, @@ -54,7 +54,7 @@ If the corresponding full installation of the CUDA SDK is available there, the C ### Using NVIDIA GPUs via a native EESSI installation {: #nvidia_eessi_native } -Here, we describe the steps to enable GPU support when you have a [native EESSI installation](getting_access/native_installation.md) on your system. +Here, we describe the steps to enable GPU support when you have a [native EESSI installation](../getting_access/native_installation.md) on your system. !!! 
warning "Required permissions" To enable GPU support for EESSI on your system, you will typically need to have system administration rights, since you need write permissions on the folder to the target directory of the `host_injections` symlink. @@ -89,7 +89,7 @@ using `/tmp/$USER/EESSI` as directory to store temporary files: ``` You should choose the CUDA version you wish to install according to what CUDA versions are included in EESSI; see the output of `module avail CUDA/` after [setting up your environment for using -EESSI](using_eessi/setting_up_environment.md). +EESSI](../using_eessi/setting_up_environment.md). You can run `/cvmfs/software.eessi.io/scripts/install_cuda_host_injections.sh --help` to check all of the options. @@ -113,7 +113,7 @@ We focus here on the [Apptainer](https://apptainer.org/)/[Singularity](https://s and have only tested the [`--nv` option](https://apptainer.org/docs/user/latest/gpu.html#nvidia-gpus-cuda-standard) to enable access to GPUs from within the container. -If you are using the [EESSI container](getting_access/eessi_container.md) to access the EESSI software, +If you are using the [EESSI container](../getting_access/eessi_container.md) to access the EESSI software, the procedure for enabling GPU support is slightly different and will be documented here eventually. 
#### Exposing NVIDIA GPU drivers From 005aaf75a47fc5f5d5e96ce553a658174e52fa53 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Thu, 6 Jun 2024 14:28:14 +0200 Subject: [PATCH 10/12] Try to fix one more ref --- docs/test-suite/installation-configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/test-suite/installation-configuration.md b/docs/test-suite/installation-configuration.md index 131dbccba..d065feed4 100644 --- a/docs/test-suite/installation-configuration.md +++ b/docs/test-suite/installation-configuration.md @@ -126,7 +126,7 @@ export RFM_CHECK_SEARCH_RECURSIVE=1 Alternatively, you can use the `--checkpath` (or `-c`) and `--recursive` (or `-R`) `reframe` options. -#### ReFrame prefix (`$RFM_PREFIX`) { #RFM_PREFIX } +#### ReFrame prefix (`$RFM_PREFIX`) {: #RFM_PREFIX } *(see also [`RFM_PREFIX` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_PREFIX))* From 76d4a747fba37f3712f12250d61866d5fa5b5a0b Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 1 Jul 2024 15:30:21 +0200 Subject: [PATCH 11/12] Make more clear why/when configuration is necessary --- docs/site_specific_config/host_injections.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/docs/site_specific_config/host_injections.md b/docs/site_specific_config/host_injections.md index 4bb897608..0f14791d3 100644 --- a/docs/site_specific_config/host_injections.md +++ b/docs/site_specific_config/host_injections.md @@ -1,8 +1,13 @@ # How to configure EESSI -## The `host_injections` variant symlink +## Why configuration is necessary + +Just [installing EESSI](../getting_access/native_installation.md) is enough to get started with the EESSI software stack on a CPU-based system. 
However, additional configuration is necessary in many other cases, such as: +- enabling GPU support on GPU-based systems +- site-specific configuration / tuning of the MPI libraries provided by EESSI +- overriding EESSI's MPI library with an ABI-compatible host MPI -While the EESSI software stack aims to work 'out of the box', one cannot avoid some degree of site-specific configuration for certain functionality and/or performance tuning. For example, GPU drivers may be in different locations on different systems. Also, a site might want to tune the OpenMPI installation that comes with EESSI for optimal performance on the local hardware. +## The `host_injections` variant symlink To allow such site-specific configuration, the EESSI repository includes a special directory where system administrations can install files that can be picked up by the software installations included in EESSI. This special directory is located in `/cvmfs/software.eessi.io/host_injections`, and it is a *CernVM-FS Variant Symlink*: a symbolic link for which the target can be controlled by the CernVM-FS client configuration (for more info, see ['Variant Symlinks' in the official CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#variant-symlinks)).
From 085c890a78e3bee7feace97a07209599f8b2af30 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 1 Jul 2024 15:31:59 +0200 Subject: [PATCH 12/12] Add configuration to the installation header --- mkdocs.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mkdocs.yml b/mkdocs.yml index 9bdd01bda..cd28e6166 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -28,7 +28,7 @@ nav: - Production: repositories/software.eessi.io.md - RISC-V: repositories/riscv.eessi.io.md - Pilot: repositories/pilot.md - - Installation: + - Installation and configuration: - Is EESSI already installed?: getting_access/is_eessi_accessible.md - Native: getting_access/native_installation.md - Container: getting_access/eessi_container.md