From 70004f95700557e39b5fc0a2ca98f6a98700a70a Mon Sep 17 00:00:00 2001
From: Xin An <34663977+xinan1911@users.noreply.github.com>
Date: Tue, 4 Jun 2024 09:58:45 +0200
Subject: [PATCH 1/4] Update known issue about lmod hook in host-injection

---
 docs/known_issues/eessi-2023.06.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/docs/known_issues/eessi-2023.06.md b/docs/known_issues/eessi-2023.06.md
index a4908b2d1..b3c07382b 100644
--- a/docs/known_issues/eessi-2023.06.md
+++ b/docs/known_issues/eessi-2023.06.md
@@ -7,7 +7,7 @@

 This is an error that occurs with OpenMPI after updating to OFED 23.10.
 
-Their is an upstream issue on this problem opened with EasyBuild.
+There is an upstream issue on this problem opened with EasyBuild.
 See: https://github.com/easybuilders/easybuild-easyconfigs/issues/20233

 Workarounds
@@ -26,3 +26,17 @@ export OMPI_MCA_pml='ucx'
 export OMPI_MCA_mtl='^ofi'
 ```
+
+### `Bug in EESSI initialization and priority mechanisms: site OpenMPI or UCX not loaded`
+
+
+This error may occur when bugs resolving or site-specific tuning is needed for OpenMPI or UCX.
+
+There is an issue on this problem opened with EESSI software layer repository.
+See: https://github.com/EESSI/software-layer/issues/456
+
+Workarounds
+
+The workaround is to specify site properties and allow defining lmod hooks in host injections (see https://github.com/EESSI/software-layer/pull/525).
+
+ From afb7c245b9e380a61874c1cbf26b0b046d92627a Mon Sep 17 00:00:00 2001 From: Xin An <34663977+xinan1911@users.noreply.github.com> Date: Tue, 18 Jun 2024 13:59:33 +0200 Subject: [PATCH 2/4] Update mkdocs.yml for structure and remove version pages of known issues --- mkdocs.yml | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/mkdocs.yml b/mkdocs.yml index 086edb9fa..cb2b3f2c3 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -22,19 +22,24 @@ nav: - Compatibility layer: compatibility_layer.md - Software layer: software_layer.md - Supported CPU targets: software_layer/cpu_targets.md + - Available software and repositories: + - Software: available_software/overview.md + - Repositories: + - Production: repositories/software.eessi.io.md + - RISC-V: repositories/riscv.eessi.io.md + - Pilot: repositories/pilot.md - Installation: - Is EESSI already installed?: getting_access/is_eessi_accessible.md - Native: getting_access/native_installation.md - Container: getting_access/eessi_container.md + - Windows and macOS: + - Windows with WSL: getting_access/eessi_wsl.md + - macOS with Lima: getting_access/eessi_limactl.md - Basic usage: - Set up environment: using_eessi/setting_up_environment.md - Basic commands: using_eessi/basic_commands.md - Demos: using_eessi/eessi_demos.md - Advanced usage: - - Repositories: - - Production: repositories/software.eessi.io.md - - RISC-V: repositories/riscv.eessi.io.md - - Pilot: repositories/pilot.md - Setting up your Stratum: filesystem_layer/stratum1.md - Building software with EESSI: using_eessi/building_on_eessi.md - Test suite: @@ -46,6 +51,8 @@ nav: - Release notes: test-suite/release-notes.md - Accelerators support: - GPUs: gpu.md + - Known issues and workarounds: + - v2023.06: known_issues/eessi-2023.06.md - Adding software to EESSI: - Overview: adding_software/overview.md - For contributors: @@ -57,10 +64,6 @@ nav: - Building software: adding_software/building_software.md - Deploying software: 
adding_software/deploying_software.md - Build nodes: software_layer/build_nodes.md - - Known issues: - - v2023.06: known_issues/eessi-2023.06.md - - v2022.02: [] - - pilot: [] - Community and support: - Getting support: support.md - Meetings: meetings.md From eb02210c8bd63a1384374636a8987b32fab9394a Mon Sep 17 00:00:00 2001 From: Xin An <34663977+xinan1911@users.noreply.github.com> Date: Tue, 18 Jun 2024 14:16:36 +0200 Subject: [PATCH 3/4] Adding reference to Lmod hooks --- docs/known_issues/eessi-2023.06.md | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/docs/known_issues/eessi-2023.06.md b/docs/known_issues/eessi-2023.06.md index b3c07382b..fdc595331 100644 --- a/docs/known_issues/eessi-2023.06.md +++ b/docs/known_issues/eessi-2023.06.md @@ -25,18 +25,7 @@ export OMPI_MCA_btl='^uct,ofi' export OMPI_MCA_pml='ucx' export OMPI_MCA_mtl='^ofi' ``` - - -### `Bug in EESSI initialization and priority mechanisms: site OpenMPI or UCX not loaded` -
- -

This error may occur when bugs resolving or site-specific tuning is needed for OpenMPI or UCX.

- -

There is an issue on this problem opened with EESSI software layer repository. -See: https://github.com/EESSI/software-layer/issues/456

- -Workarounds -

The workaround is to specify site properties and allow defining lmod hooks in host injections (see https://github.com/EESSI/software-layer/pull/525). +You may also set these additional environment variables via site-specific Lmod hooks. For more information about how to write and implement site-specific Lmod hooks, please check [EESSI Site Specific Configuration LMOD Hooks](site_specific_config/lmod_hooks.md)

From 0b6c2e8fa441c2ba3d3f2cb273870663e3c75841 Mon Sep 17 00:00:00 2001
From: Xin An <34663977+xinan1911@users.noreply.github.com>
Date: Wed, 19 Jun 2024 13:58:49 +0200
Subject: [PATCH 4/4] Update docs/known_issues/eessi-2023.06.md

Co-authored-by: Caspar van Leeuwen <33718780+casparvl@users.noreply.github.com>
---
 docs/known_issues/eessi-2023.06.md | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/known_issues/eessi-2023.06.md b/docs/known_issues/eessi-2023.06.md
index fdc595331..41204425d 100644
--- a/docs/known_issues/eessi-2023.06.md
+++ b/docs/known_issues/eessi-2023.06.md
@@ -26,6 +26,30 @@ export OMPI_MCA_pml='ucx'
 export OMPI_MCA_mtl='^ofi'
 ```
 
-You may also set these additional environment variables via site-specific Lmod hooks. For more information about how to write and implement site-specific Lmod hooks, please check [EESSI Site Specific Configuration LMOD Hooks](site_specific_config/lmod_hooks.md)
+You may also set these environment variables automatically via a site-specific Lmod hook:
+```
+require("strict")
+local hook=require("Hook")
+
+-- Work around "Failed to modify UD QP to INIT on mlx5_0: Operation not permitted"
+function fix_ud_qp_init_openmpi(t)
+    local simpleName = string.match(t.modFullName, "(.-)/")
+    if simpleName == 'OpenMPI' then
+        setenv('OMPI_MCA_btl', '^uct,ofi')
+        setenv('OMPI_MCA_pml', 'ucx')
+        setenv('OMPI_MCA_mtl', '^ofi')
+    end
+end
+
+local function combined_load_hook(t)
+    if eessi_load_hook ~= nil then
+        eessi_load_hook(t)
+    end
+    fix_ud_qp_init_openmpi(t)
+end
+
+hook.register("load", combined_load_hook)
+```
+For more information about how to write and implement site-specific Lmod hooks, please check [EESSI Site Specific Configuration LMOD Hooks](../site_specific_config/lmod_hooks.md).
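As an illustration only, the matching rule used by the Lmod hook in this patch can be mirrored in POSIX shell. `string.match(t.modFullName, "(.-)/")` keeps the text before the first `/` and yields `nil` when there is no `/`. A shell version could be used, for example, to apply the same three `OMPI_MCA_*` settings from a job script on a system where a site Lmod hook is not an option. This is a hypothetical sketch: the function name `apply_openmpi_workaround` and the example module name are assumptions, not part of EESSI or Lmod.

```shell
#!/bin/sh
# Hypothetical helper (not part of EESSI): mirrors the name check of the
# Lmod load hook above and exports the same OpenMPI workaround variables.
apply_openmpi_workaround() {
    full_name="$1"   # module full name, e.g. "OpenMPI/4.1.5-GCC-12.3.0" (example only)

    # Lua's string.match(name, "(.-)/") returns the text before the first "/"
    # and nil when there is no "/"; reproduce both cases.
    case "$full_name" in
        */*) simple_name="${full_name%%/*}" ;;
        *)   simple_name="" ;;
    esac

    if [ "$simple_name" = "OpenMPI" ]; then
        # Same settings as the manual workaround for this known issue
        export OMPI_MCA_btl='^uct,ofi'
        export OMPI_MCA_pml='ucx'
        export OMPI_MCA_mtl='^ofi'
    fi
}
```

Example usage: call `apply_openmpi_workaround "OpenMPI/4.1.5-GCC-12.3.0"` before launching the MPI program (the module name here is only an example). Unlike the Lmod hook, this changes only the environment of the current shell and its child processes, and it does not react to `module load` on its own.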