OpenMPI build problem with clashing PMIx versions #19456

Open
verdurin opened this issue Dec 22, 2023 · 6 comments · May be fixed by easybuilders/easybuild-easyblocks#3511

@verdurin
Member

I am trying to build OpenMPI-4.1.5-GCC-12.3.0 on CentOS 7.9.

There's an error which appears to show a clash between the OS PMIx and the EB PMIx:

--- MCA component pmix:pmix3x (m4 configuration macro)
checking for MCA component pmix:pmix3x compile mode... dso
configure: WARNING: Found configure shell variable clash at line 176474!
configure: WARNING: OPAL_VAR_SCOPE_PUSH called on "PMIX_VERSION",
configure: WARNING: but it is already defined with value "3.2.3"
configure: WARNING: This usually indicates an error in configure.
configure: error: Cannot continue
 (at easybuild/tools/run.py:681 in parse_cmd_output)

In config.log we see this:

configure:14792: checking if user requested internal PMIx support(/apps/eb/el7/2023a/skylake/software/PMIx/4.2.4-GCCcore-12.3.0)
configure:14806: result: no
configure:14861: checking for pmix.h in /apps/eb/el7/2023a/skylake/software/PMIx/4.2.4-GCCcore-12.3.0
configure:14870: result: not found
configure:14872: checking for pmix.h in /apps/eb/el7/2023a/skylake/software/PMIx/4.2.4-GCCcore-12.3.0/include
configure:14877: result: found
configure:14931: checking libpmix.* in /apps/eb/el7/2023a/skylake/software/PMIx/4.2.4-GCCcore-12.3.0/lib64
configure:14936: result: found
configure:14966: checking PMIx version
configure:14977: result: version file found
configure:14986: checking version 4x
configure:15004: gcc -E -I/apps/eb/el7/2023a/skylake/software/PMIx/4.2.4-GCCcore-12.3.0/include -I/apps/eb/el7/2023a/skylake/software/UCC/1.2.0-GCCcore-12.3.0/include -I/apps/eb/el7/2023a/skylake/software/PMIx/4.2.4-GCCcore-12.3.0/include -I/apps/eb/el7/2023a/skylake/software/libfabric/1.18.0-GCCcore-12.3.0/include -I/apps/eb/el7/2023a/skylake/software/UCX/1.14.1-GCCcore-12.3.0/include -I/apps/eb/el7/2023a/skylake/software/libevent/2.1.12-GCCcore-12.3.0/include -I/apps/eb/el7/2023a/skylake/software/hwloc/2.9.1-GCCcore-12.3.0/include -I/apps/eb/el7/2023a/skylake/software/zlib/1.2.13-GCCcore-12.3.0/include -I/apps/eb/el7/2023a/skylake/software/pkgconf/1.9.5-GCCcore-12.3.0/include conftest.c
configure:15004: $? = 0
configure:15006: result: found
configure:15205: checking PMIx version to be used
configure:15209: result: external(4x)
@branfosj
Member

@jfgrimm saw this in a comment on #19449.

@jfgrimm
Member

jfgrimm commented Dec 22, 2023

Indeed.
Perhaps we should explicitly unset the PMIX_* variables before the configure/build steps?
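A minimal sketch of that suggestion in Python, assuming the configure step is launched as a child process: copy the environment, drop every PMIX_* variable, and hand only the filtered copy to configure. The helpers env_without_pmix and run_configure are hypothetical and are not EasyBuild API; an actual easyblock change may well do this differently.

```python
from __future__ import annotations

import os
import subprocess
from typing import Mapping


def env_without_pmix(env: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of env with every PMIX_* variable dropped (hypothetical helper)."""
    return {key: value for key, value in env.items() if not key.startswith("PMIX_")}


def run_configure(configure_cmd: list[str], build_dir: str) -> subprocess.CompletedProcess:
    """Run configure with the PMIX_* variables removed from its environment only.

    os.environ itself is not modified, so the calling process keeps whatever
    the SLURM job step set.
    """
    clean_env = env_without_pmix(os.environ)
    return subprocess.run(configure_cmd, cwd=build_dir, env=clean_env, check=True)
```

A hypothetical call would be run_configure(["./configure", "--with-pmix=/usr"], build_dir="/path/to/openmpi-4.1.5"). Filtering only the child's environment keeps the change local to the build; it does not by itself settle what the installed module needs at runtime.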

boegel added this to the 4.x milestone Jan 3, 2024
@boegel
Member

boegel commented Jan 3, 2024

@jfgrimm Wouldn't that simply shift the problem to a runtime issue when trying to use the OpenMPI module?

@Flamefire
Contributor

Flamefire commented Jun 6, 2024

I saw that recently too: the shell variable $PMIX_VERSION being defined is all it takes to trigger the issue.

Not only that: we have PMIx installed in /usr (e.g. /usr/lib64/libpmix.so.2.2.34) and hence pass --with-pmix=/usr --with-libevent=/usr via configopts. So the configuration is correct, but configure still errors out because the variable is set; there isn't even a version clash.

The problem is that we build inside a SLURM job (i.e. under srun), which sets those variables.

I assume unsetting them is fine.
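The failure mode described here can be modelled in a few lines. Below is a simplified Python model, assuming the configure check only asks whether a variable is already set (even to an empty string) before OPAL_VAR_SCOPE_PUSH claims it as a local; the function names are hypothetical, and the real check is shell code generated into Open MPI's configure script, not this.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def scope_push_clashes(pushed_vars: Iterable[str], env: Mapping[str, str]) -> list[str]:
    """Simplified model: names configure wants to use as locals but that are
    already set (even to an empty string) in the inherited environment."""
    return [name for name in pushed_vars if name in env]


def check_scope_push(pushed_vars: Iterable[str], env: Mapping[str, str]) -> None:
    """Raise on the first clash, in the spirit of the "Cannot continue" abort."""
    clashes = scope_push_clashes(pushed_vars, env)
    if clashes:
        name = clashes[0]
        raise RuntimeError(
            f'OPAL_VAR_SCOPE_PUSH called on "{name}", '
            f'but it is already defined with value "{env[name]}"'
        )
```

With PMIX_VERSION inherited from the job, check_scope_push(["PMIX_VERSION"], {"PMIX_VERSION": "4.2.9"}) raises no matter which PMIx configure was pointed at; with the variable unset, the same call returns quietly. That matches the observation that correct --with-pmix settings do not help.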

@verdurin
Member Author

Just saw this again after a cluster OS upgrade:

--- MCA component pmix:pmix3x (m4 configuration macro)
checking for MCA component pmix:pmix3x compile mode... dso
configure: WARNING: Found configure shell variable clash at line 175707!
configure: WARNING: OPAL_VAR_SCOPE_PUSH called on "PMIX_VERSION",
configure: WARNING: but it is already defined with value "4.2.9"
configure: WARNING: This usually indicates an error in configure.
configure: error: Cannot continue
 (at easybuild/tools/run.py:695 in parse_cmd_output)

@Flamefire
Contributor

@verdurin Are you running this inside a SLURM job? I opened a PR to make the easyblock unset the PMIX_* variables.
