Add some documentation on how to debug the test step implemented in E…
casparvl committed Mar 1, 2024
1 parent cf9e514 commit 51bca17
Showing 1 changed file with 30 additions and 0 deletions.
30 changes: 30 additions & 0 deletions docs/adding_software/debugging_failed_builds.md
@@ -190,6 +190,36 @@ After some time, this build fails while trying to build `Plumed`, and we can acc
!!! Note
    While this might be faster than the EasyStack-based approach, this is _not_ how the bot builds. So while it _may_ reproduce the failure the bot encounters, it may also fail to reproduce the bug _at all_ (no failure) or run into _different_ bugs. If you want to be sure, use the EasyStack-based approach.

## Running the test step
If you are still in the prefix layer (i.e. after previously building something), exit it first:
```
$ exit
logout
Leaving Gentoo Prefix with exit status 0
```
Then, source the EESSI init script (again):
```
Apptainer> source ${EESSI_CVMFS_REPO}/versions/${EESSI_VERSION}/init/bash
Environment set up to use EESSI (2023.06), have fun!
{EESSI 2023.06} Apptainer>
```

!!! Note
    If you are in a SLURM environment, make sure to run `for i in $(env | grep '^SLURM'); do unset "${i%%=*}"; done` to unset any SLURM environment variables. Failing to do so will cause `mpirun` to pick up on these and, for example, infer from them how many slots are available. If you run into errors of the form "There are not enough slots available in the system to satisfy the X slots that were requested by the application:", you probably forgot this step.
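
The cleanup can also be wrapped in a small function that matches on variable names only, so that variables which merely contain "SLURM" in their value are left alone. This is a sketch rather than part of the EESSI tooling; the function name `unset_slurm_vars` is made up for illustration.

```shell
# Hypothetical helper (not part of EESSI): unset every environment variable
# whose *name* starts with SLURM. Only the name before the first '=' is used,
# so values containing '=' or spaces do not confuse the loop.
unset_slurm_vars() {
    for name in $(env | sed -n 's/^\(SLURM[A-Za-z0-9_]*\)=.*/\1/p'); do
        unset "$name"
    done
}

unset_slurm_vars

# Optional check: this should print nothing if the cleanup worked.
env | grep '^SLURM' || true
```

Run this in the same shell session in which you start the tests, since unsetting variables only affects the current shell and its children.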

Then, execute the `run_tests.sh` script. We are assuming you are still in the root of the `software-layer` repository that you cloned earlier:
```
./run_tests.sh
```
If all goes well, you should see (part of) the EESSI test suite being run by ReFrame, finishing with something like:

```
[ PASSED ] Ran X/Y test case(s) from Z check(s) (0 failure(s), 0 skipped, 0 aborted)
```
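
If you save the output to a file, you can check for that summary line afterwards instead of scrolling back. The sketch below assumes the summary line has the shape shown above; the helper name `reframe_passed` and the log file name are arbitrary choices, not something the `software-layer` repository provides.

```shell
# Hypothetical helper: succeed only if the given log file contains a ReFrame
# summary line marked PASSED that reports zero failures.
reframe_passed() {
    grep -Eq '\[ *PASSED *\].*\(0 failure\(s\)' "$1"
}

# Assumed usage (the log file name is arbitrary):
#   ./run_tests.sh 2>&1 | tee run_tests.log
#   reframe_passed run_tests.log && echo "test step OK"
```

The pattern tolerates extra spaces inside the brackets, in case the real output pads the `PASSED` marker.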

!!! Note
    If you are running on a system with hyperthreading enabled, you may still run into the "There are not enough slots available in the system to satisfy the X slots that were requested by the application:" error from `mpirun`, because OpenMPI's `mpirun` does not count hardware threads as slots by default. In this case, run with `OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1 ./run_tests.sh` (for OpenMPI 4.X) or `PRTE_MCA_rmaps_default_mapping_policy=:hwtcpus ./run_tests.sh` (for OpenMPI 5.X).
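
If you switch between OpenMPI versions often, the choice between the two settings can be captured in a small helper. This is only a sketch: the function name `hwthread_env_for` is invented here, and the commented usage assumes that the relevant `mpirun` is on your `PATH` and that the first line of `mpirun --version` looks like `mpirun (Open MPI) 4.1.5`.

```shell
# Hypothetical helper: print the environment assignment from the note above
# for a given OpenMPI major version (4 or 5).
hwthread_env_for() {
    case "$1" in
        4) echo "OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1" ;;
        5) echo "PRTE_MCA_rmaps_default_mapping_policy=:hwtcpus" ;;
        *) echo "unsupported OpenMPI major version: $1" >&2; return 1 ;;
    esac
}

# Assumed usage (depends on the mpirun on your PATH being the one the tests use):
#   major=$(mpirun --version | sed -n '1s/.* \([0-9][0-9]*\)\..*/\1/p')
#   env "$(hwthread_env_for "$major")" ./run_tests.sh
```

Passing the assignment through `env` sets the variable for that one `run_tests.sh` invocation only, just like the inline `VAR=value ./run_tests.sh` form.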

## Known causes of issues in EESSI

### The custom system prefix of the compatibility layer