Add docs on how to debug the test step that is run on the build node #158

Merged 2 commits on Jun 4, 2024
30 changes: 30 additions & 0 deletions docs/adding_software/debugging_failed_builds.md
@@ -190,6 +190,36 @@ After some time, this build fails while trying to build `Plumed`, and we can acc
!!! Note
    While this might be faster than the EasyStack-based approach, this is _not_ how the bot builds. So while it _may_ reproduce the failure the bot encounters, it may also not reproduce the bug _at all_ (no failure) or run into _different_ bugs. If you want to be sure, use the EasyStack-based approach.

## Running the test step
If you are still in the prefix layer (i.e. after previously building something), exit it first:
```
$ exit
logout
Leaving Gentoo Prefix with exit status 0
```
Then, source the EESSI init script (again):
```
Apptainer> source ${EESSI_CVMFS_REPO}/versions/${EESSI_VERSION}/init/bash
Environment set up to use EESSI (2023.06), have fun!
{EESSI 2023.06} Apptainer>
```

!!! Note
    If you are in a SLURM environment, run `for i in $(env | grep SLURM); do unset "${i%=*}"; done` to unset all SLURM environment variables. Otherwise, `mpirun` picks these up and uses them to, for example, infer how many slots are available. If you run into errors of the form "There are not enough slots available in the system to satisfy the X slots that were requested by the application:", you probably forgot this step.
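
As a hedged alternative to the one-liner in the note above, the sketch below unsets only variables whose *name* starts with `SLURM`, so a variable that merely has "SLURM" in its value is left alone. The function name `unset_slurm_vars` is made up for this illustration and is not part of EESSI or SLURM.

```shell
# Sketch (hypothetical helper, not part of EESSI): unset every environment
# variable whose name begins with SLURM.
unset_slurm_vars() {
    # sed keeps only the variable name of lines that look like SLURM...=value
    for _name in $(env | sed -n 's/^\(SLURM[A-Za-z0-9_]*\)=.*/\1/p'); do
        unset "$_name"
    done
    unset _name
}

# Demo with made-up values: both variables are gone afterwards.
export SLURM_JOB_ID=12345
export SLURM_NTASKS=4
unset_slurm_vars
```

Call `unset_slurm_vars` in the same shell from which you will later start `./run_tests.sh`, since unsetting variables in a subshell does not affect the parent.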

Then, execute the `run_tests.sh` script. We are assuming you are still in the root of the `software-layer` repository that you cloned earlier:
```
./run_tests.sh
```
If all goes well, you should see (part of) the EESSI test suite being run by ReFrame, finishing with something like:

```
[ PASSED ] Ran X/Y test case(s) from Z check(s) (0 failure(s), 0 skipped, 0 aborted)
```
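
When the output is long, it can help to save it and pull out only the final summary line. The sketch below assumes you captured the output yourself (the file name `run_tests.log` and the helper `reframe_summary` are made up for this illustration) and that the summary line starts with `[ PASSED ]` or `[ FAILED ]` followed by `Ran`, as in the example above.

```shell
# Sketch (hypothetical helper): print the ReFrame summary line(s) from a saved log.
# Assumes the summary looks like "[ PASSED ] Ran ..." or "[ FAILED ] Ran ...",
# with one or more spaces inside the brackets.
reframe_summary() {
    grep -E '^\[ +(PASSED|FAILED) +\] Ran ' "$1"
}

# Usage (not executed here):
#   ./run_tests.sh 2>&1 | tee run_tests.log
#   reframe_summary run_tests.log
```

`reframe_summary` returns a non-zero exit status when no summary line is found, which usually means the run stopped before ReFrame finished.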

!!! Note
    If you are running on a system with hyperthreading enabled, you may still run into the "There are not enough slots available in the system to satisfy the X slots that were requested by the application:" error from `mpirun`, because OpenMPI's `mpirun` does not count hardware threads as slots by default. In this case, run with `OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1 ./run_tests.sh` (for OpenMPI 4.X) or `PRTE_MCA_rmaps_default_mapping_policy=:hwtcpus ./run_tests.sh` (for OpenMPI 5.X).
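
If you are unsure which of the two settings applies, the choice can be scripted. This is a minimal sketch: the helper names `hwthread_setting` and `ompi_major` are made up for this illustration, and `ompi_major` assumes that the first line of `mpirun --version` looks like `mpirun (Open MPI) 4.1.5`.

```shell
# Sketch (hypothetical helpers): map an OpenMPI major version to the
# environment setting named in the note above.
hwthread_setting() {
    case "$1" in
        4) echo "OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1" ;;
        5) echo "PRTE_MCA_rmaps_default_mapping_policy=:hwtcpus" ;;
        *) echo "unhandled OpenMPI major version: $1" >&2; return 1 ;;
    esac
}

# Assumption: the first line of `mpirun --version` ends in a version such as
# "4.1.5"; this extracts the major version number from it.
ompi_major() {
    mpirun --version 2>/dev/null | sed -n '1s/.*) \([0-9][0-9]*\)\..*/\1/p'
}

# Usage (not executed here):
#   env "$(hwthread_setting "$(ompi_major)")" ./run_tests.sh
```

Only the 4.X and 5.X cases from the note are covered; any other version is reported as an error rather than guessed.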

## Known causes of issues in EESSI

### The custom system prefix of the compatibility layer