
The Message Passing Interface (MPI) is a library for distributed memory parallelism using a single program, multiple data (SPMD) model.

Which MPI?

There are two main MPI implementations available as modules on the cluster: openmpi (Open MPI) and mpich (MPICH). Both should work; we haven't done any detailed benchmarking, so we can't say whether there are meaningful performance differences. Note that they have different Application Binary Interfaces (ABIs), so code will need to be recompiled when switching implementations.

Some modules (such as hdf5) depend on MPI: make sure you load the versions built against the MPI implementation you are using, for example:
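A minimal sketch (the module names and versions here are hypothetical; check module avail on the cluster for the actual ones):

    module avail hdf5                            # list the available HDF5 builds
    module load openmpi/4.1.5 hdf5/1.14.1-ompi   # hypothetical names: pick builds that match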

Configuring Julia

The MPI.jl package provides Julia bindings to MPI. You can configure it to use the system binaries (instead of the bundled libraries) with the MPIPreferences package; see https://juliaparallel.org/MPI.jl/latest/configuration/ for more details.
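For example, a minimal sketch (assuming MPIPreferences is already added to the active project, and that the relevant MPI module is loaded so its library can be found):

    module load openmpi   # hypothetical module name
    julia --project -e 'using MPIPreferences; MPIPreferences.use_system_binary()'

This records the system library in the project's LocalPreferences.toml, so it only needs to be run once per project (and re-run if you switch MPI implementations).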

Downstream packages that use MPI (for example HDF5.jl) will also need to be configured to use the corresponding system libraries.

See https://github.com/CliMA/ClimaCore.jl/commit/bcdffe3f0846cadb9709ff2d0aa19a819282263e for an example.
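For HDF5.jl, this amounts to pointing the package at the MPI-enabled libhdf5 from the cluster module rather than the bundled serial build. A minimal LocalPreferences.toml sketch (the paths are hypothetical, and the exact preference names can vary between HDF5.jl versions, so follow the commit above and the HDF5.jl documentation):

    [HDF5]
    libhdf5 = "/path/to/hdf5-parallel/lib/libhdf5.so"
    libhdf5_hl = "/path/to/hdf5-parallel/lib/libhdf5_hl.so"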

Starting MPI jobs

MPI jobs are started using a launcher: a program that starts the multiple processes of your executable on the relevant nodes and makes sure they can talk to each other. Typical usage is

<launcher> [launcher opts] <executable> [executable opts]

There are several different MPI launchers available on the cluster:

  • srun is the Slurm launcher
  • mpiexec/mpirun is the launcher from the MPI distribution
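For example, to launch 4 tasks of an executable (the program name here is hypothetical):

    srun --ntasks=4 ./my_program    # Slurm launcher
    mpiexec -n 4 ./my_program       # launcher from the MPI distribution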

In general, I would recommend using srun if possible, since:

  • when one process is killed, it prints that error before the output from all the others, which makes the root cause easier to find.
  • it collects statistics more accurately (see Memory#basic-slurm-commands).

One slightly surprising difference is that if one process dies, srun doesn't kill all the other tasks by default. You can fix this by passing srun --kill-on-bad-exit=1 (or setting the environment variable SLURM_KILL_BAD_EXIT=1), for example:
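A short sketch (again with a hypothetical executable name):

    srun --kill-on-bad-exit=1 --ntasks=4 ./my_program
    # or equivalently, via the environment variable:
    SLURM_KILL_BAD_EXIT=1 srun --ntasks=4 ./my_program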

Managing output

By default, mpiexec will merge stdout and stderr from all processes into a single stream, which can make it hard to tell which process produced an error. Most launchers provide an option to redirect the output of each process to a separate file (--output in srun, where %t in the filename expands to the task rank). The downside is that the live logs then won't be visible in the Buildkite log.

A simple solution is to launch a background process that watches one of these output files and echoes it to stdout, e.g.

    command:
      - ": > print-out-0.log"
      - tail -F print-out-0.log & # print log in background
      - srun --output=print-out-%t.log <executable>
      - sleep 1 # give time for log to print