Great Lakes Cluster (UMich) #4869

Merged (6 commits) on Apr 25, 2024
1 change: 1 addition & 0 deletions Docs/source/install/hpc.rst
@@ -51,6 +51,7 @@ This section documents quick-start guides for a selection of supercomputers that
hpc/spock
hpc/summit
hpc/taurus
hpc/greatlakes

.. tip::

240 changes: 240 additions & 0 deletions Docs/source/install/hpc/greatlakes.rst
@@ -0,0 +1,240 @@
.. _building-greatlakes:

Great Lakes (UMich)
===================

The `Great Lakes cluster <https://arc.umich.edu/greatlakes/>`_ is located at the University of Michigan.
The cluster has various partitions, including `GPU nodes and CPU nodes <https://arc.umich.edu/greatlakes/configuration/>`__.


Introduction
------------

If you are new to this system, **please see the following resources**:

* `Great Lakes user guide <https://arc.umich.edu/greatlakes/>`__
* Batch system: `Slurm <https://arc.umich.edu/greatlakes/slurm-user-guide/>`__
* `Jupyter service <https://greatlakes.arc-ts.umich.edu>`__ (`documentation <https://arc.umich.edu/greatlakes/user-guide/#document-2>`__)
* `Filesystems <https://arc.umich.edu/greatlakes/user-guide/#document-1>`__:

* ``$HOME``: per-user directory, use only for inputs, source and scripts; backed up (80GB)
* ``/scratch``: per-project `production directory <https://arc.umich.edu/greatlakes/user-guide/#scratchpolicies>`__; very fast for parallel jobs; purged every 60 days (10TB default)
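
For example, a common pattern is to keep source code and scripts in ``$HOME`` and to run simulations from a directory under ``/scratch`` (the exact scratch path below is a placeholder; use the path of your allocation):

.. code-block:: bash

# keep inputs, scripts and source in $HOME; run in /scratch (purged every 60 days!)
mkdir -p /scratch/<your allocation>/<your username>/my_first_run
cd /scratch/<your allocation>/<your username>/my_first_run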


.. _building-greatlakes-preparation:

Preparation
-----------

Use the following commands to download the WarpX source code:

.. code-block:: bash

git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx

On Great Lakes, you can run on GPU nodes with `fast V100 GPUs (recommended), on the even faster A100 GPUs (only a few are available), or on CPU nodes <https://arc.umich.edu/greatlakes/configuration/>`__.
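
If you are unsure which partitions and GPU types are available to your account, generic Slurm commands can help (an optional check; these are standard Slurm commands, not specific to this guide):

.. code-block:: bash

# list partitions with their GPUs (generic resources) and node counts
sinfo -o "%P %G %D"
# list the Slurm accounts associated with your user
sacctmgr show associations user=$USER format=Account%20,Partition%20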

.. tab-set::

.. tab-item:: V100 GPUs

We use system software modules and add environment hints and further dependencies via the file ``$HOME/greatlakes_v100_warpx.profile``.
Create it now:

.. code-block:: bash

cp $HOME/src/warpx/Tools/machines/greatlakes-umich/greatlakes_v100_warpx.profile.example $HOME/greatlakes_v100_warpx.profile

.. dropdown:: Script Details
:color: light
:icon: info
:animate: fade-in-slide-down

.. literalinclude:: ../../../../Tools/machines/greatlakes-umich/greatlakes_v100_warpx.profile.example
:language: bash

Edit the 2nd line of this script, which sets the ``export proj=""`` variable.
For example, if you are a member of the project ``iloveplasma``, then run ``nano $HOME/greatlakes_v100_warpx.profile`` and edit line 2 to read:

.. code-block:: bash

export proj="iloveplasma"

Exit the ``nano`` editor with ``Ctrl`` + ``O`` (save) and then ``Ctrl`` + ``X`` (exit).
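
Alternatively, this edit can be scripted with ``sed`` (a convenience sketch, not part of the official instructions; replace ``iloveplasma`` with your project):

.. code-block:: bash

sed -i 's/export proj=""/export proj="iloveplasma"/' $HOME/greatlakes_v100_warpx.profile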

.. important::

Now, and as the first step on future logins to Great Lakes, activate these environment settings:

.. code-block:: bash

source $HOME/greatlakes_v100_warpx.profile

Finally, since Great Lakes does not yet provide software modules for some of our dependencies, install them once:

.. code-block:: bash

bash $HOME/src/warpx/Tools/machines/greatlakes-umich/install_v100_dependencies.sh
source ${HOME}/sw/greatlakes/v100/venvs/warpx-v100/bin/activate

Review comment:

Above, the guide informs the user to always source $HOME/greatlakes_v100_warpx.profile.
Is the activate line copied into greatlakes_v100_warpx.profile, or are we loading a different source here? If so, why?

@ax3l (Member Author) commented on Apr 24, 2024:

This extra line is only needed once, as we set up the dependencies, so that we can continue in the same terminal.
The reason for the extra line in this step is that we already sourced the profile, but the venv is only added by this line now, so it was not yet activated.
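
After this one-time activation, later logins pick up the virtual environment automatically, because the profile activates the venv whenever its directory exists. A quick, optional check that the expected interpreter is active (the path below is the one used in the ``source`` line above):

.. code-block:: bash

which python3
# expected: ${HOME}/sw/greatlakes/v100/venvs/warpx-v100/bin/python3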


.. dropdown:: Script Details
:color: light
:icon: info
:animate: fade-in-slide-down

.. literalinclude:: ../../../../Tools/machines/greatlakes-umich/install_v100_dependencies.sh
:language: bash


.. tab-item:: A100 Nodes

.. note::

This section is TODO.


.. tab-item:: CPU Nodes

.. note::

This section is TODO.


.. _building-greatlakes-compilation:

Compilation
-----------

Use the following :ref:`cmake commands <building-cmake>` to compile the application executable:

.. tab-set::

.. tab-item:: V100 GPUs

.. code-block:: bash

cd $HOME/src/warpx
rm -rf build_v100

cmake -S . -B build_v100 -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_v100 -j 8

The WarpX application executables are now in ``$HOME/src/warpx/build_v100/bin/``.
Additionally, the following commands will install WarpX as a Python module:

.. code-block:: bash

cd $HOME/src/warpx
rm -rf build_v100_py

cmake -S . -B build_v100_py -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_v100_py -j 8 --target pip_install
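
As an optional sanity check that the Python module ended up in the active virtual environment, one can try importing it (this assumes the venv from the preparation step is active):

.. code-block:: bash

python3 -c "from pywarpx import picmi; print(picmi.__name__)"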


.. tab-item:: A100 Nodes

.. note::

This section is TODO.


.. tab-item:: CPU Nodes

.. note::

This section is TODO.

Now, you can :ref:`submit Great Lakes compute jobs <running-cpp-greatlakes>` for WarpX :ref:`Python (PICMI) scripts <usage-picmi>` (:ref:`example scripts <usage-examples>`).
Or, you can use the WarpX executables to submit Great Lakes jobs (:ref:`example inputs <usage-examples>`).
For executables, you can reference their location in your :ref:`job script <running-cpp-greatlakes>` or copy them to a location in ``/scratch``.
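
For example (the exact executable names depend on the enabled build options, so list them first):

.. code-block:: bash

# list the freshly built executables
ls $HOME/src/warpx/build_v100/bin/

# copy one of them to a run directory in /scratch and reference it from the job script
cp $HOME/src/warpx/build_v100/bin/<executable name> /scratch/<path to your run directory>/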


.. _building-greatlakes-update:

Update WarpX & Dependencies
---------------------------

If you already installed WarpX in the past and want to update it, start by getting the latest source code:

.. code-block:: bash

cd $HOME/src/warpx

# read the output of this command - does it look ok?
git status

# get the latest WarpX source code
git fetch
git pull

# read the output of these commands - do they look ok?
git status
git log # press q to exit

And, if needed,

- :ref:`update the greatlakes_v100_warpx.profile file <building-greatlakes-preparation>`,
- log out and into the system, activate the now updated environment profile as usual,
- :ref:`execute the dependency install scripts <building-greatlakes-preparation>`.

As a last step, clean the build directory ``rm -rf $HOME/src/warpx/build_*`` and rebuild WarpX.
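
For the V100 build directories used above, this amounts to (adjust to the build directories you actually created):

.. code-block:: bash

rm -rf $HOME/src/warpx/build_v100 $HOME/src/warpx/build_v100_py
# then re-run the cmake configure and build commands from the Compilation section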


.. _running-cpp-greatlakes:

Running
-------

.. tab-set::

.. tab-item:: V100 (16GB) GPUs

The batch script below can be used to run a WarpX simulation on multiple nodes (change ``-N`` accordingly) on the supercomputer Great Lakes at the University of Michigan.
This partition has `20 nodes, each with two V100 GPUs <https://arc.umich.edu/greatlakes/configuration/>`__.

Replace descriptions between chevrons ``<>`` by relevant values, for instance ``<input file>`` could be ``plasma_mirror_inputs``.
Note that we run one MPI rank per GPU.

.. literalinclude:: ../../../../Tools/machines/greatlakes-umich/greatlakes_v100.sbatch
:language: bash
:caption: You can copy this file from ``$HOME/src/warpx/Tools/machines/greatlakes-umich/greatlakes_v100.sbatch``.

To run a simulation, copy the lines above to a file ``greatlakes_v100.sbatch`` and run

.. code-block:: bash

sbatch greatlakes_v100.sbatch

to submit the job.
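
Once submitted, generic Slurm commands can be used to follow the job (optional; not specific to this guide):

.. code-block:: bash

squeue -u $USER     # show your queued and running jobs
tail -f output.txt  # follow the simulation output once the job is running
scancel <jobid>     # cancel the job if needed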


.. tab-item:: A100 (80GB) GPUs

This partition has `2 nodes, each with four A100 GPUs <https://arc.umich.edu/greatlakes/configuration/>`__ that provide 80 GB HBM per A100 GPU.
To the user, each node will appear as if it has 8 A100 GPUs with 40 GB memory each.

.. note::

This section is TODO.


.. tab-item:: CPU Nodes

The Great Lakes CPU partition has up to `455 nodes <https://arc.umich.edu/greatlakes/configuration/>`__, each with 2x Intel Xeon Gold 6154 CPUs and 180 GB RAM.

.. note::

This section is TODO.


.. _post-processing-greatlakes:

Post-Processing
---------------

For post-processing, many users prefer to use the online `Jupyter service <https://greatlakes.arc-ts.umich.edu>`__ (`documentation <https://arc.umich.edu/greatlakes/user-guide/#document-2>`__) that is directly connected to the cluster's fast filesystem.

.. note::

This section is a stub and contributions are welcome.
We can document further details, e.g., which recommended post-processing Python software to install or how to customize Jupyter kernels here.
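
As a starting point for such a contribution, one way to expose the V100 virtual environment from above as a Jupyter kernel could look like the following (an untested sketch; it assumes ``ipykernel`` still needs to be installed into the venv):

.. code-block:: bash

source $HOME/greatlakes_v100_warpx.profile
python3 -m pip install ipykernel
python3 -m ipykernel install --user --name warpx-v100 --display-name "WarpX (V100 venv)"
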
38 changes: 38 additions & 0 deletions Tools/machines/greatlakes-umich/greatlakes_v100.sbatch
@@ -0,0 +1,38 @@
#!/bin/bash -l

# Copyright 2024 The WarpX Community
#
# Author: Axel Huebl
# License: BSD-3-Clause-LBNL

#SBATCH -t 00:10:00
#SBATCH -N 1
#SBATCH -J WarpX
#SBATCH -A <proj>
#SBATCH --partition=gpu
#SBATCH --exclusive
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=20
#SBATCH --gpus-per-task=v100:1
#SBATCH --gpu-bind=single:1
#SBATCH -o WarpX.o%j
@bstassel commented on Apr 25, 2024:

Recommend explicitly putting
#SBATCH --mem=0
to signify that this request allocates all the memory on the node. This should happen dynamically, since --exclusive is set, but I think it is a good idea for users to have a reference to what is implicitly happening with the job request.

@ax3l (Member Author) replied on Apr 29, 2024:

Wait, --mem=0 means all? o.0
I think --exclusive is a bit clearer for now and avoids duplication, unless it does not work to reserve all host memory.
#SBATCH -e WarpX.e%j

# executable & inputs file or python interpreter & PICMI script here
EXE=./warpx
INPUTS=inputs

# threads for OpenMP and threaded compressors per MPI rank
# per node are 2x 2.4 GHz Intel Xeon Gold 6148
# note: the system seems to only expose cores (20 per socket),
# not hyperthreads (40 per socket)
export SRUN_CPUS_PER_TASK=20
export OMP_NUM_THREADS=${SRUN_CPUS_PER_TASK}

# GPU-aware MPI optimizations
GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"

# run WarpX
srun --cpu-bind=cores \
${EXE} ${INPUTS} ${GPU_AWARE_MPI} \
> output.txt
62 changes: 62 additions & 0 deletions Tools/machines/greatlakes-umich/greatlakes_v100_warpx.profile.example
@@ -0,0 +1,62 @@
# please set your project account
export proj="" # change me!

# remembers the location of this script
export MY_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE)
if [ -z ${proj-} ]; then echo "WARNING: The 'proj' variable is not yet set in your $MY_PROFILE file! Please edit its line 2 to continue!"; return; fi

# required dependencies
module purge
module load gcc/10.3.0
module load cuda/12.1.1
module load cmake/3.26.3
module load openblas/0.3.23
module load openmpi/4.1.6-cuda

# optional: for QED support
module load boost/1.78.0

# optional: for openPMD and PSATD+RZ support
module load phdf5/1.12.1

SW_DIR="${HOME}/sw/greatlakes/v100"
export CMAKE_PREFIX_PATH=${SW_DIR}/c-blosc2-2.14.4:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${SW_DIR}/adios2-2.10.0:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${SW_DIR}/blaspp-master:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${SW_DIR}/lapackpp-master:$CMAKE_PREFIX_PATH

export LD_LIBRARY_PATH=${SW_DIR}/c-blosc2-2.14.4/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${SW_DIR}/adios2-2.10.0/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${SW_DIR}/blaspp-master/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${SW_DIR}/lapackpp-master/lib64:$LD_LIBRARY_PATH

export PATH=${SW_DIR}/adios2-2.10.0/bin:${PATH}

# optional: for Python bindings or libEnsemble
module load python/3.12.1

if [ -d "${SW_DIR}/venvs/warpx-v100" ]
then
source ${SW_DIR}/venvs/warpx-v100/bin/activate
fi

# an alias to request an interactive batch node for one hour
# for parallel execution, start on the batch node: srun <command>
alias getNode="salloc -N 1 --partition=gpu --ntasks-per-node=2 --cpus-per-task=20 --gpus-per-task=v100:1 -t 1:00:00 -A $proj"
# an alias to run a command on a batch node for up to 30min
# usage: runNode <command>
alias runNode="srun -N 1 --partition=gpu --ntasks-per-node=2 --cpus-per-task=20 --gpus-per-task=v100:1 -t 1:00:00 -A $proj"

# optimize CUDA compilation for V100
export AMREX_CUDA_ARCH=7.0

# optimize CPU microarchitecture for Intel Xeon Gold 6148
export CXXFLAGS="-march=skylake-avx512"
export CFLAGS="-march=skylake-avx512"

# compiler environment hints
export CC=$(which gcc)
export CXX=$(which g++)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}