Great Lakes Cluster (UMich) #4869
@@ -0,0 +1,240 @@ | ||
.. _building-greatlakes: | ||
|
||
Great Lakes (UMich) | ||
=================== | ||
|
||
The `Great Lakes cluster <https://arc.umich.edu/greatlakes/>`_ is located at the University of Michigan.
The cluster has various partitions, including `GPU nodes and CPU nodes <https://arc.umich.edu/greatlakes/configuration/>`__. | ||
|
||
|
||
Introduction | ||
------------ | ||
|
||
If you are new to this system, **please see the following resources**: | ||
|
||
* `Great Lakes user guide <https://arc.umich.edu/greatlakes/>`__ | ||
* Batch system: `Slurm <https://arc.umich.edu/greatlakes/slurm-user-guide/>`__ | ||
* `Jupyter service <https://greatlakes.arc-ts.umich.edu>`__ (`documentation <https://arc.umich.edu/greatlakes/user-guide/#document-2>`__) | ||
* `Filesystems <https://arc.umich.edu/greatlakes/user-guide/#document-1>`__: | ||
|
||
* ``$HOME``: per-user directory, use only for inputs, source, and scripts; backed up (80 GB)
* ``/scratch``: per-project `production directory <https://arc.umich.edu/greatlakes/user-guide/#scratchpolicies>`__; very fast for parallel jobs; purged every 60 days (10 TB default)
|
||
|
||
.. _building-greatlakes-preparation: | ||
|
||
Preparation | ||
----------- | ||
|
||
Use the following commands to download the WarpX source code: | ||
|
||
.. code-block:: bash | ||
|
||
git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx | ||
|
||
On Great Lakes, you can run either on GPU nodes with `fast V100 GPUs (recommended), on the even faster A100 GPUs (only a few are available), or on CPU nodes <https://arc.umich.edu/greatlakes/configuration/>`__.
|
||
.. tab-set:: | ||
|
||
.. tab-item:: V100 GPUs | ||
|
||
We use system software modules and add environment hints and further dependencies via the file ``$HOME/greatlakes_v100_warpx.profile``.
Create it now: | ||
|
||
.. code-block:: bash | ||
|
||
cp $HOME/src/warpx/Tools/machines/greatlakes-umich/greatlakes_v100_warpx.profile.example $HOME/greatlakes_v100_warpx.profile | ||
|
||
.. dropdown:: Script Details | ||
:color: light | ||
:icon: info | ||
:animate: fade-in-slide-down | ||
|
||
.. literalinclude:: ../../../../Tools/machines/greatlakes-umich/greatlakes_v100_warpx.profile.example | ||
:language: bash | ||
|
||
Edit the 2nd line of this script, which sets the ``export proj=""`` variable. | ||
For example, if you are a member of the project ``iloveplasma``, then run ``nano $HOME/greatlakes_v100_warpx.profile`` and edit line 2 to read:
|
||
.. code-block:: bash | ||
|
||
export proj="iloveplasma" | ||
|
||
Exit the ``nano`` editor with ``Ctrl`` + ``O`` (save) and then ``Ctrl`` + ``X`` (exit). | ||
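
Alternatively, the same change can be made non-interactively; a one-line sketch, assuming the default ``export proj=""`` is still on line 2 of the profile:

.. code-block:: bash

sed -i 's/export proj=""/export proj="iloveplasma"/' $HOME/greatlakes_v100_warpx.profile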
|
||
.. important:: | ||
|
||
Now, and as the first step on future logins to Great Lakes, activate these environment settings: | ||
|
||
.. code-block:: bash | ||
|
||
source $HOME/greatlakes_v100_warpx.profile | ||
|
||
Finally, since Great Lakes does not yet provide software modules for some of our dependencies, install them once: | ||
|
||
.. code-block:: bash | ||
|
||
bash $HOME/src/warpx/Tools/machines/greatlakes-umich/install_v100_dependencies.sh | ||
source ${HOME}/sw/greatlakes/v100/venvs/warpx-v100/bin/activate | ||
|
||
.. dropdown:: Script Details | ||
:color: light | ||
:icon: info | ||
:animate: fade-in-slide-down | ||
|
||
.. literalinclude:: ../../../../Tools/machines/greatlakes-umich/install_v100_dependencies.sh | ||
:language: bash | ||
|
||
|
||
.. tab-item:: A100 Nodes | ||
|
||
.. note:: | ||
|
||
This section is TODO. | ||
|
||
|
||
.. tab-item:: CPU Nodes | ||
|
||
.. note:: | ||
|
||
This section is TODO. | ||
|
||
|
||
.. _building-greatlakes-compilation: | ||
|
||
Compilation | ||
----------- | ||
|
||
Use the following :ref:`cmake commands <building-cmake>` to compile the application executable: | ||
|
||
.. tab-set:: | ||
|
||
.. tab-item:: V100 GPUs | ||
|
||
.. code-block:: bash | ||
|
||
cd $HOME/src/warpx | ||
rm -rf build_v100 | ||
|
||
cmake -S . -B build_v100 -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3" | ||
cmake --build build_v100 -j 8 | ||
|
||
The WarpX application executables are now in ``$HOME/src/warpx/build_v100/bin/``. | ||
Additionally, the following commands will install WarpX as a Python module: | ||
|
||
.. code-block:: bash | ||
|
||
cd $HOME/src/warpx | ||
rm -rf build_v100_py | ||
|
||
cmake -S . -B build_v100_py -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3" | ||
cmake --build build_v100_py -j 8 --target pip_install | ||
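
As a quick, optional check that the install succeeded in the active virtual environment (``pywarpx`` is the Python package provided by the ``pip_install`` target):

.. code-block:: bash

python3 -c "import pywarpx; print(pywarpx.__file__)"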
|
||
|
||
.. tab-item:: A100 Nodes | ||
|
||
.. note:: | ||
|
||
This section is TODO. | ||
|
||
|
||
.. tab-item:: CPU Nodes | ||
|
||
.. note:: | ||
|
||
This section is TODO. | ||
|
||
Now, you can :ref:`submit Great Lakes compute jobs <running-cpp-greatlakes>` for WarpX :ref:`Python (PICMI) scripts <usage-picmi>` (:ref:`example scripts <usage-examples>`). | ||
Or, you can use the WarpX executables to submit Great Lakes jobs (:ref:`example inputs <usage-examples>`).
For executables, you can reference their location in your :ref:`job script <running-cpp-greatlakes>` or copy them to a location in ``/scratch``. | ||
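
For example, a minimal sketch of staging a run in ``/scratch`` (the executable name under ``bin/`` and the scratch path are placeholders; adjust them to your build and allocation):

.. code-block:: bash

mkdir -p /scratch/<your scratch directory>/my_first_run
cp $HOME/src/warpx/build_v100/bin/warpx.3d* /scratch/<your scratch directory>/my_first_run/
cp <input file> /scratch/<your scratch directory>/my_first_run/
cd /scratch/<your scratch directory>/my_first_run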
|
||
|
||
.. _building-greatlakes-update: | ||
|
||
Update WarpX & Dependencies | ||
--------------------------- | ||
|
||
If you already installed WarpX in the past and want to update it, start by getting the latest source code: | ||
|
||
.. code-block:: bash | ||
|
||
cd $HOME/src/warpx | ||
|
||
# read the output of this command - does it look ok? | ||
git status | ||
|
||
# get the latest WarpX source code | ||
git fetch | ||
git pull | ||
|
||
# read the output of these commands - do they look ok? | ||
git status | ||
git log # press q to exit | ||
|
||
And, if needed, | ||
|
||
- :ref:`update the greatlakes_v100_warpx.profile file <building-greatlakes-preparation>`, | ||
- log out and back into the system, then activate the now updated environment profile as usual,
- :ref:`execute the dependency install scripts <building-greatlakes-preparation>`. | ||
|
||
As a last step, clean the build directories (``rm -rf $HOME/src/warpx/build_*``) and rebuild WarpX, as sketched below.
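
For the V100 build above, such a rebuild uses the same commands and flags as in the Compilation section:

.. code-block:: bash

cd $HOME/src/warpx
rm -rf build_v100

cmake -S . -B build_v100 -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_v100 -j 8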
|
||
|
||
.. _running-cpp-greatlakes: | ||
|
||
Running | ||
------- | ||
|
||
.. tab-set:: | ||
|
||
.. tab-item:: V100 (16GB) GPUs | ||
|
||
The batch script below can be used to run a WarpX simulation on multiple nodes (change ``-N`` accordingly) on the Great Lakes supercomputer at the University of Michigan.
This partition has `20 nodes, each with two V100 GPUs <https://arc.umich.edu/greatlakes/configuration/>`__. | ||
|
||
Replace descriptions between chevrons ``<>`` with relevant values, for instance ``<input file>`` could be ``plasma_mirror_inputs``.
Note that we run one MPI rank per GPU. | ||
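
For example, to run on two nodes (four V100 GPUs in total, at one MPI rank per GPU), one would adjust these lines of the batch script below:

.. code-block:: bash

#SBATCH -N 2
#SBATCH --ntasks-per-node=2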
|
||
.. literalinclude:: ../../../../Tools/machines/greatlakes-umich/greatlakes_v100.sbatch | ||
:language: bash | ||
:caption: You can copy this file from ``$HOME/src/warpx/Tools/machines/greatlakes-umich/greatlakes_v100.sbatch``. | ||
|
||
To run a simulation, copy the lines above to a file ``greatlakes_v100.sbatch`` and run | ||
|
||
.. code-block:: bash | ||
|
||
sbatch greatlakes_v100.sbatch | ||
|
||
to submit the job. | ||
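
Standard Slurm commands can then be used to monitor and, if needed, cancel the job:

.. code-block:: bash

squeue -u $USER # list your queued and running jobs
scancel <job id> # cancel a job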
|
||
|
||
.. tab-item:: A100 (80GB) GPUs | ||
|
||
This partition has `2 nodes, each with four A100 GPUs <https://arc.umich.edu/greatlakes/configuration/>`__ that provide 80 GB HBM per A100 GPU. | ||
To the user, each node will appear as if it has 8 A100 GPUs with 40 GB memory each. | ||
|
||
.. note:: | ||
|
||
This section is TODO. | ||
|
||
|
||
.. tab-item:: CPU Nodes | ||
|
||
The Great Lakes CPU partition has up to `455 nodes <https://arc.umich.edu/greatlakes/configuration/>`__, each with 2x Intel Xeon Gold 6154 CPUs and 180 GB RAM.
|
||
.. note:: | ||
|
||
This section is TODO. | ||
|
||
|
||
.. _post-processing-greatlakes: | ||
|
||
Post-Processing | ||
--------------- | ||
|
||
For post-processing, many users prefer to use the online `Jupyter service <https://greatlakes.arc-ts.umich.edu>`__ (`documentation <https://arc.umich.edu/greatlakes/user-guide/#document-2>`__) that is directly connected to the cluster's fast filesystem. | ||
|
||
.. note:: | ||
|
||
This section is a stub and contributions are welcome. | ||
We can document further details here, e.g., which post-processing Python software to install or how to customize Jupyter kernels.
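
A possible starting point (a sketch; the package selection is a suggestion and not officially documented for this cluster) is to install openPMD analysis tools into the ``warpx-v100`` virtual environment from the preparation step and register it as a Jupyter kernel:

.. code-block:: bash

source ${HOME}/sw/greatlakes/v100/venvs/warpx-v100/bin/activate
python3 -m pip install openpmd-viewer openpmd-api matplotlib ipykernel
python3 -m ipykernel install --user --name warpx-v100 --display-name "WarpX (V100)"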
@@ -0,0 +1,38 @@ | ||
#!/bin/bash -l | ||
|
||
# Copyright 2024 The WarpX Community | ||
# | ||
# Author: Axel Huebl | ||
# License: BSD-3-Clause-LBNL | ||
|
||
#SBATCH -t 00:10:00 | ||
#SBATCH -N 1 | ||
#SBATCH -J WarpX | ||
#SBATCH -A <proj> | ||
#SBATCH --partition=gpu | ||
#SBATCH --exclusive | ||
#SBATCH --ntasks-per-node=2 | ||
#SBATCH --cpus-per-task=20 | ||
#SBATCH --gpus-per-task=v100:1 | ||
|
||
#SBATCH --gpu-bind=single:1 | ||
#SBATCH -o WarpX.o%j | ||
||
#SBATCH -e WarpX.e%j | ||
|
||
# executable & inputs file or python interpreter & PICMI script here | ||
EXE=./warpx | ||
INPUTS=inputs | ||
|
||
# threads for OpenMP and threaded compressors per MPI rank | ||
# per node: 2x 2.4 GHz Intel Xeon Gold 6148
# note: the system seems to only expose cores (20 per socket), | ||
# not hyperthreads (40 per socket) | ||
export SRUN_CPUS_PER_TASK=20 | ||
export OMP_NUM_THREADS=${SRUN_CPUS_PER_TASK} | ||
|
||
# GPU-aware MPI optimizations | ||
GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1" | ||
|
||
# run WarpX | ||
srun --cpu-bind=cores \ | ||
${EXE} ${INPUTS} ${GPU_AWARE_MPI} \ | ||
> output.txt |
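For a Python (PICMI) run, the same batch script can point ``EXE`` at the Python interpreter instead; a sketch, with a hypothetical script name:

EXE=python3
INPUTS=picmi_script.py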
@@ -0,0 +1,62 @@ | ||
# please set your project account | ||
export proj="" # change me! | ||
|
||
# remembers the location of this script | ||
export MY_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE) | ||
if [ -z ${proj-} ]; then echo "WARNING: The 'proj' variable is not yet set in your $MY_PROFILE file! Please edit its line 2 to continue!"; return; fi | ||
|
||
# required dependencies | ||
module purge | ||
module load gcc/10.3.0 | ||
module load cuda/12.1.1 | ||
module load cmake/3.26.3 | ||
module load openblas/0.3.23 | ||
module load openmpi/4.1.6-cuda | ||
|
||
# optional: for QED support | ||
module load boost/1.78.0 | ||
|
||
# optional: for openPMD and PSATD+RZ support | ||
module load phdf5/1.12.1 | ||
|
||
SW_DIR="${HOME}/sw/greatlakes/v100" | ||
export CMAKE_PREFIX_PATH=${SW_DIR}/c-blosc2-2.14.4:$CMAKE_PREFIX_PATH | ||
export CMAKE_PREFIX_PATH=${SW_DIR}/adios2-2.10.0:$CMAKE_PREFIX_PATH | ||
export CMAKE_PREFIX_PATH=${SW_DIR}/blaspp-master:$CMAKE_PREFIX_PATH | ||
export CMAKE_PREFIX_PATH=${SW_DIR}/lapackpp-master:$CMAKE_PREFIX_PATH | ||
|
||
export LD_LIBRARY_PATH=${SW_DIR}/c-blosc2-2.14.4/lib64:$LD_LIBRARY_PATH | ||
export LD_LIBRARY_PATH=${SW_DIR}/adios2-2.10.0/lib64:$LD_LIBRARY_PATH | ||
export LD_LIBRARY_PATH=${SW_DIR}/blaspp-master/lib64:$LD_LIBRARY_PATH | ||
export LD_LIBRARY_PATH=${SW_DIR}/lapackpp-master/lib64:$LD_LIBRARY_PATH | ||
|
||
export PATH=${SW_DIR}/adios2-2.10.0/bin:${PATH} | ||
|
||
# optional: for Python bindings or libEnsemble | ||
module load python/3.12.1 | ||
|
||
if [ -d "${SW_DIR}/venvs/warpx-v100" ] | ||
then | ||
source ${SW_DIR}/venvs/warpx-v100/bin/activate | ||
fi | ||
|
||
# an alias to request an interactive batch node for one hour | ||
# for parallel execution, start on the batch node: srun <command> | ||
alias getNode="salloc -N 1 --partition=gpu --ntasks-per-node=2 --cpus-per-task=20 --gpus-per-task=v100:1 -t 1:00:00 -A $proj" | ||
# an alias to run a command on a batch node for up to 30min | ||
# usage: runNode <command> | ||
alias runNode="srun -N 1 --partition=gpu --ntasks-per-node=2 --cpus-per-task=20 --gpus-per-task=v100:1 -t 1:00:00 -A $proj" | ||
|
||
# optimize CUDA compilation for V100 | ||
export AMREX_CUDA_ARCH=7.0 | ||
|
||
# optimize CPU microarchitecture for Intel Xeon Gold 6148 | ||
export CXXFLAGS="-march=skylake-avx512" | ||
export CFLAGS="-march=skylake-avx512" | ||
|
||
# compiler environment hints | ||
export CC=$(which gcc) | ||
export CXX=$(which g++) | ||
export FC=$(which gfortran) | ||
export CUDACXX=$(which nvcc) | ||
export CUDAHOSTCXX=${CXX} |
Review question: Above, the guide informs the user to always ``source $HOME/greatlakes_v100_warpx.profile``. Is the ``activate`` call copied into ``greatlakes_v100_warpx.profile``, or are we sourcing a different file here? If so, why?
Reply: This extra line is only needed once, as we set up the dependencies, in order to continue in the same terminal. The reason for that extra line in this step is that we already sourced the profile, but only this line now adds the venv, so it was not yet activated.