This repository provides multiple parallel n-body simulation algorithms, implemented in portable ISO C++ that runs on multi-core CPUs and GPUs:
- All-Pairs; $O(N^2)$ time complexity:
  - Classic all-pairs, parallelized over bodies.
  - all-pairs-collapsed, parallelized over force pairs.
- Barnes-Hut; $O(N \log N)$ time complexity:
  - Starvation-free octree algorithm: requires parallel forward progress.
  - Hilbert-sorted Bounding Volume Hierarchy (bvh) algorithm: requires weakly parallel forward progress.
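To illustrate where the $O(N^2)$ cost of the all-pairs variant comes from, here is a minimal sequential sketch in 2D. It is not the repository's implementation: the `Body` struct, the function name `all_pairs_accelerations`, and the `G` and `softening` parameters are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2D body layout; not the repository's data structure.
struct Body {
    std::array<double, 2> pos;
    double mass;
};

// Computes the gravitational acceleration on every body by summing the
// contribution of every body: N outer x N inner iterations = O(N^2) work.
// `softening` is an assumed small constant that avoids division by zero
// (it also makes a body's contribution to itself exactly zero).
std::vector<std::array<double, 2>> all_pairs_accelerations(
    const std::vector<Body>& bodies, double G = 1.0, double softening = 1e-9) {
    assert(softening > 0.0);
    std::vector<std::array<double, 2>> acc(bodies.size());
    // Each output element depends only on read-only inputs, so this loop over
    // bodies is the one a parallel version could distribute, for example by
    // passing an execution policy to std::transform.
    std::transform(bodies.begin(), bodies.end(), acc.begin(), [&](const Body& bi) {
        std::array<double, 2> a{0.0, 0.0};
        for (const Body& bj : bodies) {
            const double dx = bj.pos[0] - bi.pos[0];
            const double dy = bj.pos[1] - bi.pos[1];
            const double r2 = dx * dx + dy * dy + softening * softening;
            const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
            a[0] += G * bj.mass * dx * inv_r3;
            a[1] += G * bj.mass * dy * inv_r3;
        }
        return a;
    });
    return acc;
}
```

Parallelizing over bodies, as in this sketch, gives each body its own independent accumulation. Parallelizing over force pairs instead exposes all $N^2$ interactions as parallel work, at the cost of having to combine the partial results per body.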
Prerequisites: docker and HPCCM:
$ pip install hpccm
Run samples as follows:
# Options
# ./ci/run_docker <toolchain> <algorithm> <workload case> <dim> <precision> <bodies> <steps>
# Example: nvc++ gpu compiler, octree algorithm, galaxy simulation, 3D, double precision:
$ ./ci/run_docker nvgpu octree galaxy 3 double
# Build, but do not run:
$ BUILD_ONLY=1 ./ci/run_docker nvgpu octree galaxy 3 double
# Run, assuming the binary is already built:
$ RUN_ONLY=1 ./ci/run_docker nvgpu octree galaxy 3 double
To reproduce without a container, a properly set up environment is required; in that case, the ./ci/run script can be used instead.
The following options are available:
- Toolchain:
  - Open-source vendor-neutral: acpp (AdaptiveCpp), gcc (Intel TBB), clang (Intel TBB).
  - Vendor-specific:
    - AMD ROCm stdpar: amdclang.
    - NVIDIA HPC SDK: nvgpu (nvc++ -stdpar=gpu), nvcpu (nvc++ -stdpar=cpu).
    - Intel oneAPI: dpc++.
- Algorithm: all-pairs, all-pairs-collapsed, octree, bvh.
- Dimensions: 2 (2D), 3 (3D).
- Precision: float, double.
- Workloads:
  - galaxy
  - nasa: loads a data set from file; requires running ./ci/run_docker thuering fetch for setup.
To run all benchmarks on a given system, you can use ./ci/run_docker bench.
MIT License, see LICENSE.
Thomas Lane Cassell, Tom Deakin, Aksel Alpay, Vincent Heuveline, and Gonzalo Brito Gadeschi. "Efficient Tree-Based Parallel Algorithms for N-Body Simulations Using C++ Standard Parallelism." In Workshop on Irregular Applications: Architectures and Algorithms Held in Conjunction with Supercomputing (P3HPC). IEEE, 2024.
When contributing code, you may format your contributions as follows:
$ ./ci/run_docker fmt
but doing this is not required.
The environment is made portable through mamba/conda.
This must be installed as a prerequisite, e.g., run the Miniforge installer from https://github.com/conda-forge/miniforge .
Then create the stdpar-nbody environment:
$ mamba env create -f environment.yaml
Other things you might want:
- NVIDIA HPC SDK
Use make to build the program.
This must be done within the mamba environment:
$ mamba activate stdpar-bh
The number of dimensions can be specified with the D=<dim> parameter to make.
By default, D=2 is used.
These are the available targets:
- CPU:
  - make gcc
  - make clang
  - make nvcpp
- GPU:
  - make gpu to build for NVIDIA GPUs using nvc++
The output will be ./nbody_d<dim>_<target>.
When running the nvcpp version, it is recommended to use the following environment variables:
OMP_PLACES=cores OMP_PROC_BIND=close ./nbody_d2_nvcpp -s 5 -n 1000000
If you get an error about missing libraries then try running with the following environment variable:
LD_LIBRARY_PATH=${CONDA_PREFIX}/lib ./nbody_d2_clang -s 5 -n 1000000
Run Barnes-Hut with --theta 0 and compare the printed state against the all-pairs algorithm:
$ ./nbody_d2_gpu -s 5 -n 10 --print-state --theta 0
$ ./nbody_d2_gpu -s 5 -n 10 --print-state --algorithm all-pairs
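The reason this comparison is useful can be sketched with the textbook Barnes-Hut acceptance test, in which a tree node is replaced by its center of mass only when the ratio of node size to distance is below theta. With theta equal to 0 no node is ever accepted, so the traversal descends to individual bodies and should agree with all-pairs up to floating-point rounding. The snippet below assumes this common criterion; the README does not confirm which test the repository uses, and the function name `can_approximate` is hypothetical.

```cpp
#include <cassert>

// Common Barnes-Hut acceptance test (an assumption; the repository's exact
// criterion may differ): a node of width `node_size` at distance `distance`
// from the body may be replaced by its center of mass when
// node_size / distance < theta.
bool can_approximate(double node_size, double distance, double theta) {
    assert(node_size > 0.0 && distance >= 0.0 && theta >= 0.0);
    // Written without division so that distance == 0 is handled safely.
    return node_size < theta * distance;
}
```

Under this criterion, larger theta values accept more nodes, trading accuracy for speed, while theta equal to 0 rejects every node regardless of distance.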
Run a large Barnes-Hut simulation with 1,000,000 bodies:
$ ./nbody_d2_gpu -s 5 -n 1000000
Generate an image similar to the above GIF:
$ ./nbody_d2_gpu -s 1000 -n 10000 --save pos --workload galaxy
$ python3 scripts/plotter.py pos --galaxy --gif
To find other program arguments:
$ ./nbody_d2_gpu --help