Madgraph4GPU User Guide
Last modification: 2 May 2024
- TL;DR
- Introduction
- Environment for building the software
- Retrieving the software
- Generating code for a physics process and running the event generation
- Nomenclature
Generate 10k events for p p > t t~ g on a CPU:
git clone --recurse-submodules https://github.com/madgraph5/madgraph4gpu.git
cd madgraph4gpu/MG5aMC/mg5amcnlo
cat << EOF > pp_ttxg.mg5
generate p p > t t~ g
output madevent_simd
launch
EOF
./bin/mg5_aMC pp_ttxg.mg5
The instructions below give users of the madgraph4gpu repository the information needed to install the software, generate the source code for various physics processes, and run the event generation for them.
We have agreed with the upstream mg5amcnlo team to provide madgraph4gpu as a "plugin" to the project. The instructions below describe the current way of retrieving and building the software. The way of retrieving the software is expected to change; its usage should be less affected by the future evolution of the software.
If you run into any trouble with the guide below, please contact [email protected]
We build the software with various compiler versions and Linux operating systems. The tables below list the combinations we have tested.
OS | Compiler |
---|---|
Alma Linux 9.3 | gcc 13.1.1 |

OS | CPU compiler | NVidia GPU compiler |
---|---|---|
Alma Linux 9.3 | gcc 13.1.1 | nvcc 12.4 |

OS | CPU compiler | AMD GPU compiler |
---|---|---|
As the software is still under development, we have not yet provided an official release. We intend to always keep the master branches in a working state. To retrieve the madgraph4gpu package together with the upstream mg5amcnlo generator software, clone the repository with
git clone --recurse-submodules https://github.com/madgraph5/madgraph4gpu.git
The upstream mg5amcnlo package is currently provided as a git submodule in the repository under madgraph4gpu/MG5aMC/mg5amcnlo. The git clone above should check out the submodule at the git hash that is compatible with the madgraph4gpu project. In case of incompatibilities you can check out the tip of the gpucpp branch of this submodule with
cd madgraph4gpu/MG5aMC/mg5amcnlo
git checkout gpucpp
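Whether the submodule sits at the commit recorded by madgraph4gpu can be checked with git submodule status. Git prefixes the reported hash with - if the submodule is not initialised, + if the checked-out commit differs from the recorded one, and U if there are merge conflicts. The following is a minimal sketch that interprets this prefix; the function name submodule_state is hypothetical and not part of madgraph4gpu.

```shell
# Hypothetical helper: interpret the first character of one line of
# `git submodule status` output.
submodule_state() {
    case "$1" in
        -*) echo "not initialised" ;;
        +*) echo "differs from recorded commit" ;;
        U*) echo "merge conflict" ;;
        *)  echo "matches recorded commit" ;;
    esac
}

# Usage, from the top-level madgraph4gpu directory:
#   submodule_state "$(git submodule status MG5aMC/mg5amcnlo)"
```

If the submodule is reported as not initialised, git submodule update --init --recursive fetches it at the recorded commit.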
The project aims to stay as close as possible to the original syntax of mg5amcnlo when it comes to generating physics processes. We have augmented the syntax where necessary e.g. to steer the generation of code for a certain hardware platform or the level of parallelisation for the event generation.
From a functionality point of view, two modes of code generation can be distinguished:
- madevent mode allows full-fledged event generation together with the mg5amcnlo package, which provides pieces such as random number generation, phase space sampling, phase space integration and I/O.
- standalone mode provides reduced functionality and is intended for use, e.g., in connection with another event generator, where the matrix element calculations are used as a "plugin".
Some further remarks:
- madgraph4gpu can currently generate code for Standard Model leading-order processes and run them.
- We also provide, or are working on, a limited set of SUSY, HEFT and SMEFT processes. If you are interested in those, please send us a mail.
- We are also working on next-to-leading-order processes. At the moment no code generation is available for them yet.
- madgraph4gpu also allows the calculation of matrix elements in single (float) precision, but tests have shown that the generated physics results are not accurate enough. We do not recommend using this mode.
Paste the examples below into a file and launch them by running ./bin/mg5_aMC <filename> in the madgraph4gpu/MG5aMC/mg5amcnlo directory of the madgraph4gpu repository.
A simple set of commands for running the event generation for the p p > t t~ g process on CPU:
generate p p > t t~ g
output madevent_simd PROC_pp_ttx
launch
set cudacpp_backend CPP
set vector_size 32
set nevents 250k
set sde_strategy 1
- The simd in output madevent_simd triggers code generation for CPU architectures, using their vector registers for parallelisation.
- set cudacpp_backend CPP specifies the architecture backend. At the moment only CPP is available. Internally, the vector width available on the build machine is used for compilation. In the future, specific vector widths will also be available.
- set vector_size 32 sets the level of parallel execution. The minimum value is 4; the recommendation is to use a number high enough to fill the CPU vector register with n C++ double precision numbers. E.g. the vector width of AVX2 is 256 bit, which fits 4 64-bit double precision numbers.
  - Overcommitting the hardware by setting the number higher is a good idea.
- set sde_strategy 1 shall be used for the time being when generating code via madgraph4gpu (explanation to be added).
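The register arithmetic behind the vector_size recommendation can be sketched as follows. The helper name doubles_per_register is hypothetical; the register widths of 128 bit (SSE4.2), 256 bit (AVX2) and 512 bit (AVX-512) are the standard x86 values and are not settings read from madgraph4gpu.

```shell
# Hypothetical helper: number of 64-bit double precision values that fit
# into one vector register of the given width in bits.
doubles_per_register() {
    echo $(( $1 / 64 ))
}

doubles_per_register 128   # SSE4.2  -> 2
doubles_per_register 256   # AVX2    -> 4
doubles_per_register 512   # AVX-512 -> 8
```

A vector_size of 32 therefore corresponds to 8 full AVX2 registers of double precision numbers, which is in line with the overcommitting advice.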
When switching to GPU generation (e.g. on an NVidia GPU), the input file changes to:
generate p p > t t~ g
output madevent_gpu PROC_pp_ttx
launch
set cudacpp_backend CUDA
set vector_size 8192
set nevents 250k
set sde_strategy 1
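The CPU and GPU input files differ only in the output mode, the backend and the vector size. As a sketch, a small shell function can write either variant. The function name write_mg5_script and the file names are hypothetical; the commands written to the file are exactly those of the two examples.

```shell
# Hypothetical helper: write an mg5_aMC command file for p p > t t~ g.
#   $1 = output mode (madevent_simd or madevent_gpu)
#   $2 = cudacpp backend (CPP or CUDA)
#   $3 = vector_size
#   $4 = name of the command file to write
write_mg5_script() {
    cat > "$4" << EOF
generate p p > t t~ g
output $1 PROC_pp_ttx
launch
set cudacpp_backend $2
set vector_size $3
set nevents 250k
set sde_strategy 1
EOF
}

# Write the CPU and the GPU variant into the current directory.
write_mg5_script madevent_simd CPP 32 pp_ttx_cpu.mg5
write_mg5_script madevent_gpu CUDA 8192 pp_ttx_gpu.mg5
```

Each file can then be launched with ./bin/mg5_aMC <filename> from the madgraph4gpu/MG5aMC/mg5amcnlo directory.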
Additional comments on top of those for CPU generation:
- To run efficiently on GPUs with double precision calculations you need "high end" GPUs, e.g. NVidia A100. "Consumer grade" GPUs will not provide sufficient double precision computing power.
- Use output madevent_gpu for generating code for GPU processing.
- Use set cudacpp_backend CUDA for processing on any NVidia GPU.
- Set vector_size to a sufficiently large number. The number should be a multiple of the number of cores and divisible by the number of streaming multiprocessors (SMs) of the GPU (e.g. 128 SMs on an NVidia GA100).
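The divisibility rule for the streaming multiprocessors can be sketched as follows. The helper name round_up_to_multiple is hypothetical, the SM count has to be supplied by the user, and whether the generator accepts a particular resulting value is not checked here.

```shell
# Hypothetical helper: round a requested vector_size ($1) up to the next
# multiple of the number of streaming multiprocessors ($2).
round_up_to_multiple() {
    requested=$1
    sm_count=$2
    echo $(( (requested + sm_count - 1) / sm_count * sm_count ))
}

round_up_to_multiple 8000 128   # -> 8064 (63 x 128)
round_up_to_multiple 8192 128   # -> 8192 (64 x 128, already a multiple)
```

The value 8192 used in the GPU example is 64 x 128, so it already satisfies the rule for a GPU with 128 SMs.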
NB: The instructions for standalone mode may change in the future; we aim for a syntax similar to that of madevent mode above. For the time being you can use the following syntax for code generation:
generate p p > t t~ g
output standalone_cudacpp PROC_pp_ttx
This will generate the source code for using the matrix element calculations as a plugin, e.g. for other generator packages. The interface to the hardware-accelerated code is available for Fortran in the file SubProcesses/fbridge.inc:
C Create a Bridge and return its pointer
C - PBRIDGE: the memory address of the C++ Bridge
C - NEVT: the number of events in the Fortran arrays
C - NPAR: the number of external particles in the Fortran arrays (KEPT FOR SANITY CHECKS ONLY: remove it?)
C - NP4: the number of momenta components, usually 4, in the Fortran arrays (KEPT FOR SANITY CHECKS ONLY: remove it?)
INTERFACE
SUBROUTINE FBRIDGECREATE(PBRIDGE, NEVT, NPAR, NP4)
INTEGER*8 PBRIDGE
INTEGER*4 NEVT
INTEGER*4 NPAR
INTEGER*4 NP4
END SUBROUTINE FBRIDGECREATE
END INTERFACE
C Delete a Bridge.
C - PBRIDGE: the memory address of the C++ Bridge
INTERFACE
SUBROUTINE FBRIDGEDELETE(PBRIDGE)
INTEGER*8 PBRIDGE
END SUBROUTINE FBRIDGEDELETE
END INTERFACE
C Execute the matrix-element calculation "sequence" via a Bridge on GPU/CUDA or CUDA/C++.
C - PBRIDGE: the memory address of the C++ Bridge
C - MOMENTA: the input 4-momenta Fortran array
C - GS: the input Gs (running QCD coupling constant alphas) Fortran array
C - RNDHEL: the input random number Fortran array for helicity selection
C - RNDCOL: the input random number Fortran array for color selection
C - MES: the output matrix element Fortran array
C - SELHEL: the output selected helicity Fortran array
C - SELCOL: the output selected color Fortran array
INTERFACE
SUBROUTINE FBRIDGESEQUENCE_NOMULTICHANNEL(PBRIDGE, MOMENTA, GS, RNDHEL, RNDCOL, MES, SELHEL, SELCOL)
INTEGER*8 PBRIDGE
DOUBLE PRECISION MOMENTA(*)
DOUBLE PRECISION GS(*)
DOUBLE PRECISION RNDHEL(*)
DOUBLE PRECISION RNDCOL(*)
DOUBLE PRECISION MES(*)
INTEGER*4 SELHEL(*)
INTEGER*4 SELCOL(*)
END SUBROUTINE FBRIDGESEQUENCE_NOMULTICHANNEL
END INTERFACE
An example of how to use this interface is available, e.g., in the file SubProcesses/P1_[...]/fcheck_sa.f
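When allocating the Fortran arrays passed to the bridge, the required lengths follow from NEVT, NPAR and NP4. The sketch below assumes that MOMENTA holds NP4 components for each of the NPAR particles in each of the NEVT events, and that GS, RNDHEL, RNDCOL, MES, SELHEL and SELCOL hold one entry per event. These sizes are inferred from the interface comments rather than stated in them, and the helper name momenta_array_length is hypothetical.

```shell
# Hypothetical helper: assumed length of the flat MOMENTA array,
# given NEVT ($1), NPAR ($2) and NP4 ($3).
momenta_array_length() {
    echo $(( $1 * $2 * $3 ))
}

# p p > t t~ g has 5 external particles (2 incoming, 3 outgoing),
# each with 4 momentum components; 32 events are used as an example.
momenta_array_length 32 5 4   # -> 640
```

The memory layout of MOMENTA, i.e. the order of the event, particle and component indices, is not covered by this sketch; the fcheck_sa.f example is the reference for that.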
Acronym | Info |
---|---|
madgraph4gpu | The project to speed up the Madgraph5_aMC@NLO event generator package by offloading parts of the upstream project to compute accelerators, provided in https://github.com/madgraph5/madgraph4gpu |
mg5amcnlo | The upstream Madgraph5_aMC@NLO event generator package, as provided in https://github.com/mg5amcnlo/mg5amcnlo |