Madgraph4GPU User Guide

Stephan Hageboeck edited this page May 13, 2024 · 16 revisions

Last modification: 2 May 2024

TL;DR

Generate 10k events for p p > t t~ g on a CPU:

git clone --recurse-submodules https://github.com/madgraph5/madgraph4gpu.git

cd madgraph4gpu/MG5aMC/mg5amcnlo

cat << EOF > pp_ttxg.mg5
generate p p > t t~ g
output madevent_simd
launch
EOF

./bin/mg5_aMC pp_ttxg.mg5

Introduction

The instructions below explain how users of the madgraph4gpu repository can install the software, generate the source code for various physics processes and run the event generation for them.

We have agreed with the upstream mg5amcnlo team to provide madgraph4gpu as a "plugin" to that project. The instructions below describe the current way of retrieving and building the software. The way of retrieving the software is expected to change, whereas its usage should be less affected by the future evolution of the software.

If you run into any trouble with the guide below, please contact [email protected]

Environment for building the software

We build the software with various compiler versions and Linux operating systems. The combinations we have tested are listed below.

CPU with vector instructions

OS | Compiler
Alma Linux 9.3 | gcc 13.1.1

NVidia GPUs

OS | CPU compiler | NVidia GPU compiler
Alma Linux 9.3 | gcc 13.1.1 | nvcc 12.4

AMD GPUs

OS | CPU compiler | AMD GPU compiler

Retrieving the software

As the software is still under development, we have not yet provided an official release. We intend to always keep the master branches in a working state. To retrieve the madgraph4gpu package together with the upstream mg5amcnlo generator software, clone the repository with

git clone --recurse-submodules https://github.com/madgraph5/madgraph4gpu.git

The upstream mg5amcnlo package is currently provided as a git submodule of the repository under madgraph4gpu/MG5aMC/mg5amcnlo. The git clone command above should check out the submodule at the git hash that is compatible with the madgraph4gpu project. In case of incompatibilities, you can check out the tip of the corresponding branch of this submodule with

cd madgraph4gpu/MG5aMC/mg5amcnlo
git checkout gpucpp

Generating code for a physics process and running the event generation

General remarks

The project aims to stay as close as possible to the original syntax of mg5amcnlo when it comes to generating physics processes. We have augmented the syntax where necessary e.g. to steer the generation of code for a certain hardware platform or the level of parallelisation for the event generation.

From a functionality point of view, two modes of code generation can be distinguished:

  • madevent mode allows the full-fledged generation of events. It runs together with the mg5amcnlo package, which provides pieces such as the random number generation, phase space sampling, phase space integration and I/O.
  • standalone mode provides reduced functionality and is intended for cases where e.g. the matrix element calculations are used as a "plugin" in connection with another event generator.

Some further remarks:

  • madgraph4gpu is currently capable of generating code for standard model leading-order processes and running them.
  • We also provide, or are working on, a limited set of SUSY, HEFT and SMEFT processes. If you are interested in those, please send us a mail.
  • We are also working on next-to-leading-order processes. At the moment, no code generation is available for them yet.
  • madgraph4gpu also allows the calculation of matrix elements in single (float) precision, but tests have shown that the generated physics results are not accurate enough. We do not recommend using this mode.

Paste the examples below into a file and launch them by running ./bin/mg5_aMC <filename> in the madgraph4gpu/MG5aMC/mg5amcnlo directory of the madgraph4gpu repository.

Generate code and launch madgraph4gpu in madevent mode

CPU

A simple set of commands for running the event generation for the p p > t t~ g process on CPU:

generate p p > t t~ g
output madevent_simd PROC_pp_ttx
launch
set cudacpp_backend CPP 
set vector_size 32
set nevents 250k
set sde_strategy 1

  • The simd in output madevent_simd triggers the code generation for CPU architectures, which use their vector registers for parallelisation.
  • set cudacpp_backend CPP specifies the architecture backend. At the moment only CPP is available. Internally, the vector width available on the build machine is used for compilation. In the future, specific vector widths will also be available.
  • set vector_size 32 sets the level of parallel execution. The minimum value is 4; the recommendation is to use a number high enough to fill the CPU vector registers with C++ double precision numbers. E.g. the vector width of AVX2 is 256 bits, which fits four 64-bit double precision numbers.
    • Overcommitting the hardware by setting a higher number is a good idea.
  • set sde_strategy 1 should be used for the time being when generating code via madgraph4gpu (explanation still to be added).
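
The arithmetic behind the vector_size recommendation can be sketched in a few lines of shell. This is only an illustration: the helper names doubles_per_register and registers_filled are hypothetical and not part of madgraph4gpu, and the register widths of 128, 256 and 512 bits (SSE4.2, AVX2 and AVX-512) are general hardware figures rather than settings taken from this guide.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of madgraph4gpu.

# Number of 64-bit doubles that fit into one SIMD register of the given
# width in bits.
doubles_per_register() {
    echo $(( $1 / 64 ))
}

# Number of full SIMD registers that a given vector_size ($1) occupies for
# a given register width in bits ($2), using integer division.
registers_filled() {
    lanes=$(doubles_per_register "$2")
    echo $(( $1 / lanes ))
}

# Show how the vector_size of 32 from the example maps onto common widths.
for width in 128 256 512; do
    echo "width ${width} bit: $(doubles_per_register "$width") doubles per register," \
         "vector_size 32 fills $(registers_filled 32 "$width") registers"
done
```

With AVX2 (256 bits), the vector_size of 32 used in the example corresponds to eight registers' worth of events, so the hardware is overcommitted in the sense recommended above.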

GPU

When switching to GPU generation (e.g. on an NVidia GPU), the input file changes to:

generate p p > t t~ g
output madevent_gpu PROC_pp_ttx
launch
set cudacpp_backend CUDA 
set vector_size 8192
set nevents 250k
set sde_strategy 1

Additional comments on top of those for CPU generation:

  • Running efficiently on GPUs with double precision calculations requires "high end" GPUs, e.g. the NVidia A100. "Consumer grade" GPUs do not provide sufficient double precision computing power.
  • Use output madevent_gpu to generate code for GPU processing.
  • Use set cudacpp_backend CUDA for processing on any NVidia GPU.
  • Set vector_size to a sufficiently large number. The number should be a multiple of the number of cores and evenly divisible by the number of streaming multiprocessors (SMs) of the GPU (e.g. 128 SMs on an NVidia GA100).
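
A quick way to check a candidate vector_size against the streaming-multiprocessor rule is sketched below. The helper names are hypothetical and not part of madgraph4gpu, the SM count is supplied by hand (128 is the GA100 figure quoted above), and the corresponding check against the number of cores is left out for brevity.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of madgraph4gpu.

# Succeeds (exit status 0) if vector_size ($1) is evenly divisible by the
# number of streaming multiprocessors ($2).
divisible_by_sms() {
    [ $(( $1 % $2 )) -eq 0 ]
}

# Smallest multiple of the SM count ($2) that is >= the requested size ($1).
round_up_to_sms() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

sms=128   # assumption: SM count of the target GPU, supplied by the user
for size in 8192 8000; do
    if divisible_by_sms "$size" "$sms"; then
        echo "vector_size $size is a multiple of $sms SMs"
    else
        echo "vector_size $size is not; next multiple: $(round_up_to_sms "$size" "$sms")"
    fi
done
```

The value 8192 from the example input file passes this check, since 8192 = 64 × 128.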

Generate code and launch madgraph4gpu in standalone mode

NB: The instructions for standalone mode may change in the future; we aim for a syntax similar to that of madevent mode above. For the time being, you can use the following syntax for code generation:

generate p p > t t~ g
output standalone_cudacpp PROC_pp_ttx

This generates the source code for using the matrix element calculations as a plugin, e.g. for other generator packages. The interface to the hardware-accelerated code is available for Fortran in the file SubProcesses/fbridge.inc:

C Create a Bridge and return its pointer
C - PBRIDGE: the memory address of the C++ Bridge
C - NEVT:    the number of events in the Fortran arrays
C - NPAR:    the number of external particles in the Fortran arrays (KEPT FOR SANITY CHECKS ONLY: remove it?)
C - NP4:     the number of momenta components, usually 4, in the Fortran arrays (KEPT FOR SANITY CHECKS ONLY: remove it?)
      INTERFACE
         SUBROUTINE FBRIDGECREATE(PBRIDGE, NEVT, NPAR, NP4)
         INTEGER*8 PBRIDGE
         INTEGER*4 NEVT
         INTEGER*4 NPAR
         INTEGER*4 NP4
         END SUBROUTINE FBRIDGECREATE
      END INTERFACE
      
C Delete a Bridge.
C - PBRIDGE: the memory address of the C++ Bridge
      INTERFACE
         SUBROUTINE FBRIDGEDELETE(PBRIDGE)
         INTEGER*8 PBRIDGE
         END SUBROUTINE FBRIDGEDELETE
      END INTERFACE


C Execute the matrix-element calculation "sequence" via a Bridge on GPU/CUDA or CPU/C++.
C - PBRIDGE: the memory address of the C++ Bridge
C - MOMENTA: the input 4-momenta Fortran array
C - GS:      the input Gs (running QCD coupling constant alphas) Fortran array
C - RNDHEL:  the input random number Fortran array for helicity selection
C - RNDCOL:  the input random number Fortran array for color selection
C - MES:     the output matrix element Fortran array
C - SELHEL:  the output selected helicity Fortran array
C - SELCOL:  the output selected color Fortran array
      INTERFACE
         SUBROUTINE FBRIDGESEQUENCE_NOMULTICHANNEL(PBRIDGE, MOMENTA, GS,
     &     RNDHEL, RNDCOL, MES, SELHEL, SELCOL)
         INTEGER*8 PBRIDGE
         DOUBLE PRECISION MOMENTA(*)
         DOUBLE PRECISION GS(*)
         DOUBLE PRECISION RNDHEL(*)
         DOUBLE PRECISION RNDCOL(*)
         DOUBLE PRECISION MES(*)
         INTEGER*4 SELHEL(*)
         INTEGER*4 SELCOL(*)
         END SUBROUTINE FBRIDGESEQUENCE_NOMULTICHANNEL
      END INTERFACE

An example of how to use this interface is available e.g. in the file SubProcesses/P1_[...]/fcheck_sa.f

Nomenclature

Acronym | Info
madgraph4gpu | The project to speed up the Madgraph5_aMC@NLO event generator package by offloading parts of the upstream project to compute accelerators, provided in https://github.com/madgraph5/madgraph4gpu
mg5amcnlo | The upstream Madgraph5_aMC@NLO event generator package, as provided in https://github.com/mg5amcnlo/mg5amcnlo