[warps] last cleanup for now of divergence metrics - disable divergen… · madgraph5/madgraph4gpu@aaa28b7

Commit

[warps] last cleanup for now of divergence metrics - disable divergence test

Note that the throughput degradation from divergence is real and reproducible.
Without divergence, throughput is now around 6.4E8 events per second, against 5.7E8 with divergence.
It is, however, very difficult to correlate the percentage degradation in throughput with the metrics.
In summary, one should just aim at 100% uniform execution.
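
For scale, the two rounded throughputs quoted above correspond to a loss of roughly 11%. A minimal sketch of that arithmetic follows; the helper name `throughputLossPercent` is hypothetical and not part of the repository.

```cpp
#include <cassert>

// Hypothetical helper (not from the repository): relative throughput loss,
// in percent, of a degraded run with respect to a baseline run.
double throughputLossPercent(double baseline, double degraded)
{
  assert(baseline > 0.); // a zero or negative baseline has no meaningful ratio
  return 100. * (baseline - degraded) / baseline;
}
```

Calling `throughputLossPercent(6.4e8, 5.7e8)` with the rounded figures from the message gives about 10.9. The metrics listed below only report whether branches were uniform, so this number cannot be read off them directly.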

On itscrd70.cern.ch (V100S-PCIE-32GB):
=========================================================================
Process                     = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 6.425099e+08                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     0.741551 sec
     2,589,547,187      cycles                    #    2.655 GHz
     3,537,039,425      instructions              #    1.37  insn per cycle
       1.044156654 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
                             : smsp__sass_branch_targets.sum                       53         2.89/usecond
                             : smsp__sass_branch_targets_threads_uniform.sum       53         2.89/usecond
                             : smsp__sass_branch_targets_threads_divergent.sum     0          0/second
                             : smsp__warps_launched.sum                            1
-------------------------------------------------------------------------
FP precision               = DOUBLE (nan=0)
EvtsPerSec[MatrixElems] (3)= ( 4.454874e+05                 )  sec^-1
MeanMatrixElemValue        = ( 5.532387e+01 +- 5.501866e+01 )  GeV^-4
TOTAL       :     0.602111 sec
     2,193,960,041      cycles                    #    2.654 GHz
     2,948,877,241      instructions              #    1.34  insn per cycle
       0.885704400 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 255
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
                             : smsp__sass_branch_targets.sum                       17,683     1.52/usecond
                             : smsp__sass_branch_targets_threads_uniform.sum       17,683     1.52/usecond
                             : smsp__sass_branch_targets_threads_divergent.sum     0          0/second
                             : smsp__warps_launched.sum                            1
=========================================================================


valassi committed May 13, 2021

1 parent b51bee6 commit aaa28b7

epoch1/cuda/ee_mumu/SubProcesses/P1_Sigma_sm_epem_mupmum/CPPProcess.cc

@@ -17,8 +17,8 @@
     #include "CPPProcess.h"
     // Test ncu metrics for CUDA thread divergence
-    //#undef MGONGPU_TEST_DIVERGENCE
-    #define MGONGPU_TEST_DIVERGENCE 1
+    #undef MGONGPU_TEST_DIVERGENCE
+    //#define MGONGPU_TEST_DIVERGENCE 1
     //==========================================================================
     // Class member functions for calculating the matrix elements for
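
The diff does not show how the kernel uses `MGONGPU_TEST_DIVERGENCE`, so the following is only a host-side sketch of the general idea, under stated assumptions. A compile-time macro (here the hypothetical `SKETCH_TEST_DIVERGENCE`) switches a per-thread predicate between a uniform and a divergent form, and a helper counts the fraction of 32-thread warps whose lanes all agree. This emulates the notion of "uniform" branching on the CPU; it is not how `ncu` computes its metrics.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toggle, written in the same style as the diff above.
#undef SKETCH_TEST_DIVERGENCE
//#define SKETCH_TEST_DIVERGENCE 1

constexpr std::size_t warpSize = 32; // threads per warp on NVIDIA GPUs

// Per-thread branch predicate (an assumption made for illustration only).
inline bool takesBranch(std::size_t ithread)
{
#ifdef SKETCH_TEST_DIVERGENCE
  return ithread % 2 == 0; // even and odd lanes disagree: every warp diverges
#else
  (void)ithread; // unused when the test is disabled
  return true;   // all lanes agree: every warp is uniform
#endif
}

// Percentage of warps in which all lanes evaluate the predicate identically.
double uniformWarpPercent(std::size_t nthreads)
{
  assert(nthreads > 0 && nthreads % warpSize == 0);
  const std::size_t nwarps = nthreads / warpSize;
  std::size_t nuniform = 0;
  for (std::size_t iwarp = 0; iwarp < nwarps; ++iwarp)
  {
    const bool first = takesBranch(iwarp * warpSize);
    bool uniform = true;
    for (std::size_t lane = 1; lane < warpSize; ++lane)
      if (takesBranch(iwarp * warpSize + lane) != first) uniform = false;
    if (uniform) ++nuniform;
  }
  return 100. * static_cast<double>(nuniform) / static_cast<double>(nwarps);
}
```

With the toggle left undefined, as in this commit, `uniformWarpPercent(64)` returns 100. Defining the macro makes every emulated warp split between the two paths, so the same call returns 0.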

0 comments on commit `aaa28b7`
