[warps] last cleanup for now of divergence metrics - disable divergence test

Note that the throughput degradation from divergence is real and reproducible.
Without divergence, throughput is now around 6.4E8 events/sec, against 5.7E8 with divergence.
However, it is very difficult to correlate the percent degradation in throughput with the ncu divergence metrics.
In summary, one should simply aim at 100% uniform execution.
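
For reference, the relative degradation implied by the two quoted throughputs can be worked out as below. This is only a minimal sketch: the function name `percentDegradation` is hypothetical (it is not part of the repository), and the inputs are the rounded figures quoted in this message, not fresh measurements.

```cpp
#include <cassert>

// Hypothetical helper (not from the repository): percent loss in throughput
// of a degraded run relative to a baseline run.
inline double percentDegradation(double baselineEvtsPerSec, double degradedEvtsPerSec)
{
  assert(baselineEvtsPerSec > 0.0); // the baseline must be a positive throughput
  return 100.0 * (baselineEvtsPerSec - degradedEvtsPerSec) / baselineEvtsPerSec;
}
```

With the rounded figures above, `percentDegradation(6.4e8, 5.7e8)` gives about 10.9%.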

On itscrd70.cern.ch (V100S-PCIE-32GB):
=========================================================================
Process                     = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision                = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 6.425099e+08                 )  sec^-1
MeanMatrixElemValue         = ( 1.371706e-02 +- 3.270315e-06 )  GeV^0
TOTAL       :     0.741551 sec
     2,589,547,187      cycles                    #    2.655 GHz
     3,537,039,425      instructions              #    1.37  insn per cycle
       1.044156654 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
                             : smsp__sass_branch_targets.sum                       53         2.89/usecond
                             : smsp__sass_branch_targets_threads_uniform.sum       53         2.89/usecond
                             : smsp__sass_branch_targets_threads_divergent.sum     0          0/second
                             : smsp__warps_launched.sum                            1
-------------------------------------------------------------------------
FP precision               = DOUBLE (nan=0)
EvtsPerSec[MatrixElems] (3)= ( 4.454874e+05                 )  sec^-1
MeanMatrixElemValue        = ( 5.532387e+01 +- 5.501866e+01 )  GeV^-4
TOTAL       :     0.602111 sec
     2,193,960,041      cycles                    #    2.654 GHz
     2,948,877,241      instructions              #    1.34  insn per cycle
       0.885704400 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 255
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
                             : smsp__sass_branch_targets.sum                       17,683     1.52/usecond
                             : smsp__sass_branch_targets_threads_uniform.sum       17,683     1.52/usecond
                             : smsp__sass_branch_targets_threads_divergent.sum     0          0/second
                             : smsp__warps_launched.sum                            1
=========================================================================
valassi committed May 13, 2021
1 parent b51bee6 commit aaa28b7
Showing 1 changed file with 2 additions and 2 deletions.
@@ -17,8 +17,8 @@
 #include "CPPProcess.h"

 // Test ncu metrics for CUDA thread divergence
-//#undef MGONGPU_TEST_DIVERGENCE
-#define MGONGPU_TEST_DIVERGENCE 1
+#undef MGONGPU_TEST_DIVERGENCE
+//#define MGONGPU_TEST_DIVERGENCE 1

 //==========================================================================
 // Class member functions for calculating the matrix elements for
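
To illustrate what "uniform" versus "divergent" means for the branch metrics quoted in the commit message, here is a host-side sketch that models one warp of 32 lanes. It is an assumption-laden illustration, not the code gated by `MGONGPU_TEST_DIVERGENCE`: the names `branchTarget` and `warpIsUniform` are hypothetical, the lane-parity branch is invented for the example, and a runtime flag stands in for the compile-time macro.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 32; // threads (lanes) per warp on NVIDIA GPUs

// Hypothetical model of the branch target taken by one lane of a warp.
// With the test enabled, even and odd lanes take different paths;
// with it disabled, every lane takes the same path.
inline int branchTarget(std::size_t lane, bool testDivergence)
{
  assert(lane < kWarpSize);
  return testDivergence ? static_cast<int>(lane % 2) : 0;
}

// Returns true if all lanes of the modelled warp take the same branch target,
// i.e. the branch would count as uniform rather than divergent.
inline bool warpIsUniform(bool testDivergence)
{
  const int first = branchTarget(0, testDivergence);
  for (std::size_t lane = 1; lane < kWarpSize; ++lane)
    if (branchTarget(lane, testDivergence) != first) return false;
  return true;
}
```

In this model, `warpIsUniform(false)` holds while `warpIsUniform(true)` does not, matching the intent of switching the test off to recover fully uniform execution.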
