[warps] add also per-second divergence metrics and add a few comments
Note that I tried the 'stalled_barrier' metrics and they do not seem interesting.

On itscrd70.cern.ch (V100S-PCIE-32GB):
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 5.711994e+08 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 0.745683 sec
2,603,540,638 cycles # 2.655 GHz
3,537,849,260 instructions # 1.36 insn per cycle
1.049477458 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 128
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 96.33%
: smsp__sass_branch_targets.sum 109 4.18/usecond
: smsp__sass_branch_targets_threads_uniform.sum 105 4.03/usecond
: smsp__sass_branch_targets_threads_divergent.sum 4 153.37/msecond
: smsp__warps_launched.sum 1
=========================================================================
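As a quick consistency check on the divergence numbers in the profiler output above, here is a minimal Python sketch. It rests on two assumptions that the commit does not confirm: that the `.pct` metric is the uniform branch-target count divided by the total, and that each per-second rate is the raw count divided by the kernel duration. The variable names are illustrative only.

```python
# Counts copied from the profiler output above.
branch_targets = 109   # smsp__sass_branch_targets.sum
uniform = 105          # smsp__sass_branch_targets_threads_uniform.sum
divergent = 4          # smsp__sass_branch_targets_threads_divergent.sum

# Assumption: the .pct metric is uniform / total branch targets.
uniform_pct = 100.0 * uniform / branch_targets

# Assumption: each per-second rate is count / kernel duration, so the
# duration can be recovered from any (count, rate) pair.
duration_from_total_us = branch_targets / 4.18               # rate given per usecond
duration_from_divergent_us = divergent / 153.37 * 1000.0     # rate given per msecond

print(f"uniform fraction: {uniform_pct:.2f}%")
print(f"implied kernel duration: {duration_from_total_us:.2f} us "
      f"vs {duration_from_divergent_us:.2f} us")
```

Under these assumptions the uniform fraction reproduces the reported 96.33%, and both rates imply the same kernel duration of roughly 26 microseconds, so the per-second figures appear to be the same counts normalised by time.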