Commit
Use pinned CPU memory. I get a factor of 1.5 better throughput!
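For readers unfamiliar with the technique, here is a minimal sketch of a pinned (page-locked) host allocation with the CUDA runtime API. It is not the code from this commit: the buffer names, the buffer size, and the `CHECK_CUDA` macro are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the commit): abort on any CUDA runtime error.
#define CHECK_CUDA(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                   cudaGetErrorString(err_), __FILE__, __LINE__); \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

int main() {
  // Illustrative size: one double per GPU thread for a 2048 x 256 launch.
  const size_t n = 2048 * 256;
  const size_t bytes = n * sizeof(double);

  double* hostBuf = nullptr;  // pinned host buffer (assumed name)
  double* devBuf = nullptr;   // device buffer (assumed name)

  // cudaMallocHost returns page-locked memory, which the GPU can reach by
  // DMA without an intermediate staging copy, unlike memory from malloc/new.
  CHECK_CUDA(cudaMallocHost(reinterpret_cast<void**>(&hostBuf), bytes));
  CHECK_CUDA(cudaMalloc(reinterpret_cast<void**>(&devBuf), bytes));

  for (size_t i = 0; i < n; ++i) hostBuf[i] = static_cast<double>(i);

  // Host-to-device and device-to-host copies through the pinned buffer.
  CHECK_CUDA(cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice));
  CHECK_CUDA(cudaMemcpy(hostBuf, devBuf, bytes, cudaMemcpyDeviceToHost));

  CHECK_CUDA(cudaFree(devBuf));
  CHECK_CUDA(cudaFreeHost(hostBuf));  // pinned memory is freed with cudaFreeHost
  return 0;
}
```

Pinned memory is a limited resource, so large page-locked allocations can degrade overall system performance. How much a given program gains depends on how much of its run time is spent in host-device copies.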
time ./check.exe -p 2048 256 12
***********************************
NumIterations         = 12
NumThreadsPerBlock    = 256
NumBlocksPerGrid      = 2048
-----------------------------------
NumberOfEntries       = 12
TotalTimeInWaveFuncs  = 2.784078e-02 sec
MeanTimeInWaveFuncs   = 2.320065e-03 sec
StdDevTimeInWaveFuncs = 1.567047e-05 sec
MinTimeInWaveFuncs    = 2.310220e-03 sec
MaxTimeInWaveFuncs    = 2.370682e-03 sec
-----------------------------------
ProcessID:            = 23402
NProcesses            = 1
NumMatrixElements     = 6291456
MatrixElementsPerSec  = 2.259799e+08 sec^-1
***********************************
NumMatrixElements     = 6291456
MeanMatrixElemValue   = 1.371745e-02 GeV^0
StdErrMatrixElemValue = 3.268633e-06 GeV^0
StdDevMatrixElemValue = 8.198638e-03 GeV^0
MinMatrixElemValue    = 6.071582e-03 GeV^0
MaxMatrixElemValue    = 3.374925e-02 GeV^0

real    0m4.633s
user    0m4.055s
sys     0m0.562s
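The printed figures appear to be related by simple arithmetic: NumMatrixElements matches NumBlocksPerGrid × NumThreadsPerBlock × NumIterations, and MatrixElementsPerSec matches NumMatrixElements / TotalTimeInWaveFuncs. The sketch below reproduces that calculation. The function names are hypothetical, and the formulas are inferred from the numbers above rather than taken from the program's source.

```cpp
#include <cstdint>

// Assumed relation: one matrix element per GPU thread per iteration.
std::int64_t numMatrixElements(std::int64_t blocksPerGrid,
                               std::int64_t threadsPerBlock,
                               std::int64_t iterations) {
  return blocksPerGrid * threadsPerBlock * iterations;
}

// Assumed relation: throughput = matrix elements / total time in wave functions.
double matrixElementsPerSec(std::int64_t numElements, double totalSeconds) {
  return static_cast<double>(numElements) / totalSeconds;
}
```

With the values from this run, `numMatrixElements(2048, 256, 12)` gives 6291456, and dividing that by 2.784078e-02 sec gives about 2.26e+08 matrix elements per second, in agreement with the output.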