Add tensilelite client performance args to hipblaslt-bench #1544
+867
−77
Set HIPBLASLT_BENCH_PERF=1 to print performance-related args and frequencies.
Set HIPBLASLT_BENCH_PERF_ALL=1 to print multi-XCD frequencies along with performance args.

Example:
HIPBLASLT_BENCH_PERF=1 ./hipblaslt-bench -f matmul -m 16 -n 16 -k 4096 --transA T --transB N --a_type bf16_r --b_type bf16_r --c_type bf16_r --d_type bf16_r --activation_type none --compute_type f32_r

Output:
transA,transB,grouped_gemm,batch_count,m,n,k,alpha,lda,stride_a,beta,ldb,stride_b,ldc,stride_c,ldd,stride_d,a_type,b_type,c_type,d_type,compute_type,scaleA,scaleB,scaleC,scaleD,amaxD,activation_type,bias_vector,bias_type,Total Granularity,Tiles Per-CU,Tile-0 Granularity,Tile-1 Granularity,CU granularity,Wave granularity,#CU's,mem-read-bytes,mem-write-bytes,lowest-avg-freq,lowest-median-freq,avg-MCLK,median-MCLK,hipblaslt-Gflops,hipblaslt-GB/s,us
T,N,0,1,16,16,4096,1,4096,65536,0,4096,65536,16,256,16,256,bf16_r,bf16_r,bf16_r,bf16_r,f32_r,0,0,0,0,0,none,0,bf16_r,0.0175439,0.0701754,1,1,0.0701754,0.25,228,262144,1536,0,0,1100,1100,123.362,14.3893,17
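The output is a CSV header line followed by one line of values, so it can be checked with a short script. The sketch below is not part of this PR: the helper names `parse_perf_row`, `gflops_from_row`, and `granularity_product` are hypothetical. It assumes the usual GEMM convention of 2·m·n·k floating-point operations per batch, and that `Total Granularity` is the product of the four tile/CU/wave granularity columns. Both assumptions happen to reproduce the values in the sample row above, but neither is confirmed here as the tool's actual definition.

```python
import csv
import io

# Header and value lines copied from the sample output above.
HEADER = (
    "transA,transB,grouped_gemm,batch_count,m,n,k,alpha,lda,stride_a,beta,ldb,"
    "stride_b,ldc,stride_c,ldd,stride_d,a_type,b_type,c_type,d_type,compute_type,"
    "scaleA,scaleB,scaleC,scaleD,amaxD,activation_type,bias_vector,bias_type,"
    "Total Granularity,Tiles Per-CU,Tile-0 Granularity,Tile-1 Granularity,"
    "CU granularity,Wave granularity,#CU's,mem-read-bytes,mem-write-bytes,"
    "lowest-avg-freq,lowest-median-freq,avg-MCLK,median-MCLK,"
    "hipblaslt-Gflops,hipblaslt-GB/s,us"
)
ROW = (
    "T,N,0,1,16,16,4096,1,4096,65536,0,4096,65536,16,256,16,256,"
    "bf16_r,bf16_r,bf16_r,bf16_r,f32_r,0,0,0,0,0,none,0,bf16_r,"
    "0.0175439,0.0701754,1,1,0.0701754,0.25,228,262144,1536,0,0,"
    "1100,1100,123.362,14.3893,17"
)


def parse_perf_row(header_line, value_line):
    """Pair each column name with its value (hypothetical helper)."""
    names = next(csv.reader(io.StringIO(header_line)))
    values = next(csv.reader(io.StringIO(value_line)))
    if len(names) != len(values):
        raise ValueError(f"{len(names)} columns but {len(values)} values")
    return dict(zip(names, values))


def gflops_from_row(row):
    """Assumed convention: 2*m*n*k operations per batch, time in microseconds."""
    m, n, k = int(row["m"]), int(row["n"]), int(row["k"])
    batch = int(row["batch_count"])
    seconds = float(row["us"]) * 1e-6
    return 2.0 * m * n * k * batch / seconds / 1e9


def granularity_product(row):
    """Assumed relation: product of the four per-level granularity columns."""
    keys = ("Tile-0 Granularity", "Tile-1 Granularity",
            "CU granularity", "Wave granularity")
    product = 1.0
    for key in keys:
        product *= float(row[key])
    return product


row = parse_perf_row(HEADER, ROW)
print("columns:", len(row))
print("derived Gflops:", round(gflops_from_row(row), 3),
      "reported:", row["hipblaslt-Gflops"])
print("granularity product:", round(granularity_product(row), 7),
      "reported:", row["Total Granularity"])
```

A check like this is mainly useful for catching a header/value column mismatch when new performance columns are added to the output.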