nvprof_cudnn.txt
WARNING: python and any of its children processes will be profiled.
Collecting data...
[09:01:45] ../src/relay/transforms/to_mixed_precision.cc:429: Warning: Op "layout_transform" not registered FTVMMixedPrecisionConversionType appears 485 times in graph.
Evaluate inference time cost...
Execution time summary:
  mean (ms)  median (ms)     max (ms)     min (ms)     std (ms)
    18.6129      18.5385      19.5102      18.3957       0.2232
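The five figures in the execution time summary are ordinary descriptive statistics over repeated timed runs. A minimal Python sketch of that computation follows. It assumes the standard deviation is the population form (dividing by N, as numpy.std does by default), which this log does not confirm. The sample latencies in the script are illustrative and are not taken from this run.

```python
import statistics


def summarize_ms(latencies_ms):
    """Return (mean, median, max, min, std) for per-run latencies in ms.

    std is the population standard deviation (divide by N); that choice is
    an assumption about how the summary above was produced.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    return (
        statistics.fmean(latencies_ms),
        statistics.median(latencies_ms),
        max(latencies_ms),
        min(latencies_ms),
        statistics.pstdev(latencies_ms),
    )


def print_summary(latencies_ms):
    """Print the statistics in the same column order as the log."""
    header = ["mean (ms)", "median (ms)", "max (ms)", "min (ms)", "std (ms)"]
    values = summarize_ms(latencies_ms)
    print("  ".join(f"{h:>11}" for h in header))
    print("  ".join(f"{v:>11.4f}" for v in values))


if __name__ == "__main__":
    # Illustrative samples only, not measurements from this log.
    print_summary([18.40, 18.50, 18.55, 18.60, 19.50])
```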
/home/masa/projects/dev/tvm/python/tvm/driver/build_module.py:235: UserWarning: Specifying name with IRModule input is useless
warnings.warn("Specifying name with IRModule input is useless")
Processing events...
Saving temporary "/tmp/nsys-report-5790-ca04-4a59-1065.qdstrm" file to disk...
Creating final output files...
Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-5790-ca04-4a59-1065.qdrep"
Exporting 186723 events: [================================================100%]
Exported successfully to
/tmp/nsys-report-5790-ca04-4a59-1065.sqlite
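The exported SQLite file can also be queried directly, for example to rebuild per-kernel totals comparable to the CUDA Kernel Statistics table. The sketch below assumes that the export stores kernel launches in a CUPTI_ACTIVITY_KIND_KERNEL table with start, end and shortName columns, and stores kernel names in a StringIds(id, value) table. These names vary across Nsight Systems versions, so check them with .schema in the sqlite3 shell before relying on the query.

```python
import sqlite3
import sys
from contextlib import closing

# Assumed schema (version dependent): CUPTI_ACTIVITY_KIND_KERNEL(start, "end",
# shortName) with shortName referencing StringIds(id, value).
QUERY = """
SELECT s.value AS name,
       COUNT(*) AS instances,
       SUM(k."end" - k.start) AS total_ns
FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
JOIN StringIds AS s ON s.id = k.shortName
GROUP BY s.value
ORDER BY total_ns DESC
"""


def kernel_totals(conn, limit=10):
    """Top kernels by summed duration as (name, instances, total_ns) rows."""
    return conn.execute(QUERY + " LIMIT ?", (limit,)).fetchall()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python kernel_totals.py report.sqlite
    with closing(sqlite3.connect(sys.argv[1])) as conn:
        for name, n, total_ns in kernel_totals(conn):
            print(f"{total_ns:>15,} ns  {n:>6}  {name}")
```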

CUDA API Statistics:
Time(%)  Total Time (ns)  Num Calls    Average     Minimum    Maximum    Name
-------  ---------------  ---------  ------------  -------  -----------  ----------------------------
   46.4      555,474,965          8  69,434,370.6      891  555,465,757  cudaStreamCreateWithFlags
   18.6      222,180,419        524     424,008.4    1,372    5,756,843  cuModuleUnload
   11.6      138,586,418     31,926       4,340.9    2,231       24,311  cudaLaunchKernel
   10.7      128,575,554     55,080       2,334.3    1,746       15,708  cuLaunchKernel
    7.9       94,290,073        662     142,432.1    6,617   89,607,705  cudaMemGetInfo
    2.0       23,909,387      3,676       6,504.2    2,073       25,124  cudaMemsetAsync
    1.2       14,117,098        651      21,685.3    3,258      389,879  cudaMemcpy
    0.6        7,231,560        203      35,623.4      591       84,594  cudaStreamSynchronize
    0.5        6,521,183        665       9,806.3      287      328,261  cudaFree
    0.4        4,750,625        669       7,101.1    1,354      195,875  cudaMalloc
    0.1          615,883          1     615,883.0  615,883      615,883  cuModuleLoadData
    0.0          462,199          1     462,199.0  462,199      462,199  cudaHostAlloc
    0.0          106,615        746         142.9       71        3,806  cuGetProcAddress
    0.0          105,485          4      26,371.3      877      102,520  cudaStreamCreateWithPriority
    0.0            9,816         30         327.2      253        1,092  cudaEventCreateWithFlags
    0.0            2,903          1       2,903.0    2,903        2,903  cudaEventRecord
    0.0            2,405          2       1,202.5      990        1,415  cuInit
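In the API statistics, each row's Average equals Total Time divided by Num Calls, and Time(%) matches that row's share of the Total Time summed over all listed rows (for example, 555,474,965 / 1,196,942,588 rounds to 46.4). The sketch below recomputes both columns from (name, total_ns, calls) tuples copied from the table. It assumes the table lists every API function the report counted.

```python
# (name, total time in ns, number of calls), copied from the CUDA API Statistics table.
API_ROWS = [
    ("cudaStreamCreateWithFlags", 555_474_965, 8),
    ("cuModuleUnload", 222_180_419, 524),
    ("cudaLaunchKernel", 138_586_418, 31_926),
    ("cuLaunchKernel", 128_575_554, 55_080),
    ("cudaMemGetInfo", 94_290_073, 662),
    ("cudaMemsetAsync", 23_909_387, 3_676),
    ("cudaMemcpy", 14_117_098, 651),
    ("cudaStreamSynchronize", 7_231_560, 203),
    ("cudaFree", 6_521_183, 665),
    ("cudaMalloc", 4_750_625, 669),
    ("cuModuleLoadData", 615_883, 1),
    ("cudaHostAlloc", 462_199, 1),
    ("cuGetProcAddress", 106_615, 746),
    ("cudaStreamCreateWithPriority", 105_485, 4),
    ("cudaEventCreateWithFlags", 9_816, 30),
    ("cudaEventRecord", 2_903, 1),
    ("cuInit", 2_405, 2),
]


def api_summary(rows):
    """Return (name, percent of summed time, average ns per call) per row."""
    grand_total = sum(total for _, total, _ in rows)
    return [
        (name, 100.0 * total / grand_total, total / calls)
        for name, total, calls in rows
    ]


if __name__ == "__main__":
    for name, pct, avg in api_summary(API_ROWS):
        print(f"{pct:5.1f}%  avg {avg:16,.1f} ns  {name}")
```

The Maximum column shows that a single cudaStreamCreateWithFlags call accounts for almost all of that row's 555 ms, so its 46.4% most likely reflects one-time initialization (such as CUDA context creation on first use) rather than per-inference overhead.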

CUDA Kernel Statistics:
Time(%)  Total Time (ns)  Instances   Average  Minimum  Maximum  Name
-------  ---------------  ---------  --------  -------  -------  ----------------------------------------------------------------------------------------------------
    6.3       76,006,493      3,264  23,286.3   19,968   31,936  void tensorTransformGeneric<__half, __half, float, true, false, false, (cudnnKernelDataType_t)0>(cu…
    6.0       73,064,433      6,324  11,553.5    3,872   25,217  sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x32x64_stage5_warpsize2x2x1…
    5.5       66,552,303      2,652  25,095.1    6,208   80,641  void cutlass_cudnn::Kernel<cutlass_tensorop_h16816fprop_optimized_64x128_32x6>(cutlass_tensorop_h16…
    5.4       65,954,499      4,080  16,165.3   13,440   42,113  void implicit_convolve_hhgemm<__half, 0, 6, 6, 5, 4, 4, false, 1, true>(int, int, int, __half const…
    5.1       61,626,211      4,284  14,385.2    9,569   23,392  void conv2d_c1_k1_nhwc_kernel<__half, __half, __half, __half, float, 3, 1, true, false>(float, cudn…
    4.5       54,222,746      2,652  20,446.0    9,600   43,393  sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x2x1…
    4.1       49,415,362      1,734  28,497.9   22,400   56,832  void cutlass_cudnn::Kernel<cutlass_tensorop_h16816fprop_optimized_256x128_32x3>(cutlass_tensorop_h1…
    3.8       46,410,695        510  91,001.4   32,513  113,090  sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x1x…
    3.6       43,982,117      1,428  30,799.8   30,017   35,520  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_14_kernel0
    3.6       43,527,810      1,326  32,826.4   32,256   40,288  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_15_kernel0
    3.0       36,703,325        408  89,959.1   89,057   94,881  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_2_kernel0
    3.0       36,068,081      1,020  35,360.9   14,400   43,521  sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x1x…
    2.5       29,944,486      1,326  22,582.6   21,888   27,489  tvmgen_default_fused_nn_pad_15_kernel0
    2.3       27,890,533        408  68,359.1   57,729   78,752  void xmma_cudnn::gemm::kernel<xmma_cudnn::implicit_gemm::fprop_indexed::Kernel_traits<xmma_cudnn::A…
    2.2       26,269,374      1,326  19,811.0   19,136   26,145  tvmgen_default_fused_cast_mean_3_kernel0
    2.0       24,556,881      1,836  13,375.2   12,832   16,800  tvmgen_default_fused_cast_mean_5_kernel0
    2.0       24,356,865      1,326  18,368.7   17,472   22,944  tvmgen_default_fused_multiply_nn_pad_3_kernel0
    1.8       22,222,950      1,836  12,104.0   11,808   15,712  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_19_kernel0
    1.8       21,849,117      1,836  11,900.4   11,648   14,560  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_18_kernel0
    1.7       20,089,928        408  49,240.0   48,065   53,633  tvmgen_default_fused_nn_pad_4_kernel0
    1.6       19,443,543      1,122  17,329.4   14,080   23,361  sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2…
    1.5       18,297,448      1,836   9,965.9    9,536   17,409  tvmgen_default_fused_multiply_nn_pad_5_kernel0
    1.5       17,902,958      1,836   9,751.1    9,376   13,792  tvmgen_default_fused_nn_pad_18_kernel0
    1.3       15,848,449        408  38,844.2   38,272   41,120  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_4_kernel0
    1.3       15,164,565        306  49,557.4   48,832   52,193  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_add_kernel0
    0.9       11,462,765        408  28,095.0   27,648   30,240  tvmgen_default_fused_cast_mean_6_kernel0
    0.9       10,928,221        612  17,856.6   17,088   19,520  tvmgen_default_fused_cast_mean_1_kernel0
    0.9       10,529,495        612  17,205.1   16,704   18,752  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_9_kernel0
    0.8       10,114,672        408  24,790.9   24,000   29,280  tvmgen_default_fused_nn_conv2d_add_add_kernel0
    0.8       10,059,421        408  24,655.4   24,288   29,409  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_22_kernel0
    0.8        9,746,611        408  23,888.8   23,520   25,280  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_21_kernel0
    0.8        9,517,739        612  15,551.9   15,169   18,913  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_8_kernel0
    0.8        9,272,291      3,264   2,840.8    2,656    7,840  void cudnn::ops::nhwcToNchwKernel<__half, __half, float, true, false, (cudnnKernelDataType_t)0>(cud…
    0.8        9,216,176      1,326   6,950.4    6,624    9,344  tvmgen_default_fused_nn_conv2d_add_add_3_kernel0
    0.7        8,272,317        306  27,033.7   26,272   28,256  tvmgen_default_fused_nn_pad_kernel0
    0.7        8,260,906        408  20,247.3   19,744   20,896  tvmgen_default_fused_nn_pad_8_kernel0
    0.6        7,437,980        612  12,153.6   11,776   13,249  tvmgen_default_fused_nn_pad_12_kernel0
    0.6        7,291,674      1,734   4,205.1    3,808   12,896  tvmgen_default_fused_nn_conv2d_add_add_4_kernel0
    0.5        6,287,749        408  15,411.1   14,624   16,480  tvmgen_default_fused_nn_pad_20_kernel0
    0.5        5,561,678      1,428   3,894.7    3,617    4,416  tvmgen_default_fused_nn_pad_14_kernel0
    0.4        5,260,699        408  12,893.9   12,576   13,664  tvmgen_default_fused_nn_pad_3_kernel0
    0.4        5,187,188      1,836   2,825.3    2,688    3,328  tvmgen_default_fused_nn_pad_17_kernel0
    0.4        4,722,932        102  46,303.3   45,824   49,121  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_kernel0
    0.4        4,720,607        102  46,280.5   45,761   48,896  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_1_kernel0
    0.4        4,562,802        408  11,183.3   10,849   11,776  tvmgen_default_fused_multiply_nn_pad_6_kernel0
    0.4        4,394,994        102  43,088.2   42,432   45,057  tvmgen_default_fused_nn_dense_kernel0
    0.4        4,277,909        612   6,990.0    6,112   11,680  tvmgen_default_fused_multiply_nn_pad_1_kernel0
    0.4        4,265,094      1,836   2,323.0    2,176    7,136  tvmgen_default_fused_nn_conv2d_add_sigmoid_4_kernel0
    0.4        4,250,462        408  10,417.8   10,176   11,008  tvmgen_default_fused_nn_conv2d_add_add_1_kernel0
    0.3        4,079,595      1,836   2,222.0    2,047    2,592  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_20_kernel0
    0.3        3,956,758      1,836   2,155.1    2,016    4,960  tvmgen_default_fused_cast_4_kernel0
    0.3        3,910,418        102  38,337.4   37,921   40,321  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_5_kernel0
    0.3        3,882,757        612   6,344.4    6,048    7,168  tvmgen_default_fused_nn_conv2d_add_add_2_kernel0
    0.3        3,669,832      1,836   1,998.8    1,856    6,784  tvmgen_default_fused_cast_mean_5_kernel1
    0.3        3,364,511      1,326   2,537.3    2,144    6,880  tvmgen_default_fused_cast_mean_3_kernel1
    0.3        3,287,695      1,428   2,302.3    2,112   13,664  tvmgen_default_fused_nn_conv2d_add_sigmoid_3_kernel0
    0.3        3,172,771      1,428   2,221.8    2,048    2,624  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_16_kernel0
    0.3        3,088,638        102  30,280.8   29,888   31,968  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_12_kernel0
    0.3        3,080,874      1,428   2,157.5    1,983    6,496  tvmgen_default_fused_cast_3_kernel0
    0.3        3,043,330        408   7,459.1    7,168    8,096  tvmgen_default_fused_nn_pad_7_kernel0
    0.2        2,962,623        204  14,522.7    2,560   31,776  void nhwcAddPaddingKernel<__half, __half, float, true, (cudnnKernelDataType_t)0>(int, int, int, int…
    0.2        2,726,914        102  26,734.5   26,336   28,769  tvmgen_default_fused_cast_mean_2_kernel0
    0.2        2,725,347        102  26,719.1   26,400   28,385  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_11_kernel0
    0.2        2,550,841        102  25,008.2   24,480   25,984  tvmgen_default_fused_nn_pad_1_kernel0
    0.2        2,536,286        714   3,552.2    3,328   12,000  tvmgen_default_fused_nn_pad_11_kernel0
    0.2        2,502,976        102  24,539.0   23,969   25,056  tvmgen_default_fused_nn_pad_2_kernel0
    0.2        2,476,064        204  12,137.6   10,176   13,728  void conv2d_c1_k1_nhwc_kernel<__half, __half, __half, __half, float, 3, 2, true, false>(float, cudn…
    0.2        2,466,269        408   6,044.8    5,792    6,464  tvmgen_default_fused_nn_conv2d_add_add_5_kernel0
    0.2        2,235,291        102  21,914.6   21,440   22,656  tvmgen_default_fused_nn_pad_10_kernel0
    0.2        2,159,772        102  21,174.2   20,800   22,497  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_3_kernel0
    0.2        2,045,215        102  20,051.1   19,776   21,408  tvmgen_default_fused_cast_transpose_nn_pad_layout_transform_kernel0
    0.2        2,013,364        102  19,738.9   19,424   20,960  tvmgen_default_fused_nn_conv2d_add_kernel0
    0.2        1,836,796        102  18,007.8   17,344   23,360  tvmgen_default_fused_nn_pad_16_kernel0
    0.1        1,798,102        102  17,628.5   16,929   18,336  tvmgen_default_fused_nn_pad_13_kernel0
    0.1        1,767,585        510   3,465.9    3,135    5,664  tvmgen_default_fused_nn_pad_19_kernel0
    0.1        1,416,407        612   2,314.4    2,080    2,753  tvmgen_default_fused_nn_conv2d_add_sigmoid_1_kernel0
    0.1        1,395,471        102  13,681.1   13,216   14,336  tvmgen_default_fused_nn_pad_5_kernel0
    0.1        1,370,796        612   2,239.9    2,080   14,560  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_10_kernel0
    0.1        1,320,370        612   2,157.5    1,952    6,913  tvmgen_default_fused_cast_1_kernel0
    0.1        1,277,396        612   2,087.2    1,824    6,016  tvmgen_default_fused_cast_mean_1_kernel1
    0.1        1,255,404        102  12,307.9   12,032   12,768  tvmgen_default_fused_multiply_nn_pad_2_kernel0
    0.1        1,227,445        102  12,033.8   11,840   12,896  tvmgen_default_fused_cast_mean_7_kernel0
    0.1        1,058,702        102  10,379.4    9,953   10,976  tvmgen_default_fused_nn_pad_6_kernel0
    0.1        1,007,086        102   9,873.4    9,632   10,656  tvmgen_default_fused_cast_mean_kernel0
    0.1          969,420        408   2,376.0    2,144    2,976  tvmgen_default_fused_nn_conv2d_add_sigmoid_5_kernel0
    0.1          948,524        408   2,324.8    2,176    5,792  tvmgen_default_fused_cast_mean_6_kernel1
    0.1          939,759        408   2,303.3    2,208    2,656  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_23_kernel0
    0.1          937,219        102   9,188.4    8,961    9,920  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_6_kernel0
    0.1          916,295        408   2,245.8    2,015    2,623  tvmgen_default_fused_cast_5_kernel0
    0.1          904,300        102   8,865.7    8,640    9,568  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_24_kernel0
    0.1          896,908        102   8,793.2    8,544    9,408  tvmgen_default_fused_cast_mean_4_kernel0
    0.1          849,642        102   8,329.8    8,064   12,032  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_17_kernel0
    0.1          781,196        102   7,658.8    7,456    8,256  tvmgen_default_fused_nn_conv2d_add_1_kernel0
    0.1          663,080        204   3,250.4    2,304    4,352  void cudnn::ops::nchwToNhwcKernel<__half, __half, float, false, true, (cudnnKernelDataType_t)0>(cud…
    0.1          640,643        102   6,280.8    5,984    6,848  tvmgen_default_fused_multiply_nn_pad_4_kernel0
    0.0          545,152        102   5,344.6    5,184    5,920  tvmgen_default_fused_nn_conv2d_add_3_kernel0
    0.0          501,482        102   4,916.5    4,608    5,280  tvmgen_default_fused_nn_pad_9_kernel0
    0.0          497,343        102   4,875.9    4,672    5,600  tvmgen_default_fused_multiply_nn_pad_kernel0
    0.0          479,049        102   4,696.6    4,480    5,057  tvmgen_default_fused_nn_conv2d_add_2_kernel0
    0.0          451,911        102   4,430.5    4,224    6,752  tvmgen_default_fused_transpose_kernel0
    0.0          445,448        102   4,367.1    4,256    4,800  tvmgen_default_fused_nn_conv2d_add_5_kernel0
    0.0          348,420        102   3,415.9    3,232    3,776  tvmgen_default_fused_nn_conv2d_add_4_kernel0
    0.0          256,894        102   2,518.6    2,144    2,848  tvmgen_default_fused_cast_mean_2_kernel1
    0.0          243,136        102   2,383.7    2,208    2,560  tvmgen_default_fused_nn_conv2d_add_sigmoid_2_kernel0
    0.0          234,850        102   2,302.5    2,176    2,624  tvmgen_default_fused_strided_slice_add_kernel0
    0.0          233,830        102   2,292.5    2,144    2,529  tvmgen_default_fused_nn_conv2d_add_sigmoid_kernel0
    0.0          226,946        102   2,225.0    2,144    2,496  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_7_kernel0
    0.0          223,680        102   2,192.9    2,048    2,431  tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_13_kernel0
    0.0          221,189        102   2,168.5    1,920    2,880  tvmgen_default_fused_cast_mean_7_kernel1
    0.0          220,838        102   2,165.1    1,856    2,560  tvmgen_default_fused_cast_mean_4_kernel1
    0.0          218,017        102   2,137.4    1,983    2,464  tvmgen_default_fused_cast_kernel0
    0.0          216,101        102   2,118.6    1,983    2,432  tvmgen_default_fused_cast_2_kernel0
    0.0          209,764        102   2,056.5    1,984    2,432  tvmgen_default_fused_cast_6_kernel0
    0.0          203,264        102   1,992.8    1,856    2,432  tvmgen_default_fused_cast_mean_kernel1
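One way to compare library kernels with TVM-generated ones is to group the kernel rows by name. The sketch below relies on a naming heuristic, which is an assumption and is not something the profiler reports. Names starting with tvmgen_ are treated as TVM-generated. Everything else (the cuDNN, cutlass_cudnn and sm80_xmma_* kernels) is treated as library code. For brevity the sample includes only four rows copied from the table, so it is not representative. Paste in the full table to get real totals.

```python
from collections import defaultdict

# A few rows copied from the CUDA Kernel Statistics table (not the full table).
SAMPLE_ROWS = """\
6.3 76,006,493 3,264 23,286.3 19,968 31,936 void tensorTransformGeneric<__half, __half, float, true, false, false, (cudnnKernelDataType_t)0>(cu…
6.0 73,064,433 6,324 11,553.5 3,872 25,217 sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x32x64_stage5_warpsize2x2x1…
3.6 43,982,117 1,428 30,799.8 30,017 35,520 tvmgen_default_fused_nn_conv2d_add_sigmoid_multiply_14_kernel0
2.5 29,944,486 1,326 22,582.6 21,888 27,489 tvmgen_default_fused_nn_pad_15_kernel0
"""


def parse_row(line):
    """Split one row into (name, total_ns, instances).

    There are six numeric columns; the kernel name is last and may contain
    spaces, so only the first six whitespace runs are split on.
    """
    _pct, total, instances, _avg, _min, _max, name = line.split(None, 6)
    return name, int(total.replace(",", "")), int(instances.replace(",", ""))


def classify(name):
    # Heuristic (assumption): TVM-generated kernels carry the tvmgen_ prefix.
    return "tvm" if name.startswith("tvmgen_") else "library"


def time_by_origin(text):
    """Sum Total Time (ns) per origin group over all non-empty rows."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if line.strip():
            name, total_ns, _ = parse_row(line)
            totals[classify(name)] += total_ns
    return dict(totals)


if __name__ == "__main__":
    for origin, ns in sorted(time_by_origin(SAMPLE_ROWS).items()):
        print(f"{origin:8s} {ns / 1e6:10.3f} ms")
```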

CUDA Memory Operation Statistics (by time):
Time(%)  Total Time (ns)  Operations  Average  Minimum  Maximum  Operation
-------  ---------------  ----------  -------  -------  -------  ------------------
   58.4        8,628,491       3,676  2,347.3      736   16,960  [CUDA memset]
   41.6        6,145,728         650  9,455.0      767  338,660  [CUDA memcpy HtoD]
    0.0            2,464           1  2,464.0    2,464    2,464  [CUDA memcpy DtoH]

CUDA Memory Operation Statistics (by size in KiB):
    Total      Operations  Average  Minimum   Maximum   Operation
-------------  ----------  -------  -------  ---------  ------------------
       15.625           1   15.625   15.625     15.625  [CUDA memcpy DtoH]
1,312,047.750       3,676  356.923    0.375  6,328.125  [CUDA memset]
  110,220.049         650  169.569    0.002  4,704.000  [CUDA memcpy HtoD]
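Combining the two memory-operation tables gives a rough effective throughput for each operation type: total bytes (KiB × 1024) divided by total GPU time. Because one byte per nanosecond equals 1 GB/s (10^9 bytes per second), no further unit conversion is needed. This is only a coarse figure. It averages over operations of very different sizes (from 0.002 KiB to several MiB) and includes per-operation overhead.

```python
KIB = 1024  # bytes per KiB

# (total KiB, total GPU time in ns), copied from the two memory-operation tables.
MEMORY_OPS = {
    "[CUDA memset]": (1_312_047.750, 8_628_491),
    "[CUDA memcpy HtoD]": (110_220.049, 6_145_728),
    "[CUDA memcpy DtoH]": (15.625, 2_464),
}


def effective_gb_per_s(total_kib, total_ns):
    """Average throughput in GB/s: bytes per nanosecond equals 1e9 bytes/s."""
    return total_kib * KIB / total_ns


if __name__ == "__main__":
    for op, (kib, ns) in MEMORY_OPS.items():
        print(f"{op:20s} {effective_gb_per_s(kib, ns):8.2f} GB/s")
```

With these numbers, the memset rate works out to roughly 156 GB/s and host-to-device copies to roughly 18 GB/s.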
Report file moved to "/home/masa/projects/dev/tvm-cutlass-eval/effnetv2/report4.qdrep"
Report file moved to "/home/masa/projects/dev/tvm-cutlass-eval/effnetv2/report4.sqlite"