nvProfO.txt
==7567== NVPROF is profiling process 7567, command: ./MN
==7567== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
==7567== Profiling application: ./MN
Processing Done !!!
==7567== Profiling result:
==7567== Metric result:
Invocations Metric Name Metric Description Min Max Avg
Device "NVIDIA Tegra X2 (0)"
Kernel: executeEleventhLayer_PSC(double*, double*, double*, double*, double*, double*, double*)
1 inst_per_warp Instructions per warp 1.0520e+03 1.0520e+03 1.0520e+03
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
1 warp_execution_efficiency Warp Execution Efficiency 98.00% 98.00% 98.00%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 97.91% 97.91% 97.91%
1 inst_replay_overhead Instruction Replay Overhead 0.000523 0.000523 0.000523
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 17.533643 17.533643 17.533643
1 gst_transactions_per_request Global Store Transactions Per Request 31.360000 31.360000 31.360000
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 57903104 57903104 57903104
1 gst_transactions Global Store Transactions 200704 200704 200704
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 0 0 0
1 l2_read_transactions L2 Read Transactions 47624148 47624148 47624148
1 l2_write_transactions L2 Write Transactions 200733 200733 200733
1 global_hit_rate Global Hit Rate in unified l1/tex 17.90% 17.90% 17.90%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 12.398GB/s 12.398GB/s 12.398GB/s
1 gst_requested_throughput Requested Global Store Throughput 48.038MB/s 48.038MB/s 48.038MB/s
1 gld_throughput Global Load Throughput 44.446GB/s 44.446GB/s 44.446GB/s
1 gst_throughput Global Store Throughput 192.15MB/s 192.15MB/s 192.15MB/s
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00%
1 tex_cache_hit_rate Unified Cache Hit Rate 17.84% 17.84% 17.84%
1 tex_cache_throughput Unified Cache Throughput 24.206GB/s 24.206GB/s 24.206GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 44.446GB/s 44.446GB/s 44.446GB/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 192.15MB/s 192.15MB/s 192.15MB/s
1 l2_read_throughput L2 Throughput (Reads) 44.526GB/s 44.526GB/s 44.526GB/s
1 l2_write_throughput L2 Throughput (Writes) 192.18MB/s 192.18MB/s 192.18MB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 27.89% 27.89% 27.89%
1 gst_efficiency Global Memory Store Efficiency 25.00% 25.00% 25.00%
1 tex_cache_transactions Unified Cache Transactions 25890816 25890816 25890816
1 flop_count_dp Floating Point Operations(Double Precision) 105570304 105570304 105570304
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 200704 200704 200704
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 52584448 52584448 52584448
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 200704 200704 200704
1 flop_count_sp Floating Point Operations(Single Precision) 401408 401408 401408
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 200704 200704 200704
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 200704 200704 200704
1 inst_executed Instructions Executed 6732800 6732800 6732800
1 inst_issued Instructions Issued 6736323 6736323 6736323
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.34% 0.34% 0.34%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 1.62% 1.62% 1.62%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 13.97% 13.97% 13.97%
1 stall_texture Issue Stall Reasons (Texture) 40.95% 40.95% 40.95%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 42.96% 42.96% 42.96%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.00% 0.00% 0.00%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.00% 0.00% 0.00%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 802816 802816 802816
1 inst_fp_64 FP Instructions(Double) 53387264 53387264 53387264
1 inst_integer Integer Instructions 11440128 11440128 11440128
1 inst_bit_convert Bit-Convert Instructions 0 0 0
1 inst_control Control-Flow Instructions 1204224 1204224 1204224
1 inst_compute_ld_st Load/Store Instructions 103763968 103763968 103763968
1 inst_misc Misc Instructions 40341504 40341504 40341504
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 6518236 6518236 6518236
1 cf_issued Issued Control-Flow Instructions 44800 44800 44800
1 cf_executed Executed Control-Flow Instructions 44800 44800 44800
1 ldst_issued Issued Load/Store Instructions 13014784 13014784 13014784
1 ldst_executed Executed Load/Store Instructions 3328000 3328000 3328000
1 atomic_transactions Atomic Transactions 0 0 0
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 47539200 47539200 47539200
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 0.15% 0.15% 0.15%
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.00% 0.00% 0.00%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 200704 200704 200704
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 sm_efficiency Multiprocessor Activity 99.65% 99.65% 99.65%
1 achieved_occupancy Achieved Occupancy 0.706845 0.706845 0.706845
1 ipc Executed IPC 0.083024 0.083024 0.083024
1 issued_ipc Issued IPC 0.086215 0.086215 0.086215
1 issue_slot_utilization Issue Slot Utilization 2.09% 2.09% 2.09%
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.082221 0.082221 0.082221
1 tex_utilization Unified Cache Utilization Low (2) Low (2) Low (2)
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Low (1) Low (1) Low (1)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (3) Low (3) Low (3)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.00% 0.00% 0.00%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 16.34% 16.34% 16.34%
1 l2_utilization L2 Cache Utilization Low (3) Low (3) Low (3)
1 dram_utilization Device Memory Utilization Mid (4) Mid (4) Mid (4)
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
Kernel: executeNinthLayer_PSC(double*, double*, double*, double*, double*, double*, double*)
1 inst_per_warp Instructions per warp 557.000000 557.000000 557.000000
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
1 warp_execution_efficiency Warp Execution Efficiency 98.00% 98.00% 98.00%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 97.82% 97.82% 97.82%
1 inst_replay_overhead Instruction Replay Overhead 0.000429 0.000429 0.000429
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 17.428923 17.428923 17.428923
1 gst_transactions_per_request Global Store Transactions Per Request 31.360000 31.360000 31.360000
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 29001728 29001728 29001728
1 gst_transactions Global Store Transactions 200704 200704 200704
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 0 0 0
1 l2_read_transactions L2 Read Transactions 23818338 23818338 23818338
1 l2_write_transactions L2 Write Transactions 200733 200733 200733
1 global_hit_rate Global Hit Rate in unified l1/tex 18.00% 18.00% 18.00%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 13.939GB/s 13.939GB/s 13.939GB/s
1 gst_requested_throughput Requested Global Store Throughput 107.96MB/s 107.96MB/s 107.96MB/s
1 gld_throughput Global Load Throughput 49.972GB/s 49.972GB/s 49.972GB/s
1 gst_throughput Global Store Throughput 431.84MB/s 431.84MB/s 431.84MB/s
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00%
1 tex_cache_hit_rate Unified Cache Hit Rate 17.87% 17.87% 17.87%
1 tex_cache_throughput Unified Cache Throughput 27.412GB/s 27.412GB/s 27.412GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 49.972GB/s 49.972GB/s 49.972GB/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 431.84MB/s 431.84MB/s 431.84MB/s
1 l2_read_throughput L2 Throughput (Reads) 50.047GB/s 50.047GB/s 50.047GB/s
1 l2_write_throughput L2 Throughput (Writes) 431.91MB/s 431.91MB/s 431.91MB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 27.89% 27.89% 27.89%
1 gst_efficiency Global Memory Store Efficiency 25.00% 25.00% 25.00%
1 tex_cache_transactions Unified Cache Transactions 13045760 13045760 13045760
1 flop_count_dp Floating Point Operations(Double Precision) 54190080 54190080 54190080
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 200704 200704 200704
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 26894336 26894336 26894336
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 200704 200704 200704
1 flop_count_sp Floating Point Operations(Single Precision) 401408 401408 401408
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 200704 200704 200704
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 200704 200704 200704
1 inst_executed Instructions Executed 3564800 3564800 3564800
1 inst_issued Instructions Issued 3566330 3566330 3566330
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.37% 0.37% 0.37%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 2.05% 2.05% 2.05%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 12.70% 12.70% 12.70%
1 stall_texture Issue Stall Reasons (Texture) 40.00% 40.00% 40.00%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 44.53% 44.53% 44.53%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.02% 0.02% 0.02%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.01% 0.01% 0.01%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 802816 802816 802816
1 inst_fp_64 FP Instructions(Double) 27697152 27697152 27697152
1 inst_integer Integer Instructions 8228864 8228864 8228864
1 inst_bit_convert Bit-Convert Instructions 0 0 0
1 inst_control Control-Flow Instructions 802816 802816 802816
1 inst_compute_ld_st Load/Store Instructions 52383744 52383744 52383744
1 inst_misc Misc Instructions 21676032 21676032 21676032
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 3444619 3444619 3444619
1 cf_issued Issued Control-Flow Instructions 32000 32000 32000
1 cf_executed Executed Control-Flow Instructions 32000 32000 32000
1 ldst_issued Issued Load/Store Instructions 6592256 6592256 6592256
1 ldst_executed Executed Load/Store Instructions 1689600 1689600 1689600
1 atomic_transactions Atomic Transactions 0 0 0
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 23782400 23782400 23782400
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 0.31% 0.31% 0.31%
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.01% 0.01% 0.01%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 200704 200704 200704
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 sm_efficiency Multiprocessor Activity 99.98% 99.98% 99.98%
1 achieved_occupancy Achieved Occupancy 0.756255 0.756255 0.756255
1 ipc Executed IPC 0.114781 0.114781 0.114781
1 issued_ipc Issued IPC 0.114830 0.114830 0.114830
1 issue_slot_utilization Issue Slot Utilization 2.77% 2.77% 2.77%
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.114197 0.114197 0.114197
1 tex_utilization Unified Cache Utilization Low (3) Low (3) Low (3)
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Low (2) Low (2) Low (2)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (3) Low (3) Low (3)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.01% 0.01% 0.01%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 21.67% 21.67% 21.67%
1 l2_utilization L2 Cache Utilization Mid (4) Mid (4) Mid (4)
1 dram_utilization Device Memory Utilization Low (1) Low (1) Low (1)
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
Kernel: executeTwentySevenLayer_PSC(double*, double*, double*, double*, double*, double*, double*)
1 inst_per_warp Instructions per warp 4.0100e+03 4.0100e+03 4.0100e+03
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
1 warp_execution_efficiency Warp Execution Efficiency 76.56% 76.56% 76.56%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 76.54% 76.54% 76.54%
1 inst_replay_overhead Instruction Replay Overhead 0.000101 0.000101 0.000101
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 10.611111 10.611111 10.611111
1 gst_transactions_per_request Global Store Transactions Per Request 18.683105 18.683105 18.683105
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 44593152 44593152 44593152
1 gst_transactions Global Store Transactions 38263 38263 38263
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 0 0 0
1 l2_read_transactions L2 Read Transactions 24938245 24938245 24938245
1 l2_write_transactions L2 Write Transactions 38306 38306 38306
1 global_hit_rate Global Hit Rate in unified l1/tex 44.14% 44.14% 44.14%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 35.109GB/s 35.109GB/s 35.109GB/s
1 gst_requested_throughput Requested Global Store Throughput 33.727MB/s 33.727MB/s 33.727MB/s
1 gld_throughput Global Load Throughput 65.411GB/s 65.411GB/s 65.411GB/s
1 gst_throughput Global Store Throughput 102.88MB/s 102.88MB/s 102.88MB/s
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00%
1 tex_cache_hit_rate Unified Cache Hit Rate 44.11% 44.11% 44.11%
1 tex_cache_throughput Unified Cache Throughput 77.241GB/s 77.241GB/s 77.241GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 32.773GB/s 32.773GB/s 32.773GB/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 102.88MB/s 102.88MB/s 102.88MB/s
1 l2_read_throughput L2 Throughput (Reads) 65.480GB/s 65.480GB/s 65.480GB/s
1 l2_write_throughput L2 Throughput (Writes) 102.99MB/s 102.99MB/s 102.99MB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 53.67% 53.67% 53.67%
1 gst_efficiency Global Memory Store Efficiency 32.78% 32.78% 32.78%
1 tex_cache_transactions Unified Cache Transactions 29417472 29417472 29417472
1 flop_count_dp Floating Point Operations(Double Precision) 103462912 103462912 103462912
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 50176 50176 50176
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 51681280 51681280 51681280
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 50176 50176 50176
1 flop_count_sp Floating Point Operations(Single Precision) 100352 100352 100352
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 50176 50176 50176
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 50176 50176 50176
1 inst_executed Instructions Executed 8212480 8212480 8212480
1 inst_issued Instructions Issued 8213302 8213302 8213302
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.64% 0.64% 0.64%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 4.47% 4.47% 4.47%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 22.58% 22.58% 22.58%
1 stall_texture Issue Stall Reasons (Texture) 31.00% 31.00% 31.00%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 41.18% 41.18% 41.18%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.00% 0.00% 0.00%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.00% 0.00% 0.00%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 200704 200704 200704
1 inst_fp_64 FP Instructions(Double) 51881984 51881984 51881984
1 inst_integer Integer Instructions 7526400 7526400 7526400
1 inst_bit_convert Bit-Convert Instructions 0 0 0
1 inst_control Control-Flow Instructions 903168 903168 903168
1 inst_compute_ld_st Load/Store Instructions 103011328 103011328 103011328
1 inst_misc Misc Instructions 37632000 37632000 37632000
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 8006313 8006313 8006313
1 cf_issued Issued Control-Flow Instructions 38912 38912 38912
1 cf_executed Executed Control-Flow Instructions 38912 38912 38912
1 ldst_issued Issued Load/Store Instructions 14728192 14728192 14728192
1 ldst_executed Executed Load/Store Instructions 4210688 4210688 4210688
1 atomic_transactions Atomic Transactions 126 126 126
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 12481536 12481536 12481536
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 0.11% 0.11% 0.11%
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.02% 0.02% 0.02%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 38263 38263 38263
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 sm_efficiency Multiprocessor Activity 99.33% 99.33% 99.33%
1 achieved_occupancy Achieved Occupancy 0.977826 0.977826 0.977826
1 ipc Executed IPC 0.326965 0.326965 0.326965
1 issued_ipc Issued IPC 0.328033 0.328033 0.328033
1 issue_slot_utilization Issue Slot Utilization 7.99% 7.99% 7.99%
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.330948 0.330948 0.330948
1 tex_utilization Unified Cache Utilization Mid (6) Mid (6) Mid (6)
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Low (3) Low (3) Low (3)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization High (7) High (7) High (7)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.00% 0.00% 0.00%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 51.15% 51.15% 51.15%
1 l2_utilization L2 Cache Utilization Mid (5) Mid (5) Mid (5)
1 dram_utilization Device Memory Utilization Low (1) Low (1) Low (1)
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
Kernel: executeTwentySixLayer_DSC(double*, double*, double*, double*, double*, double*, double*)
1 inst_per_warp Instructions per warp 99.000000 99.000000 99.000000
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
1 warp_execution_efficiency Warp Execution Efficiency 76.56% 76.56% 76.56%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 76.56% 76.56% 76.56%
1 inst_replay_overhead Instruction Replay Overhead 0.002451 0.002451 0.002451
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 9.329545 9.329545 9.329545
1 gst_transactions_per_request Global Store Transactions Per Request 18.671875 18.671875 18.671875
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 420352 420352 420352
1 gst_transactions Global Store Transactions 38240 38240 38240
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 0 0 0
1 l2_read_transactions L2 Read Transactions 236853 236853 236853
1 l2_write_transactions L2 Write Transactions 38281 38281 38281
1 global_hit_rate Global Hit Rate in unified l1/tex 43.79% 43.79% 43.79%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 7.4663GB/s 7.4663GB/s 7.4663GB/s
1 gst_requested_throughput Requested Global Store Throughput 802.21MB/s 802.21MB/s 802.21MB/s
1 gld_throughput Global Load Throughput 14.757GB/s 14.757GB/s 14.757GB/s
1 gst_throughput Global Store Throughput 2.3882GB/s 2.3882GB/s 2.3882GB/s
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00%
1 tex_cache_hit_rate Unified Cache Hit Rate 41.66% 41.66% 41.66%
1 tex_cache_throughput Unified Cache Throughput 19.697GB/s 19.697GB/s 19.697GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 7.3819GB/s 7.3819GB/s 7.3819GB/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 2.3882GB/s 2.3882GB/s 2.3882GB/s
1 l2_read_throughput L2 Throughput (Reads) 14.792GB/s 14.792GB/s 14.792GB/s
1 l2_write_throughput L2 Throughput (Writes) 2.3907GB/s 2.3907GB/s 2.3907GB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 50.60% 50.60% 50.60%
1 gst_efficiency Global Memory Store Efficiency 32.80% 32.80% 32.80%
1 tex_cache_transactions Unified Cache Transactions 315392 315392 315392
1 flop_count_dp Floating Point Operations(Double Precision) 1605632 1605632 1605632
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 200704 200704 200704
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 602112 602112 602112
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 200704 200704 200704
1 flop_count_sp Floating Point Operations(Single Precision) 100352 100352 100352
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 50176 50176 50176
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 50176 50176 50176
1 inst_executed Instructions Executed 202752 202752 202752
1 inst_issued Instructions Issued 203249 203249 203249
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.69% 0.69% 0.69%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 48.21% 48.21% 48.21%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 15.29% 15.29% 15.29%
1 stall_texture Issue Stall Reasons (Texture) 3.85% 3.85% 3.85%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 30.62% 30.62% 30.62%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.27% 0.27% 0.27%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.06% 0.06% 0.06%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 200704 200704 200704
1 inst_fp_64 FP Instructions(Double) 1103872 1103872 1103872
1 inst_integer Integer Instructions 1555456 1555456 1555456
1 inst_bit_convert Bit-Convert Instructions 0 0 0
1 inst_control Control-Flow Instructions 150528 150528 150528
1 inst_compute_ld_st Load/Store Instructions 1154048 1154048 1154048
1 inst_misc Misc Instructions 802816 802816 802816
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 184817 184817 184817
1 cf_issued Issued Control-Flow Instructions 6144 6144 6144
1 cf_executed Executed Control-Flow Instructions 6144 6144 6144
1 ldst_issued Issued Load/Store Instructions 177152 177152 177152
1 ldst_executed Executed Load/Store Instructions 53248 53248 53248
1 atomic_transactions Atomic Transactions 0 0 0
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 118201 118201 118201
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 0.94% 0.94% 0.94%
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.07% 0.07% 0.07%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 38240 38240 38240
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 sm_efficiency Multiprocessor Activity 99.20% 99.20% 99.20%
1 achieved_occupancy Achieved Occupancy 0.872485 0.872485 0.872485
1 ipc Executed IPC 0.507669 0.507669 0.507669
1 issued_ipc Issued IPC 0.519509 0.519509 0.519509
1 issue_slot_utilization Issue Slot Utilization 11.81% 11.81% 11.81%
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.511245 0.511245 0.511245
1 tex_utilization Unified Cache Utilization Mid (5) Mid (5) Mid (5)
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Low (3) Low (3) Low (3)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization High (9) High (9) High (9)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.10% 0.10% 0.10%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 51.30% 51.30% 51.30%
1 l2_utilization L2 Cache Utilization Mid (4) Mid (4) Mid (4)
1 dram_utilization Device Memory Utilization Low (1) Low (1) Low (1)
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
Kernel: executeSecondLayer_DSC_partA(double*, double*, double*, double*, double*, double*, double*)
1 inst_per_warp Instructions per warp 108.000000 108.000000 108.000000
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
1 warp_execution_efficiency Warp Execution Efficiency 100.00% 100.00% 100.00%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 100.00% 100.00% 100.00%
1 inst_replay_overhead Instruction Replay Overhead 0.000448 0.000448 0.000448
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 15.454545 15.454545 15.454545
1 gst_transactions_per_request Global Store Transactions Per Request 32.000000 32.000000 32.000000
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 3133440 3133440 3133440
1 gst_transactions Global Store Transactions 294912 294912 294912
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 0 0 0
1 l2_read_transactions L2 Read Transactions 2780398 2780398 2780398
1 l2_write_transactions L2 Write Transactions 294941 294941 294941
1 global_hit_rate Global Hit Rate in unified l1/tex 11.47% 11.47% 11.47%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 3.8564GB/s 3.8564GB/s 3.8564GB/s
1 gst_requested_throughput Requested Global Store Throughput 419.82MB/s 419.82MB/s 419.82MB/s
1 gld_throughput Global Load Throughput 15.426GB/s 15.426GB/s 15.426GB/s
1 gst_throughput Global Store Throughput 1.6399GB/s 1.6399GB/s 1.6399GB/s
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00%
1 tex_cache_hit_rate Unified Cache Hit Rate 10.48% 10.48% 10.48%
1 tex_cache_throughput Unified Cache Throughput 9.0196GB/s 9.0196GB/s 9.0196GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 15.426GB/s 15.426GB/s 15.426GB/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 1.6399GB/s 1.6399GB/s 1.6399GB/s
1 l2_read_throughput L2 Throughput (Reads) 15.461GB/s 15.461GB/s 15.461GB/s
1 l2_write_throughput L2 Throughput (Writes) 1.6401GB/s 1.6401GB/s 1.6401GB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 25.00% 25.00% 25.00%
1 gst_efficiency Global Memory Store Efficiency 25.00% 25.00% 25.00%
1 tex_cache_transactions Unified Cache Transactions 1622016 1622016 1622016
1 flop_count_dp Floating Point Operations(Double Precision) 9437184 9437184 9437184
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 1179648 1179648 1179648
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 3538944 3538944 3538944
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 1179648 1179648 1179648
1 flop_count_sp Floating Point Operations(Single Precision) 589824 589824 589824
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 294912 294912 294912
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 294912 294912 294912
1 inst_executed Instructions Executed 995328 995328 995328
1 inst_issued Instructions Issued 995774 995774 995774
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.70% 0.70% 0.70%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 8.24% 8.24% 8.24%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 32.96% 32.96% 32.96%
1 stall_texture Issue Stall Reasons (Texture) 16.75% 16.75% 16.75%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 28.99% 28.99% 28.99%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.03% 0.03% 0.03%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.06% 0.06% 0.06%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 1179648 1179648 1179648
1 inst_fp_64 FP Instructions(Double) 6488064 6488064 6488064
1 inst_integer Integer Instructions 11206656 11206656 11206656
1 inst_bit_convert Bit-Convert Instructions 0 0 0
1 inst_control Control-Flow Instructions 884736 884736 884736
1 inst_compute_ld_st Load/Store Instructions 6782976 6782976 6782976
1 inst_misc Misc Instructions 5308416 5308416 5308416
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 885174 885174 885174
1 cf_issued Issued Control-Flow Instructions 27648 27648 27648
1 cf_executed Executed Control-Flow Instructions 27648 27648 27648
1 ldst_issued Issued Load/Store Instructions 930816 930816 930816
1 ldst_executed Executed Load/Store Instructions 258048 258048 258048
1 atomic_transactions Atomic Transactions 0 0 0
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 2774016 2774016 2774016
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 12.23% 12.23% 12.23%
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.05% 0.05% 0.05%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 294912 294912 294912
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 sm_efficiency Multiprocessor Activity 99.79% 99.79% 99.79%
1 achieved_occupancy Achieved Occupancy 0.851153 0.851153 0.851153
1 ipc Executed IPC 0.279364 0.279364 0.279364
1 issued_ipc Issued IPC 0.279489 0.279489 0.279489
1 issue_slot_utilization Issue Slot Utilization 6.21% 6.21% 6.21%
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.283960 0.283960 0.283960
1 tex_utilization Unified Cache Utilization Low (3) Low (3) Low (3)
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Low (2) Low (2) Low (2)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization Mid (5) Mid (5) Mid (5)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.07% 0.07% 0.07%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 33.67% 33.67% 33.67%
1 l2_utilization L2 Cache Utilization Mid (5) Mid (5) Mid (5)
1 dram_utilization Device Memory Utilization Low (1) Low (1) Low (1)
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
Kernel: executeSecondLayer_DSC_partB(double*, double*, double*, double*, double*, double*, double*)
1 inst_per_warp Instructions per warp 104.000000 104.000000 104.000000
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
1 warp_execution_efficiency Warp Execution Efficiency 100.00% 100.00% 100.00%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 100.00% 100.00% 100.00%
1 inst_replay_overhead Instruction Replay Overhead 0.001701 0.001701 0.001701
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 15.454545 15.454545 15.454545
1 gst_transactions_per_request Global Store Transactions Per Request 32.000000 32.000000 32.000000
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 609280 609280 609280
1 gst_transactions Global Store Transactions 57344 57344 57344
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 0 0 0
1 l2_read_transactions L2 Read Transactions 325327 325327 325327
1 l2_write_transactions L2 Write Transactions 57373 57373 57373
1 global_hit_rate Global Hit Rate in unified l1/tex 46.76% 46.76% 46.76%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 3.8115GB/s 3.8115GB/s 3.8115GB/s
1 gst_requested_throughput Requested Global Store Throughput 414.94MB/s 414.94MB/s 414.94MB/s
1 gld_throughput Global Load Throughput 9.1680GB/s 9.1680GB/s 9.1680GB/s
1 gst_throughput Global Store Throughput 1.6209GB/s 1.6209GB/s 1.6209GB/s
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00%
1 tex_cache_hit_rate Unified Cache Hit Rate 42.74% 42.74% 42.74%
1 tex_cache_throughput Unified Cache Throughput 8.9147GB/s 8.9147GB/s 8.9147GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 9.1680GB/s 9.1680GB/s 9.1680GB/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 1.6209GB/s 1.6209GB/s 1.6209GB/s
1 l2_read_throughput L2 Throughput (Reads) 9.1955GB/s 9.1955GB/s 9.1955GB/s
1 l2_write_throughput L2 Throughput (Writes) 1.6217GB/s 1.6217GB/s 1.6217GB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 41.57% 41.57% 41.57%
1 gst_efficiency Global Memory Store Efficiency 25.00% 25.00% 25.00%
1 tex_cache_transactions Unified Cache Transactions 315392 315392 315392
1 flop_count_dp Floating Point Operations(Double Precision) 1835008 1835008 1835008
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 229376 229376 229376
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 688128 688128 688128
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 229376 229376 229376
1 flop_count_sp Floating Point Operations(Single Precision) 114688 114688 114688
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 57344 57344 57344
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 57344 57344 57344
1 inst_executed Instructions Executed 186368 186368 186368
1 inst_issued Instructions Issued 186685 186685 186685
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.93% 0.93% 0.93%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 33.11% 33.11% 33.11%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 19.13% 19.13% 19.13%
1 stall_texture Issue Stall Reasons (Texture) 11.09% 11.09% 11.09%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 29.28% 29.28% 29.28%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.19% 0.19% 0.19%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.06% 0.06% 0.06%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 229376 229376 229376
1 inst_fp_64 FP Instructions(Double) 1261568 1261568 1261568
1 inst_integer Integer Instructions 2064384 2064384 2064384
1 inst_bit_convert Bit-Convert Instructions 0 0 0
1 inst_control Control-Flow Instructions 172032 172032 172032
1 inst_compute_ld_st Load/Store Instructions 1318912 1318912 1318912
1 inst_misc Misc Instructions 917504 917504 917504
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 170554 170554 170554
1 cf_issued Issued Control-Flow Instructions 5376 5376 5376
1 cf_executed Executed Control-Flow Instructions 5376 5376 5376
1 ldst_issued Issued Load/Store Instructions 179200 179200 179200
1 ldst_executed Executed Load/Store Instructions 48384 48384 48384
1 atomic_transactions Atomic Transactions 0 0 0
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 324352 324352 324352
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 6.16% 6.16% 6.16%
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.05% 0.05% 0.05%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 57344 57344 57344
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 sm_efficiency Multiprocessor Activity 99.04% 99.04% 99.04%
1 achieved_occupancy Achieved Occupancy 0.885565 0.885565 0.885565
1 ipc Executed IPC 0.373646 0.373646 0.373646
1 issued_ipc Issued IPC 0.383282 0.383282 0.383282
1 issue_slot_utilization Issue Slot Utilization 8.75% 8.75% 8.75%
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.381838 0.381838 0.381838
1 tex_utilization Unified Cache Utilization Mid (4) Mid (4) Mid (4)
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Low (2) Low (2) Low (2)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization High (7) High (7) High (7)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.09% 0.09% 0.09%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 46.10% 46.10% 46.10%
1 l2_utilization L2 Cache Utilization Mid (4) Mid (4) Mid (4)
1 dram_utilization Device Memory Utilization Low (1) Low (1) Low (1)
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
Kernel: executeThirdLayer_PSC_partA(double*, double*, double*, double*, double*, double*, double*)
1 inst_per_warp Instructions per warp 187.000000 187.000000 187.000000
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
1 warp_execution_efficiency Warp Execution Efficiency 100.00% 100.00% 100.00%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 100.00% 100.00% 100.00%
1 inst_replay_overhead Instruction Replay Overhead 0.000273 0.000273 0.000273
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 17.176471 17.176471 17.176471
1 gst_transactions_per_request Global Store Transactions Per Request 32.000000 32.000000 32.000000
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 21528576 21528576 21528576
1 gst_transactions Global Store Transactions 589824 589824 589824
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 0 0 0
1 l2_read_transactions L2 Read Transactions 19569730 19569730 19569730
1 l2_write_transactions L2 Write Transactions 589853 589853 589853
1 global_hit_rate Global Hit Rate in unified l1/tex 9.25% 9.25% 9.25%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 9.3452GB/s 9.3452GB/s 9.3452GB/s
1 gst_requested_throughput Requested Global Store Throughput 288.89MB/s 288.89MB/s 288.89MB/s
1 gld_throughput Global Load Throughput 37.381GB/s 37.381GB/s 37.381GB/s
1 gst_throughput Global Store Throughput 1.1285GB/s 1.1285GB/s 1.1285GB/s
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00%
1 tex_cache_hit_rate Unified Cache Hit Rate 9.00% 9.00% 9.00%
1 tex_cache_throughput Unified Cache Throughput 19.184GB/s 19.184GB/s 19.184GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 37.381GB/s 37.381GB/s 37.381GB/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 1.1285GB/s 1.1285GB/s 1.1285GB/s
1 l2_read_throughput L2 Throughput (Reads) 37.442GB/s 37.442GB/s 37.442GB/s
1 l2_write_throughput L2 Throughput (Writes) 1.1285GB/s 1.1285GB/s 1.1285GB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 25.00% 25.00% 25.00%
1 gst_efficiency Global Memory Store Efficiency 25.00% 25.00% 25.00%
1 tex_cache_transactions Unified Cache Transactions 10027008 10027008 10027008
1 flop_count_dp Floating Point Operations(Double Precision) 46006272 46006272 46006272
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 589824 589824 589824
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 22413312 22413312 22413312
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 589824 589824 589824
1 flop_count_sp Floating Point Operations(Single Precision) 1179648 1179648 1179648
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 589824 589824 589824
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 589824 589824 589824
1 inst_executed Instructions Executed 3446784 3446784 3446784
1 inst_issued Instructions Issued 3447741 3447741 3447741
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.40% 0.40% 0.40%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 2.36% 2.36% 2.36%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 19.06% 19.06% 19.06%
1 stall_texture Issue Stall Reasons (Texture) 29.25% 29.25% 29.25%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 44.88% 44.88% 44.88%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.01% 0.01% 0.01%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.02% 0.02% 0.02%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 2359296 2359296 2359296
1 inst_fp_64 FP Instructions(Double) 24772608 24772608 24772608
1 inst_integer Integer Instructions 20054016 20054016 20054016
1 inst_bit_convert Bit-Convert Instructions 0 0 0
1 inst_control Control-Flow Instructions 1769472 1769472 1769472
1 inst_compute_ld_st Load/Store Instructions 40697856 40697856 40697856
1 inst_misc Misc Instructions 20643840 20643840 20643840
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 3318715 3318715 3318715
1 cf_issued Issued Control-Flow Instructions 55296 55296 55296
1 cf_executed Executed Control-Flow Instructions 55296 55296 55296
1 ldst_issued Issued Load/Store Instructions 5253120 5253120 5253120
1 ldst_executed Executed Load/Store Instructions 1363968 1363968 1363968
1 atomic_transactions Atomic Transactions 0 0 0
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 19537920 19537920 19537920
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 4.02% 4.02% 4.02%
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.02% 0.02% 0.02%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 589824 589824 589824
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 sm_efficiency Multiprocessor Activity 99.81% 99.81% 99.81%
1 achieved_occupancy Achieved Occupancy 0.913249 0.913249 0.913249
1 ipc Executed IPC 0.161893 0.161893 0.161893
1 issued_ipc Issued IPC 0.161094 0.161094 0.161094
1 issue_slot_utilization Issue Slot Utilization 3.88% 3.88% 3.88%
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.163963 0.163963 0.163963
1 tex_utilization Unified Cache Utilization Low (3) Low (3) Low (3)
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Low (2) Low (2) Low (2)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization Low (3) Low (3) Low (3)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.02% 0.02% 0.02%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 26.84% 26.84% 26.84%
1 l2_utilization L2 Cache Utilization Mid (5) Mid (5) Mid (5)
1 dram_utilization Device Memory Utilization Low (1) Low (1) Low (1)
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
Kernel: executeSecondLayer_DSC_partC(double*, double*, double*, double*, double*, double*, double*)
1 inst_per_warp Instructions per warp 101.000000 101.000000 101.000000
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
1 warp_execution_efficiency Warp Execution Efficiency 100.00% 100.00% 100.00%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 100.00% 100.00% 100.00%
1 inst_replay_overhead Instruction Replay Overhead 0.001702 0.001702 0.001702
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 15.454545 15.454545 15.454545
1 gst_transactions_per_request Global Store Transactions Per Request 32.000000 32.000000 32.000000
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 522240 522240 522240
1 gst_transactions Global Store Transactions 49152 49152 49152
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 0 0 0
1 l2_read_transactions L2 Read Transactions 278783 278783 278783
1 l2_write_transactions L2 Write Transactions 49181 49181 49181
1 global_hit_rate Global Hit Rate in unified l1/tex 46.76% 46.76% 46.76%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 2.3160GB/s 2.3160GB/s 2.3160GB/s
1 gst_requested_throughput Requested Global Store Throughput 252.13MB/s 252.13MB/s 252.13MB/s
1 gld_throughput Global Load Throughput 5.5706GB/s 5.5706GB/s 5.5706GB/s
1 gst_throughput Global Store Throughput 0.9849GB/s 0.9849GB/s 0.9849GB/s
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00%
1 tex_cache_hit_rate Unified Cache Hit Rate 42.74% 42.74% 42.74%
1 tex_cache_throughput Unified Cache Throughput 5.4168GB/s 5.4168GB/s 5.4168GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 5.5706GB/s 5.5706GB/s 5.5706GB/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 0.9849GB/s 0.9849GB/s 0.9849GB/s
1 l2_read_throughput L2 Throughput (Reads) 5.5860GB/s 5.5860GB/s 5.5860GB/s
1 l2_write_throughput L2 Throughput (Writes) 0.9854GB/s 0.9854GB/s 0.9854GB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 41.57% 41.57% 41.57%
1 gst_efficiency Global Memory Store Efficiency 25.00% 25.00% 25.00%
1 tex_cache_transactions Unified Cache Transactions 270336 270336 270336
1 flop_count_dp Floating Point Operations(Double Precision) 1572864 1572864 1572864
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 196608 196608 196608
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 589824 589824 589824
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 196608 196608 196608
1 flop_count_sp Floating Point Operations(Single Precision) 98304 98304 98304
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 49152 49152 49152
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 49152 49152 49152
1 inst_executed Instructions Executed 155136 155136 155136
1 inst_issued Instructions Issued 155400 155400 155400
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.75% 0.75% 0.75%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 31.16% 31.16% 31.16%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 19.01% 19.01% 19.01%
1 stall_texture Issue Stall Reasons (Texture) 11.67% 11.67% 11.67%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 28.91% 28.91% 28.91%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 1.87% 1.87% 1.87%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.07% 0.07% 0.07%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 196608 196608 196608
1 inst_fp_64 FP Instructions(Double) 1081344 1081344 1081344
1 inst_integer Integer Instructions 1622016 1622016 1622016
1 inst_bit_convert Bit-Convert Instructions 0 0 0
1 inst_control Control-Flow Instructions 147456 147456 147456
1 inst_compute_ld_st Load/Store Instructions 1130496 1130496 1130496
1 inst_misc Misc Instructions 786432 786432 786432
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 135432 135432 135432
1 cf_issued Issued Control-Flow Instructions 4608 4608 4608
1 cf_executed Executed Control-Flow Instructions 4608 4608 4608
1 ldst_issued Issued Load/Store Instructions 153600 153600 153600
1 ldst_executed Executed Load/Store Instructions 41472 41472 41472
1 atomic_transactions Atomic Transactions 0 0 0
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 278016 278016 278016
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 6.53% 6.53% 6.53%
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.04% 0.04% 0.04%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 49152 49152 49152
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 sm_efficiency Multiprocessor Activity 99.00% 99.00% 99.00%
1 achieved_occupancy Achieved Occupancy 0.862104 0.862104 0.862104
1 ipc Executed IPC 0.361807 0.361807 0.361807
1 issued_ipc Issued IPC 0.362423 0.362423 0.362423
1 issue_slot_utilization Issue Slot Utilization 7.90% 7.90% 7.90%
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.345142 0.345142 0.345142
1 tex_utilization Unified Cache Utilization Mid (4) Mid (4) Mid (4)
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Low (2) Low (2) Low (2)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization High (7) High (7) High (7)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.09% 0.09% 0.09%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 45.65% 45.65% 45.65%
1 l2_utilization L2 Cache Utilization Mid (4) Mid (4) Mid (4)
1 dram_utilization Device Memory Utilization Low (1) Low (1) Low (1)
1 half_precision_fu_utilization Half-Precision Function Unit Utilization Idle (0) Idle (0) Idle (0)
Kernel: executeThirdLayer_PSC_partB(double*, double*, double*, double*, double*, double*, double*)
1 inst_per_warp Instructions per warp 183.000000 183.000000 183.000000
1 branch_efficiency Branch Efficiency 100.00% 100.00% 100.00%
1 warp_execution_efficiency Warp Execution Efficiency 100.00% 100.00% 100.00%
1 warp_nonpred_execution_efficiency Warp Non-Predicated Execution Efficiency 100.00% 100.00% 100.00%
1 inst_replay_overhead Instruction Replay Overhead 0.000473 0.000473 0.000473
1 shared_load_transactions_per_request Shared Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 shared_store_transactions_per_request Shared Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 local_load_transactions_per_request Local Memory Load Transactions Per Request 0.000000 0.000000 0.000000
1 local_store_transactions_per_request Local Memory Store Transactions Per Request 0.000000 0.000000 0.000000
1 gld_transactions_per_request Global Load Transactions Per Request 17.176471 17.176471 17.176471
1 gst_transactions_per_request Global Store Transactions Per Request 32.000000 32.000000 32.000000
1 shared_store_transactions Shared Store Transactions 0 0 0
1 shared_load_transactions Shared Load Transactions 0 0 0
1 local_load_transactions Local Load Transactions 0 0 0
1 local_store_transactions Local Store Transactions 0 0 0
1 gld_transactions Global Load Transactions 4186112 4186112 4186112
1 gst_transactions Global Store Transactions 114688 114688 114688
1 sysmem_read_transactions System Memory Read Transactions 0 0 0
1 sysmem_write_transactions System Memory Write Transactions 0 0 0
1 l2_read_transactions L2 Read Transactions 1966743 1966743 1966743
1 l2_write_transactions L2 Write Transactions 114717 114717 114717
1 global_hit_rate Global Hit Rate in unified l1/tex 53.08% 53.08% 53.08%
1 local_hit_rate Local Hit Rate 0.00% 0.00% 0.00%
1 gld_requested_throughput Requested Global Load Throughput 7.5144GB/s 7.5144GB/s 7.5144GB/s
1 gst_requested_throughput Requested Global Store Throughput 232.29MB/s 232.29MB/s 232.29MB/s
1 gld_throughput Global Load Throughput 15.539GB/s 15.539GB/s 15.539GB/s
1 gst_throughput Global Store Throughput 929.18MB/s 929.18MB/s 929.18MB/s
1 local_memory_overhead Local Memory Overhead 0.00% 0.00% 0.00%
1 tex_cache_hit_rate Unified Cache Hit Rate 51.67% 51.67% 51.67%
1 tex_cache_throughput Unified Cache Throughput 15.426GB/s 15.426GB/s 15.426GB/s
1 l2_tex_read_throughput L2 Throughput (Texture Reads) 15.539GB/s 15.539GB/s 15.539GB/s
1 l2_tex_write_throughput L2 Throughput (Texture Writes) 929.18MB/s 929.18MB/s 929.18MB/s
1 l2_read_throughput L2 Throughput (Reads) 15.561GB/s 15.561GB/s 15.561GB/s
1 l2_write_throughput L2 Throughput (Writes) 929.41MB/s 929.41MB/s 929.41MB/s
1 sysmem_read_throughput System Memory Read Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 sysmem_write_throughput System Memory Write Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_load_throughput Local Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 local_store_throughput Local Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_load_throughput Shared Memory Load Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 shared_store_throughput Shared Memory Store Throughput 0.00000B/s 0.00000B/s 0.00000B/s
1 gld_efficiency Global Memory Load Efficiency 48.36% 48.36% 48.36%
1 gst_efficiency Global Memory Store Efficiency 25.00% 25.00% 25.00%
1 tex_cache_transactions Unified Cache Transactions 1949696 1949696 1949696
1 flop_count_dp Floating Point Operations(Double Precision) 8945664 8945664 8945664
1 flop_count_dp_add Floating Point Operations(Double Precision Add) 114688 114688 114688
1 flop_count_dp_fma Floating Point Operations(Double Precision FMA) 4358144 4358144 4358144
1 flop_count_dp_mul Floating Point Operations(Double Precision Mul) 114688 114688 114688
1 flop_count_sp Floating Point Operations(Single Precision) 229376 229376 229376
1 flop_count_sp_add Floating Point Operations(Single Precision Add) 0 0 0
1 flop_count_sp_fma Floating Point Operations(Single Precision FMA) 114688 114688 114688
1 flop_count_sp_mul Floating Point Operation(Single Precision Mul) 0 0 0
1 flop_count_sp_special Floating Point Operations(Single Precision Special) 114688 114688 114688
1 inst_executed Instructions Executed 655872 655872 655872
1 inst_issued Instructions Issued 656182 656182 656182
1 stall_inst_fetch Issue Stall Reasons (Instructions Fetch) 0.34% 0.34% 0.34%
1 stall_exec_dependency Issue Stall Reasons (Execution Dependency) 4.30% 4.30% 4.30%
1 stall_memory_dependency Issue Stall Reasons (Data Request) 19.64% 19.64% 19.64%
1 stall_texture Issue Stall Reasons (Texture) 34.00% 34.00% 34.00%
1 stall_sync Issue Stall Reasons (Synchronization) 0.00% 0.00% 0.00%
1 stall_other Issue Stall Reasons (Other) 36.91% 36.91% 36.91%
1 stall_constant_memory_dependency Issue Stall Reasons (Immediate constant) 0.05% 0.05% 0.05%
1 stall_pipe_busy Issue Stall Reasons (Pipe Busy) 0.02% 0.02% 0.02%
1 shared_efficiency Shared Memory Efficiency 0.00% 0.00% 0.00%
1 inst_fp_32 FP Instructions(Single) 458752 458752 458752
1 inst_fp_64 FP Instructions(Double) 4816896 4816896 4816896
1 inst_integer Integer Instructions 3670016 3670016 3670016
1 inst_bit_convert Bit-Convert Instructions 0 0 0
1 inst_control Control-Flow Instructions 344064 344064 344064
1 inst_compute_ld_st Load/Store Instructions 7913472 7913472 7913472
1 inst_misc Misc Instructions 3784704 3784704 3784704
1 inst_inter_thread_communication Inter-Thread Instructions 0 0 0
1 issue_slots Issue Slots 616756 616756 616756
1 cf_issued Issued Control-Flow Instructions 10752 10752 10752
1 cf_executed Executed Control-Flow Instructions 10752 10752 10752
1 ldst_issued Issued Load/Store Instructions 1017856 1017856 1017856
1 ldst_executed Executed Load/Store Instructions 261632 261632 261632
1 atomic_transactions Atomic Transactions 0 0 0
1 atomic_transactions_per_request Atomic Transactions Per Request 0.000000 0.000000 0.000000
1 l2_atomic_throughput L2 Throughput (Atomic requests) 0.00000B/s 0.00000B/s 0.00000B/s
1 l2_atomic_transactions L2 Transactions (Atomic requests) 0 0 0
1 l2_tex_read_transactions L2 Transactions (Texture Reads) 1964032 1964032 1964032
1 stall_memory_throttle Issue Stall Reasons (Memory Throttle) 4.72% 4.72% 4.72%
1 stall_not_selected Issue Stall Reasons (Not Selected) 0.01% 0.01% 0.01%
1 l2_tex_write_transactions L2 Transactions (Texture Writes) 114688 114688 114688
1 flop_count_hp Floating Point Operations(Half Precision) 0 0 0
1 flop_count_hp_add Floating Point Operations(Half Precision Add) 0 0 0
1 flop_count_hp_mul Floating Point Operation(Half Precision Mul) 0 0 0
1 flop_count_hp_fma Floating Point Operations(Half Precision FMA) 0 0 0
1 inst_fp_16 HP Instructions(Half) 0 0 0
1 sm_efficiency Multiprocessor Activity 99.86% 99.86% 99.86%
1 achieved_occupancy Achieved Occupancy 0.910355 0.910355 0.910355
1 ipc Executed IPC 0.258633 0.258633 0.258633
1 issued_ipc Issued IPC 0.258756 0.258756 0.258756
1 issue_slot_utilization Issue Slot Utilization 6.08% 6.08% 6.08%
1 eligible_warps_per_cycle Eligible Warps Per Active Cycle 0.263448 0.263448 0.263448
1 tex_utilization Unified Cache Utilization Mid (5) Mid (5) Mid (5)
1 shared_utilization Shared Memory Utilization Idle (0) Idle (0) Idle (0)
1 ldst_fu_utilization Load/Store Function Unit Utilization Low (1) Low (1) Low (1)
1 cf_fu_utilization Control-Flow Function Unit Utilization Low (1) Low (1) Low (1)
1 special_fu_utilization Special Function Unit Utilization Low (1) Low (1) Low (1)
1 tex_fu_utilization Texture Function Unit Utilization Low (3) Low (3) Low (3)
1 single_precision_fu_utilization Single-Precision Function Unit Utilization Low (1) Low (1) Low (1)
1 double_precision_fu_utilization Double-Precision Function Unit Utilization Mid (5) Mid (5) Mid (5)
1 flop_hp_efficiency FLOP Efficiency(Peak Half) 0.00% 0.00% 0.00%
1 flop_sp_efficiency FLOP Efficiency(Peak Single) 0.03% 0.03% 0.03%
1 flop_dp_efficiency FLOP Efficiency(Peak Double) 41.53% 41.53% 41.53%
1 l2_utilization L2 Cache Utilization Mid (4) Mid (4) Mid (4)
1 dram_utilization Device Memory Utilization Low (1) Low (1) Low (1)