OpenCL-Benchmark v1.8

@ProjectPhysX ProjectPhysX released this 01 Mar 08:19
· 1 commit to master since this release
  • INT8 benchmark now measures dp4a throughput on all supported AMD/Intel/Nvidia GPUs
  • fixed compilation on macOS with new OpenCL headers
  • updated OpenCL-Wrapper
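
For readers unfamiliar with dp4a: it is a 4-way dot product of packed 8-bit integers that is accumulated into a 32-bit integer. The sketch below emulates that arithmetic on the CPU as an illustration only. It is not the benchmark's kernel code, and the helper names (`pack_int8x4`, `unpack_int8x4`, `dp4a`) are hypothetical. It assumes signed lanes and wrap-around (non-saturating) accumulation; real hardware instructions also come in unsigned and saturating variants.

```python
def pack_int8x4(lanes):
    """Pack four signed 8-bit values into one 32-bit word, lowest byte first."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def unpack_int8x4(word):
    """Split a 32-bit word into four signed 8-bit lanes, lowest byte first."""
    lanes = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        # Reinterpret the unsigned byte as a two's-complement int8.
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def dp4a(a, b, acc):
    """Emulated dp4a: acc + sum of four int8 products, wrapped to int32.

    Assumption: signed lanes, non-saturating accumulate.
    """
    products = (x * y for x, y in zip(unpack_int8x4(a), unpack_int8x4(b)))
    total = (acc + sum(products)) & 0xFFFFFFFF
    # Reinterpret the 32-bit result as a signed integer.
    return total - (1 << 32) if total >= (1 << 31) else total


if __name__ == "__main__":
    a = pack_int8x4([1, 2, 3, 4])
    b = pack_int8x4([5, 6, 7, 8])
    print(dp4a(a, b, 10))  # 1*5 + 2*6 + 3*7 + 4*8 + 10
```

One such operation does four multiplications and four additions on a single pair of 32-bit operands, which is why a benchmark built on it can report much higher INT8 throughput than one that uses scalar 8-bit arithmetic.
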
 |----------------.------------------------------------------------------------|
 | Device ID      | 2                                                          |
 | Device Name    | AMD Instinct MI210                                         |
 | Device Vendor  | Advanced Micro Devices, Inc.                               |
 | Device Driver  | 3625.0 (HSA1.1,LC) (Linux)                                 |
 | OpenCL Version | OpenCL C 2.0                                               |
 | Compute Units  | 104 at 1700 MHz (6656 cores, 22.630 TFLOPs/s)              |
 | Memory, Cache  | 65520 MB VRAM, 16 KB global / 64 KB local                  |
 | Buffer Limits  | 65520 MB global, 67092480 KB constant                      |
 |----------------'------------------------------------------------------------|
 | Info: OpenCL C code successfully compiled.                                  |
 | FP64  compute                                        17.681 TFLOPs/s (2/3 ) |
 | FP32  compute                                        20.007 TFLOPs/s ( 1x ) |
 | FP16  compute                                        39.594 TFLOPs/s ( 2x ) |
 | INT64 compute                                         1.515  TIOPs/s (1/16) |
 | INT32 compute                                         9.877  TIOPs/s (1/2 ) |
 | INT16 compute                                        19.532  TIOPs/s ( 1x ) |
-| INT8  compute                                        10.082  TIOPs/s (1/2 ) |
+| INT8  compute                                        36.307  TIOPs/s ( 2x ) |
 | Memory Bandwidth ( coalesced read      )                        993.82 GB/s |
 | Memory Bandwidth ( coalesced      write)                        999.76 GB/s |
 | Memory Bandwidth (misaligned read      )                       1325.91 GB/s |
 | Memory Bandwidth (misaligned      write)                        635.20 GB/s |
 | PCIe   Bandwidth (send                 )                         28.72 GB/s |
 | PCIe   Bandwidth (   receive           )                         28.51 GB/s |
 | PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   28.61 GB/s |
 |-----------------------------------------------------------------------------|
 |----------------.------------------------------------------------------------|
 | Device ID      | 0                                                          |
 | Device Name    | Intel(R) Arc(TM) B580 Graphics                             |
 | Device Vendor  | Intel(R) Corporation                                       |
 | Device Driver  | 32.0.101.6559 (Windows)                                    |
 | OpenCL Version | OpenCL C 3.0                                               |
 | Compute Units  | 160 at 2850 MHz (2560 cores, 14.592 TFLOPs/s)              |
 | Memory, Cache  | 12187 MB VRAM, 18432 KB global / 128 KB local              |
 | Buffer Limits  | 11944 MB global, 12230900 KB constant                      |
 |----------------'------------------------------------------------------------|
 | Info: OpenCL C code successfully compiled.                                  |
 | FP64  compute                                         0.896 TFLOPs/s (1/16) |
 | FP32  compute                                        14.249 TFLOPs/s ( 1x ) |
 | FP16  compute                                        26.547 TFLOPs/s ( 2x ) |
 | INT64 compute                                         0.636  TIOPs/s (1/24) |
 | INT32 compute                                         4.556  TIOPs/s (1/3 ) |
 | INT16 compute                                        37.082  TIOPs/s ( 2x ) |
-| INT8  compute                                        24.424  TIOPs/s ( 2x ) |
+| INT8  compute                                        48.668  TIOPs/s ( 4x ) |
 | Memory Bandwidth ( coalesced read      )                        574.09 GB/s |
 | Memory Bandwidth ( coalesced      write)                        468.07 GB/s |
 | Memory Bandwidth (misaligned read      )                        796.23 GB/s |
 | Memory Bandwidth (misaligned      write)                        383.15 GB/s |
 | PCIe   Bandwidth (send                 )                          4.99 GB/s |
 | PCIe   Bandwidth (   receive           )                          4.87 GB/s |
 | PCIe   Bandwidth (        bidirectional)            (Gen3 x16)    5.11 GB/s |
 |-----------------------------------------------------------------------------|
 |----------------.------------------------------------------------------------|
 | Device ID      | 0                                                          |
 | Device Name    | NVIDIA H100 80GB HBM3                                      |
 | Device Vendor  | NVIDIA Corporation                                         |
 | Device Driver  | 565.57.01 (Linux)                                          |
 | OpenCL Version | OpenCL C 1.2                                               |
 | Compute Units  | 132 at 1980 MHz (16896 cores, 66.908 TFLOPs/s)             |
 | Memory, Cache  | 81105 MB VRAM, 4224 KB global / 48 KB local                |
 | Buffer Limits  | 20276 MB global, 64 KB constant                            |
 |----------------'------------------------------------------------------------|
 | Info: OpenCL C code successfully compiled.                                  |
 | FP64  compute                                        31.184 TFLOPs/s (1/2 ) |
 | FP32  compute                                        62.908 TFLOPs/s ( 1x ) |
 | FP16  compute                                       123.749 TFLOPs/s ( 2x ) |
 | INT64 compute                                         3.227  TIOPs/s (1/24) |
 | INT32 compute                                        32.946  TIOPs/s (1/2 ) |
 | INT16 compute                                        30.901  TIOPs/s (1/2 ) |
-| INT8  compute                                        30.582  TIOPs/s (1/2 ) |
+| INT8  compute                                       103.204  TIOPs/s ( 2x ) |
 | Memory Bandwidth ( coalesced read      )                       3025.53 GB/s |
 | Memory Bandwidth ( coalesced      write)                       3055.98 GB/s |
 | Memory Bandwidth (misaligned read      )                       2102.44 GB/s |
 | Memory Bandwidth (misaligned      write)                        314.25 GB/s |
 | PCIe   Bandwidth (send                 )                         10.53 GB/s |
 | PCIe   Bandwidth (   receive           )                         11.47 GB/s |
 | PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   10.91 GB/s |
 |-----------------------------------------------------------------------------|
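
The change in reported INT8 throughput between the removed (`-`) and added (`+`) lines above can be summarized with a short calculation. This is a minimal sketch that only divides the new value by the old one for each device. The numbers are copied from the output above, and the dictionary layout and function name are illustrative assumptions.

```python
# INT8 throughput in TIOPs/s, copied from the diff lines above:
# (old value from the "-" line, new value from the "+" line).
INT8_RESULTS = {
    "AMD Instinct MI210": (10.082, 36.307),
    "Intel(R) Arc(TM) B580 Graphics": (24.424, 48.668),
    "NVIDIA H100 80GB HBM3": (30.582, 103.204),
}


def speedup(old, new):
    """Ratio of the new reported throughput to the old one."""
    return new / old


if __name__ == "__main__":
    for device, (old, new) in INT8_RESULTS.items():
        print(f"{device}: {old} -> {new} TIOPs/s ({speedup(old, new):.2f}x)")
```

These ratios compare what the two benchmark versions report, not a change in the hardware itself.
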