Docs: H200 benchmarks
ashvardanian committed Jan 31, 2025
1 parent fad513b commit 4b877b9
Showing 1 changed file with 13 additions and 2 deletions.
15 changes: 13 additions & 2 deletions README.md
@@ -52,8 +52,8 @@ All the library dependencies, including GTest, GBench, Intel oneTBB, FMT, and Th
You are expected to build this on an x86 machine with CUDA drivers installed.

```sh
-cmake -B build_release
-cmake --build build_release --config Release
+cmake -B build_release -D CMAKE_BUILD_TYPE=Release        # Generate the build files
+cmake --build build_release --config Release              # Build the project
build_release/reduce_bench # Run all benchmarks
build_release/reduce_bench --benchmark_filter="cuda" # Only CUDA-related
PARALLEL_REDUCTIONS_LENGTH=1024 build_release/reduce_bench # Set a different input size
@@ -136,6 +136,17 @@ Observations:
- 2.2 TB/s using vanilla CUDA approaches.
- 3 TB/s using CUB.

On Nvidia H200 GPUs, the numbers are even higher:

```sh
-------------------------------------------------------------------------------------------------------------
Benchmark                                                  Time            CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
cuda/cub/min_time:10.000/real_time                    254609 ns      254607 ns        54992 bytes/s=4.21723T/s error,%=0
cuda/thrust/min_time:10.000/real_time                 319709 ns      316368 ns        43846 bytes/s=3.3585T/s error,%=0
cuda/thrust/interleaving/min_time:10.000/real_time    318598 ns      314996 ns        43956 bytes/s=3.37021T/s error,%=0
```
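As a sanity check on the table above, the reported `bytes/s` can be multiplied by the per-iteration real time to estimate how much data each iteration reduces. The sketch below is not part of the benchmark suite; it assumes that the `bytes/s` counter equals bytes processed divided by the real time of one iteration, and the names `H200_RESULTS`, `bytes_per_iteration`, and `speedup` are hypothetical helpers introduced only for illustration.

```python
# Figures copied from the H200 table above:
# benchmark name -> (real time per iteration in ns, reported bytes per second).
H200_RESULTS = {
    "cuda/cub": (254_609, 4.21723e12),
    "cuda/thrust": (319_709, 3.3585e12),
    "cuda/thrust/interleaving": (318_598, 3.37021e12),
}


def bytes_per_iteration(time_ns: float, bytes_per_second: float) -> float:
    """Estimate the bytes reduced per iteration.

    Assumption: throughput = bytes processed / real time of one iteration.
    """
    return bytes_per_second * time_ns * 1e-9


def speedup(fast: str, slow: str) -> float:
    """Ratio of the reported throughputs of two rows of the table."""
    return H200_RESULTS[fast][1] / H200_RESULTS[slow][1]


if __name__ == "__main__":
    for name, (time_ns, throughput) in H200_RESULTS.items():
        gib = bytes_per_iteration(time_ns, throughput) / 2**30
        print(f"{name}: {gib:.3f} GiB per iteration")
    print(f"CUB vs Thrust: {speedup('cuda/cub', 'cuda/thrust'):.2f}x")
```

Under that assumption, all three rows work out to roughly 1 GiB per iteration, and CUB comes out about 1.26x faster than plain Thrust on this run. Against the 3 TB/s CUB figure quoted earlier, 4.2 TB/s is roughly a 1.4x improvement.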

### AWS Zen4 `m7a.metal-48xl`

On AWS Zen4 `m7a.metal-48xl` instances with GCC 12, one may expect the following results: