Releases · ashvardanian/ParallelReductionsBenchmark

31 Jan 19:48

ashvardanian

v0.3.4: Nvidia H200 benchmarks (Latest)

Release: v0.3.4 [skip ci]

Patch

Docs: H200 benchmarks (4b877b9)
Fix: Missing blasint type (fad513b)


24 Jan 15:01

ashvardanian

Release v0.3.3

Release: v0.3.3 [skip ci]

Patch

Fix: Measuring AVX-512 throughput (a301c39)


19 Jan 23:05

ashvardanian

Release v0.3.2

Release: v0.3.2 [skip ci]

Patch

Docs: Need CUDA Toolkit (6ca84ed)
Improve: Disambiguate buffer length (86a9794)
Fix: dataset_t move-constructor (434cad7)
Fix: memset entire dataset (e70d6a8)
Fix: Pattern-matching version in CMake (458342d)


19 Jan 19:31

ashvardanian

Release v0.3.1

Release: v0.3.1 [skip ci]

Patch

Docs: List notable features (c340f4c)


19 Jan 19:23

ashvardanian

Release v0.3.0

Release: v0.3.0 [skip ci]

Minor

Add: BLAS with zero-stride (c39768e)
Add: Latency Hiding in AVX-512 (9a5a61a)

Patch

Fix: fmt::print thousand separator (6250d0f)
Fix: Detecting NUMA (8da88f6)
Improve: Uniform benchmark naming (567d2e8)
Improve: Scaling CUDA kernels (e57eff0)
Make: Find libnuma (d0f74e4)
Improve: Huge Pages and code style (aeabdce)
Improve: setzero over set1 intrinsics (1fbcd94)
Fix: Handling SSE tail (2731049)
Make: Load STL symbols in GDB (f91941c)
Fix: Calling unrolled AVX-512 variant (b5d4070)
Fix: Misaligned non-temporal loads to ZMM (eed4f57)


18 Jan 10:23

ashvardanian

Release v0.2.0

Release: v0.2.0 [skip ci]

Minor

Add: Metal draft (db0dc92)

Patch

Docs: File headers (e1ac216)
Make: Switch to .cuh extension for CUDA (37c0581)
Make: Drop deprecated assets (f311a61)
Fix: Missing fmt include for OpenCL (0c0bc0f)
Fix: Feature-testing __cpp_lib_execution (0c80b36)
Make: Switch to .cuh extension for CUDA (818851f)
