Releases: ashvardanian/ParallelReductionsBenchmark

v0.3.4: Nvidia H200 benchmarks

31 Jan 19:48

Release: v0.3.4 [skip ci]

Patch

Release v0.3.3

24 Jan 15:01

Release: v0.3.3 [skip ci]

Patch

  • Fix: Measuring AVX-512 throughput (a301c39)

Release v0.3.2

19 Jan 23:05

Release: v0.3.2 [skip ci]

Patch

  • Docs: Need CUDA Toolkit (6ca84ed)
  • Improve: Disambiguate buffer length (86a9794)
  • Fix: dataset_t move-constructor (434cad7)
  • Fix: memset entire dataset (e70d6a8)
  • Fix: Pattern-matching version in CMake (458342d)

Release v0.3.1

19 Jan 19:31

Release: v0.3.1 [skip ci]

Patch

  • Docs: List notable features (c340f4c)

Release v0.3.0

19 Jan 19:23

Release: v0.3.0 [skip ci]

Minor

  • Add: BLAS with zero-stride (c39768e)
  • Add: Latency Hiding in AVX-512 (9a5a61a)

Patch

  • Fix: fmt::print thousand separator (6250d0f)
  • Fix: Detecting NUMA (8da88f6)
  • Improve: Uniform benchmark naming (567d2e8)
  • Improve: Scaling CUDA kernels (e57eff0)
  • Make: Find libnuma (d0f74e4)
  • Improve: Huge Pages and code style (aeabdce)
  • Improve: setzero over set1 intrinsics (1fbcd94)
  • Fix: Handling SSE tail (2731049)
  • Make: Load STL symbols in GDB (f91941c)
  • Fix: Calling unrolled AVX-512 variant (b5d4070)
  • Fix: Misaligned non-temporal loads to ZMM (eed4f57)

Release v0.2.0

18 Jan 10:23

Release: v0.2.0 [skip ci]

Minor

Patch

  • Docs: File headers (e1ac216)
  • Make: Switch to .cuh extension for CUDA (37c0581)
  • Make: Drop deprecated assets (f311a61)
  • Fix: Missing fmt include for OpenCL (0c0bc0f)
  • Fix: Feature-testing __cpp_lib_execution (0c80b36)
  • Make: Switch to .cuh extension for CUDA (818851f)