Releases: ashvardanian/ParallelReductionsBenchmark
v0.3.4: Nvidia H200 benchmarks
Release: v0.3.4 [skip ci]
Patch
Release v0.3.3
Release: v0.3.3 [skip ci]
Patch
- Fix: Measuring AVX-512 throughput (a301c39)
Release v0.3.2
Release: v0.3.2 [skip ci]
Patch
- Docs: Need CUDA Toolkit (6ca84ed)
- Improve: Disambiguate buffer length (86a9794)
- Fix: `dataset_t` move-constructor (434cad7)
- Fix: `memset` entire dataset (e70d6a8)
- Fix: Pattern-matching version in CMake (458342d)
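The `dataset_t` and `memset` fixes above touch on two pitfalls that commonly affect buffer-owning types. One is a move-constructor that forgets to empty its source, which causes a double free. The other is a `memset` sized in elements rather than bytes. The sketch below is not the repository's code. The `dataset_t` layout here (a `malloc`-ed `float` buffer plus a `length`) and the `move_demo` helper are hypothetical, chosen only to illustrate the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical buffer-owning type; the real `dataset_t` layout is not shown in these notes.
struct dataset_t {
    float *data = nullptr;
    std::size_t length = 0;

    explicit dataset_t(std::size_t n)
        : data(static_cast<float *>(std::malloc(n * sizeof(float)))), length(data ? n : 0) {
        // Zero the entire buffer: memset counts bytes, so multiply by sizeof(float).
        if (data) std::memset(data, 0, length * sizeof(float));
    }

    // Move-constructor: take ownership and leave the source empty,
    // so its destructor does not free the buffer a second time.
    dataset_t(dataset_t &&other) noexcept
        : data(std::exchange(other.data, nullptr)), length(std::exchange(other.length, 0)) {}

    dataset_t &operator=(dataset_t &&other) noexcept {
        if (this != &other) {
            std::free(data);
            data = std::exchange(other.data, nullptr);
            length = std::exchange(other.length, 0);
        }
        return *this;
    }

    // Copies are disabled to keep ownership unique.
    dataset_t(dataset_t const &) = delete;
    dataset_t &operator=(dataset_t const &) = delete;

    ~dataset_t() { std::free(data); }
};

// Hypothetical usage check: after the move, only `b` owns the zeroed buffer.
inline void move_demo() {
    dataset_t a(1024);
    dataset_t b(std::move(a));
    assert(a.data == nullptr && a.length == 0);
    assert(b.data != nullptr && b.length == 1024);
    assert(b.data[0] == 0.0f && b.data[1023] == 0.0f);
}
```

Using `std::exchange` keeps each member transfer to a single expression and guarantees the moved-from object is left in a valid, empty state that its destructor can safely clean up.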
Release v0.3.1
Release: v0.3.1 [skip ci]
Patch
- Docs: List notable features (c340f4c)
Release v0.3.0
Release: v0.3.0 [skip ci]
Minor
- Add: BLAS with zero-stride (c39768e)
- Add: Latency Hiding in AVX-512 (9a5a61a)
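The latency-hiding idea behind the AVX-512 addition can be sketched in portable C++. A single running sum forms one long dependency chain, so each addition has to wait for the previous one. Splitting the loop across several independent accumulators lets the CPU keep multiple additions in flight and combine them only once at the end. This is a scalar illustration under stated assumptions: the function names and the choice of four accumulators are hypothetical, and an AVX-512 kernel would apply the same idea to several ZMM registers rather than scalars.

```cpp
#include <cassert>
#include <cstddef>

// Baseline: one accumulator, so every addition depends on the previous one.
inline float sum_one_chain(float const *data, std::size_t n) {
    float sum = 0;
    for (std::size_t i = 0; i != n; ++i) sum += data[i];
    return sum;
}

// Latency hiding: four independent accumulators form four separate
// dependency chains that the CPU can overlap.
inline float sum_four_chains(float const *data, std::size_t n) {
    float acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += data[i + 0];
        acc1 += data[i + 1];
        acc2 += data[i + 2];
        acc3 += data[i + 3];
    }
    for (; i < n; ++i) acc0 += data[i];   // leftover elements
    return (acc0 + acc1) + (acc2 + acc3); // combine the chains once at the end
}

// Usage: on small integer-valued inputs both orders are exact, so results match.
inline void sums_demo() {
    float const values[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    assert(sum_one_chain(values, 10) == 55.0f);
    assert(sum_four_chains(values, 10) == 55.0f);
}
```

Because floating-point addition is not associative, the two functions may differ in the last bits on real data; that reordering is exactly what a compiler will not do on its own without relaxed floating-point flags.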
Patch
- Fix: `fmt::print` thousand separator (6250d0f)
- Fix: Detecting NUMA (8da88f6)
- Improve: Uniform benchmark naming (567d2e8)
- Improve: Scaling CUDA kernels (e57eff0)
- Make: Find `libnuma` (d0f74e4)
- Improve: Huge Pages and code style (aeabdce)
- Improve: `setzero` over `set1` intrinsics (1fbcd94)
- Fix: Handling SSE tail (2731049)
- Make: Load STL symbols in GDB (f91941c)
- Fix: Calling unrolled AVX-512 variant (b5d4070)
- Fix: Misaligned non-temporal loads to ZMM (eed4f57)
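The SSE-tail and `setzero` entries concern the edges of a SIMD loop. The following is an illustrative SSE sum, not the benchmark's own kernel, and `sum_sse` and `sse_demo` are hypothetical names. It zeroes the accumulator with `_mm_setzero_ps`, uses the unaligned `_mm_loadu_ps` so the input needs no 16-byte alignment, and finishes the `n % 4` leftover elements with a scalar loop. On targets without SSE, the whole array takes the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Sums `n` floats: 4 lanes at a time with SSE, then a scalar loop for the tail.
inline float sum_sse(float const *data, std::size_t n) {
    std::size_t i = 0;
    float total = 0;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps(); // all-zero register
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i)); // unaligned 4-float load
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc); // aligned store into the 16-byte-aligned array
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    // Tail: the last n % 4 elements (or every element when SSE is unavailable).
    for (; i < n; ++i) total += data[i];
    return total;
}

// Usage: 7 elements exercise one full SSE step plus a 3-element tail.
inline void sse_demo() {
    float const values[7] = {1, 2, 3, 4, 5, 6, 7};
    assert(sum_sse(values, 7) == 28.0f);
    assert(sum_sse(values, 3) == 6.0f);
    assert(sum_sse(values, 0) == 0.0f);
}
```

Skipping the tail loop would silently drop up to three elements, while widening the last vector load past `n` would read out of bounds; the scalar remainder avoids both.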
Release v0.2.0
Release: v0.2.0 [skip ci]
Minor
Patch
- Docs: File headers (e1ac216)
- Make: Switch to `.cuh` extension for CUDA (37c0581)
- Make: Drop deprecated assets (f311a61)
- Fix: Missing `fmt` include for OpenCL (0c0bc0f)
- Fix: Feature-testing `__cpp_lib_execution` (0c80b36)
- Make: Switch to `.cuh` extension for CUDA (818851f)