Releases: ROCm/rocBLAS
rocBLAS 4.3.0 for ROCm 6.3.1
rocBLAS code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.
rocBLAS 4.3.0 for ROCm 6.3.0
Added
- Level 3 and EX functions have an additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
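As a rough illustration of when the ILP64 variants become relevant, the sketch below checks whether a call's integer arguments still fit in a 32-bit signed integer and appends the `_64` suffix when they do not. The helper names `fits_int32` and `pick_api_name` are hypothetical and are not part of rocBLAS; the selection rule is an assumption made for illustration, not the library's own dispatch logic.

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer argument can hold


def fits_int32(*values: int) -> bool:
    """Return True if every value is a non-negative number representable as int32."""
    return all(0 <= v <= INT32_MAX for v in values)


def pick_api_name(base_name: str, *int_args: int) -> str:
    """Hypothetical helper: choose the plain name or its ILP64 (_64) counterpart.

    `int_args` stands for the integer arguments of the call (sizes, leading
    dimensions, increments). If any of them exceeds the int32 range, the
    name with the _64 suffix is returned.
    """
    if fits_int32(*int_args):
        return base_name
    return base_name + "_64"
```

For example, `pick_api_name("rocblas_sgemm", 1024, 1024, 1024)` keeps the plain name, while passing a dimension of `2**31` selects the `_64` form.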
Changed
- amdclang is used as the default compiler instead of hipcc
- Internal performance scripts use amd-smi instead of the deprecated rocm-smi
Optimized
- Improved performance of Level 2 gbmv
- Improved performance of Level 2 gemv for float and double precisions on problem sizes where TransA == N && m == n && m % 128 == 0, as measured on a gfx942 GPU
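The gemv size condition quoted above can be written as a small predicate, which may help when deciding whether a given workload falls into the improved class. This is a minimal sketch: the function name `in_tuned_gemv_class` and the use of the string `"N"` for the non-transposed case are assumptions for illustration, and the check says nothing about GPUs other than the gfx942 on which the improvement was measured.

```python
def in_tuned_gemv_class(trans_a: str, m: int, n: int) -> bool:
    """Return True if (trans_a, m, n) matches the condition in the release note.

    Mirrors: TransA == N && m == n && m % 128 == 0
    `trans_a` is assumed to be "N" for a non-transposed matrix.
    """
    return trans_a == "N" and m == n and m % 128 == 0
```

A square, non-transposed 4096 x 4096 problem satisfies the condition; a transposed one, a non-square one, or a size that is not a multiple of 128 does not.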
Resolved issues
- Fixed stbsv_strided_batched_64 Fortran binding
Upcoming changes
- rocblas_Xgemm_kernel_name APIs are deprecated
rocBLAS 4.2.4 for ROCm 6.2.4
Additions
- Added gfx1151 support
rocBLAS 4.2.1 for ROCm 6.2.2
rocBLAS code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.
rocBLAS 4.2.1 for ROCm 6.2.1
Removals
- Removed the Device_Memory_Allocation.pdf link from the documentation
Fixes
- Fixed the error/warning message during rocblas_set_stream() calls
rocBLAS 4.2.0 for ROCm 6.2.0
Additions
- Level 2 functions and Level 3 trsm have an additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
- Cache flush timing for gemm_batched_ex, gemm_strided_batched_ex, axpy
- Benchmark class for common timing code
- An environment variable "ROCBLAS_DEFAULT_ATOMICS_MODE" to set default atomics mode during creation of 'rocblas_handle'
- Extended dot_ex to support single-precision (fp32_r) input and double-precision (fp64_r) output and compute types
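Because ROCBLAS_DEFAULT_ATOMICS_MODE is read when a 'rocblas_handle' is created, one way to experiment with it is to set it only in the environment of the process that uses rocBLAS. The sketch below launches a child process with the variable set. The helper name `run_with_atomics_mode` is hypothetical, and the value passed as `mode` (for example "0") is an assumption about the accepted encoding; the rocBLAS documentation is the authority on which values the library recognizes.

```python
import os
import subprocess
import sys

ENV_VAR = "ROCBLAS_DEFAULT_ATOMICS_MODE"  # variable name taken from the release note


def run_with_atomics_mode(command: list[str], mode: str) -> subprocess.CompletedProcess:
    """Run `command` with the atomics-mode variable set for that child only.

    The parent environment is copied, so the setting does not leak into
    the calling process. `mode` is passed through unchanged; which values
    rocBLAS accepts is an assumption left to the caller.
    """
    child_env = dict(os.environ)
    child_env[ENV_VAR] = mode
    return subprocess.run(
        command, env=child_env, capture_output=True, text=True, check=True
    )
```

In practice `command` would be the rocBLAS application or benchmark under test; any executable that reads its environment works for checking that the variable is passed through.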
Optimizations
- Improved performance of Level 1 dot_batched and dot_strided_batched for all precisions. Performance improved by a factor of 6 for larger problem sizes, as measured on an MI210 GPU
Changes
- Linux AOCL dependency updated to release 4.2 gcc build
- Windows vcpkg dependencies updated to release 2024.02.14
- Increased default device workspace from 32 to 128 MiB for architecture gfx9xx with xx >= 40
Deprecations
- rocblas_gemm_ex3, gemm_batched_ex3 and gemm_strided_batched_ex3 are deprecated and will be removed in the next major release of rocBLAS. For future 8-bit float usage, refer to hipBLASLt: https://github.com/ROCm/hipBLASLt
rocBLAS 4.1.2 for ROCm 6.1.2
Fixes
- Fixed BF16 TT get_solutions
Optimizations
- Tuned gfx942 BBS TN, TT
rocBLAS 4.1.0 for ROCm 6.1.1
rocBLAS code for ROCm 6.1.1 did not change. The library was rebuilt for the updated ROCm 6.1.1 stack.
rocBLAS 4.1.0 for ROCm 6.1.0
Additions
- Level 1 and Level 1 Extension functions have additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments.
- Cache flush timing for gemm_ex.
Changes
- Some Level 2 function argument names have changed from 'm' to 'n' to match legacy BLAS; the implementation did not change.
- Standardized the use of non-blocking streams for copying results from device to host.
Fixes
- Fixed host-pointer mode reductions for non-blocking streams.
rocBLAS 4.0.0 for ROCm 6.0.2
rocBLAS code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.