Skip to content

Releases: NVIDIA/AMGX

v2.4.0

25 Oct 11:58
2b4762f
Compare
Choose a tag to compare

Changes:

  • Increased maximum CUDA version to 12.2, and now supporting HPC SDK 23.7
  • Fixed issue preventing parameters from being updated after config initialisation
  • Restructured all source files (now under src) and removed plugin feature
  • Replaced custom memory pool with cudaMallocAsync when defining USE_CUDAMALLOCASYNC
  • Changed the cuSPARSE SpMV algorithm choice to CUSPARSE_CSRMV_ALG1, which should improve solve performance for recent versions of cuSPARSE
  • Added single-kernel csrmv that is invoked when total number of rows in the local matrix falls below 3 times the number of SMs on the target GPUs
  • Changes to thrust
    - Increased thrust version to 2.1.0
    - Added specific tested version of thrust as a submodule, please use git clone --recursive to pull AmgX from v2.4.0 onwards
    - Wrapped thrust in namespace to avoid shared library sharing issues referenced here https://github.com/NVIDIA/thrust/releases/tag/1.14.0
    - Removed many superfluous points of synchronisation introduced by thrust
  • Improved performance of writing matrices to file
  • Improved Clang compatibility
  • Add a divergence check, providing new config parameter rel_div_tolerance
  • Added compile-time definition to avoid exception handling, in order to improve experience when debugging (DISABLE_EXCEPTION_HANDLING)
  • Fixed multiple synchronisation issues that can show up on newer GPU architectures (sm_70+)
  • Fixed partition reordering for block_sizes > 1
  • Fixed build issue that arose when AmgX is built as a subproject
  • Fixed issue with OpenMP and NO_MPI linking
  • Replaced some inline asm with intrinsics
  • Fixed issue with exact_coarse_solve grid sizing
  • Fixed issue with use_sum_stopping_criteria
  • Fixed SIGFPE that could occur when the initial norm is 0
  • Added a new API call AMGX_matrix_check_symmetry, that tests if a matrix is structurally or completely symmetric

Tested configurations:

Linux x86-64:
-- Ubuntu 20.04, Ubuntu 22.04
-- NVHPC 23.7, GCC 9.4.0, GCC 12.1
-- OpenMPI 4.0.x
-- CUDA 11.2, 11.8, 12.2
-- A100, H100

Note that while AMGX has support for building in Windows, testing on Windows is very limited.

v2.3.0

29 Jun 23:24
Compare
Choose a tag to compare

Changes:

  • Increased minimum CMake version to 3.18 and adapted to use CUDA as a language, making it possible to compile with HPC SDK
  • Improved performance of compute_values_kernel by ~1.3x
  • Optimised block tuning for aggressive coarsening
  • Added an exact coarse solve, accessible via the default scope flag "exact_coarse_solve"
  • Fixed issue where latency hiding could be enabled/disabled asymmetrically across available ranks
  • Fixed bug with SpGEMM fallback that deleted cuSPARSE handle incorrectly
  • Fixed bug with use of shared memory in estimate_c_hat_kernel

Tested configurations:

  • Linux x86-64:
    -- Ubuntu 20.04, Ubuntu 18.04
    -- gcc 7.4.0, gcc 9.3.0
    -- OpenMPI 4.0.x
    -- CUDA 11.0, 11.2
  • Windows 10 x86-64:
    -- MS Visual Studio 2019 (msvc 19.28)
    -- MS MPI v10.1.2
    -- CUDA 11.0

Note that while AMGX has support for building in Windows, testing on Windows is very limited.

v2.2.0

06 Apr 21:05
Compare
Choose a tag to compare

AMGX v2.2.0

Changes:

  • Fixing GPU Direct support (now correct results and better perf)
  • Fixing latency hiding (general perf and couple bugfixes in some specific cases)
  • Tunings for Volta for agg and classical setup phase
  • Gauss-siedel perf improvements on Volta+
  • Ampere support
  • Minor bugfixes and enhancements including reported/requested by community

Tested configurations:

  • Linux x86-64:
    -- Ubuntu 20.04, Ubuntu 18.04
    -- gcc 7.4.0, gcc 9.3.0
    -- OpenMPI 4.0.x
    -- CUDA 10.2, 11.0, 11.2
  • Windows 10 x86-64:
    -- MS Visual Studio 2019 (msvc 19.28)
    -- MS MPI v10.1.2
    -- CUDA 10.2, 11.0

Note that while AMGX has support for building in Windows, testing on Windows is very limited.

v2.1.0

21 Mar 00:54
Compare
Choose a tag to compare

AMGX v2.1.0

Changelog:

  • Added new API that allows user to provide distributed matrix partitioning information in a new way - offset to the partition's first row in a matrix. Works only if partitions own continuous rows in matrix. Added example case for this new API (see examples/amgx_mpi_capi_cla.c)
  • Distributed code improvements

Tested configurations:

  • gcc 7.5, gcc 6.4
  • CUDA 9.0, CUDA 10.0
  • OpenMPI 4.0
  • NVIDIA V100

Note that while AMGX has support for building in Windows, it is not actively tested and may malfunction or has issues building the library.

v2.0.1

21 Mar 00:36
6025603
Compare
Choose a tag to compare

AMGX v2.0.1

Thanks to community a lot of updates/fixes on initial release were added and incorporated to this release.

Changelog:

  • Dropped CUDA 7.x support and added CUDA 10.0 support
  • Build fixes when using MS VS 2015 and 2017
  • Build fixes when using CUDA 8 and CUDA 9
  • Fixed DEBUG build configuration
  • Minor code improvements

Tested configurations:

  • Windows
    • MSVS 2017 (fixed and reported by @ftvkun), MSVS 2015 (fixed and reported by @ibaned)
  • Linux
    • CUDA 9.0, CUDA 10.0
    • gcc 4.8, gcc 5.3
    • OpenMPI 3.1.3
  • Tested on Volta GPU arch

v2.0.0

21 Mar 00:16
Compare
Choose a tag to compare

AMGX first open-source release, versioned v2.0.0

Tested build configurations:

  • Linux: gcc 4.8.3, OpenMPI
  • Windows: MS VS 2015, MPICH
  • CUDA versions 7.0 - 9.0

Tested on Kepler - Volta GPU families.

API's documentation and examples contained document in doc directory