HYPAMAS is a thread-safe, high-performance, robust software package for solving large sparse nonsymmetric linear systems Ax=b with either a direct or an iterative method on shared-memory multi-core machines. It is implemented in pure C, using POSIX threads for parallelization and hand-coded BLAS routines for efficient numerical computation, so it can be used without linking against other software packages.
For the direct method, HYPAMAS provides three numerical kernels for the LU factorization:
- sparse left-looking LU factorization based on sparse BLAS-1.
- sparse left-looking LU factorization based on sparse BLAS-1 and standard dense BLAS-2.
- sparse left-right looking LU factorization based on sparse BLAS-1, standard dense BLAS-2, and BLAS-3.
The direct method can run forward elimination and backward substitution either sequentially or in parallel. To suit Newton-Raphson iteration, HYPAMAS also supports refactorization, which recomputes A=LU without numerical pivoting.
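As an illustration of the two solve steps named above, here is a minimal sequential sketch of forward elimination (Ly=b) and backward substitution (Ux=y) on CSR factors. The function names and the storage layout (unit-diagonal L with only strictly lower entries stored, U with the diagonal stored first in each row) are assumptions made for this example and are not taken from the HYPAMAS source.

```c
#include <assert.h>

/* Forward elimination: solve L*y = b.
 * Assumed layout: L is unit lower triangular in CSR and only its strictly
 * lower entries are stored (the unit diagonal is implicit). */
void forward_elimination(int n, const int *lp, const int *li,
                         const double *lx, const double *b, double *y)
{
    for (int i = 0; i < n; ++i) {
        double sum = b[i];
        for (int k = lp[i]; k < lp[i + 1]; ++k)
            sum -= lx[k] * y[li[k]];   /* li[k] < i, so y[li[k]] is known */
        y[i] = sum;
    }
}

/* Backward substitution: solve U*x = y.
 * Assumed layout: U is upper triangular in CSR and the diagonal entry is
 * the first stored entry of each row. */
void backward_substitution(int n, const int *up, const int *ui,
                           const double *ux, const double *y, double *x)
{
    for (int i = n - 1; i >= 0; --i) {
        double sum = y[i];
        for (int k = up[i] + 1; k < up[i + 1]; ++k)
            sum -= ux[k] * x[ui[k]];   /* ui[k] > i, so x[ui[k]] is known */
        assert(ux[up[i]] != 0.0);      /* a zero pivot would make U singular */
        x[i] = sum / ux[up[i]];
    }
}
```

With L = [1 0 0; 2 1 0; 0 3 1], U = [2 1 0; 0 1 4; 0 0 5] and b = (4, 22, 57), calling the two functions in sequence gives y = (4, 14, 15) and then x = (1, 2, 3). Each row depends on previously computed entries, which is why these loops carry little work per row compared with the factorization.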
In the iterative method, HYPAMAS uses incomplete LU (ILU) factorization as the right preconditioner for the generalized minimal residual method (GMRES). It supports the following ILU algorithms:
- sparse left-looking ILU factorization with threshold dropping (ILUT), based on sparse BLAS-1 and standard dense BLAS-2.
- sparse left-looking ILU factorization with threshold dropping and partial pivoting (ILUTP), based on sparse BLAS-1 and standard dense BLAS-2.
GMRES is currently implemented sequentially only, whereas the ILU factorization can run in parallel.
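For readers unfamiliar with right preconditioning, the standard formulation is sketched below. The symbols M, u, and the tilde factors are notation introduced for this note, not names used by HYPAMAS.

```math
M = \tilde{L}\tilde{U} \approx A, \qquad
A M^{-1} u = b, \qquad
x = M^{-1} u, \qquad
r_k = b - A M^{-1} u_k = b - A x_k
```

GMRES iterates on u, and each application of M^{-1} is one forward elimination with the incomplete factor \tilde{L} followed by one backward substitution with \tilde{U}. Because the residual r_k of the preconditioned system equals the residual of the original system Ax=b, the convergence check applies directly to the original problem.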
HYPAMAS offers the following features:
- Automatic thread control.
- Adaptive algorithm selection.
- Advanced acceleration technology.
- Accurate and attractive preconditioner.
- HYPAMAS can automatically decide whether to use parallel computation. This feature can be turned off with the parameter `iparm[kIparmAutoParallelOff]`, in which case HYPAMAS runs with the given number of threads.
- Forward elimination (Ly=b) and backward substitution (Ux=y) perform far fewer floating-point operations than a numerical LU factorization, so parallelizing the triangular solves is not guaranteed to improve performance. Sequential triangular solves are therefore recommended.
- HYPAMAS only supports a sparse matrix A stored in compressed sparse row (CSR) format. If the matrix is given in compressed sparse column (CSC) format, HYPAMAS should solve A^T x=b instead of Ax=b. This option is controlled by the parameter `iparm[kIparmSolveTranspose]`.
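The reason a CSC matrix calls for the transposed solve is that the CSC arrays of A, read as if they were CSR arrays, describe A^T. The sketch below demonstrates this with matrix-vector products; the function names are hypothetical and the code is independent of the HYPAMAS API.

```c
#include <assert.h>

/* y = M*x, where (ptr, idx, val) are read as the CSR arrays of an
 * n-by-n matrix M. */
void csr_matvec(int n, const int *ptr, const int *idx, const double *val,
                const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = ptr[i]; k < ptr[i + 1]; ++k) {
            assert(idx[k] >= 0 && idx[k] < n);  /* debug-time bounds check */
            sum += val[k] * x[idx[k]];
        }
        y[i] = sum;
    }
}

/* y = M^T*x for the same arrays: each stored entry M(i, j) contributes
 * val * x[i] to y[j]. */
void csr_matvec_transpose(int n, const int *ptr, const int *idx,
                          const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = 0.0;
    for (int i = 0; i < n; ++i) {
        for (int k = ptr[i]; k < ptr[i + 1]; ++k) {
            assert(idx[k] >= 0 && idx[k] < n);  /* debug-time bounds check */
            y[idx[k]] += val[k] * x[i];
        }
    }
}
```

Take A = [1 2; 0 3], whose CSC arrays are column pointers {0, 1, 3}, row indices {0, 0, 1} and values {1, 2, 3}. Passing these arrays to `csr_matvec` with x = (1, 1) yields (1, 5), which is A^T x, while `csr_matvec_transpose` yields (3, 3), which is Ax. A CSR-only solver handed CSC data therefore sees A^T, and asking it for the transposed solve recovers the system for A.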
From the top-level directory of HYPAMAS, type:
- cd demo
- make
- ./benchmark rajat19.mtx 6
These commands solve the linear system for the matrix `rajat19.mtx` by LU factorization using 6 threads.
The benchmark test set can be downloaded from the SuiteSparse Matrix Collection website [12]. HYPAMAS is designed for matrices arising from Newton-Raphson iteration, e.g. circuit simulation problems in SPICE-like simulators. Note that HYPAMAS currently supports only the Matrix Market exchange format, not the MATLAB or Rutherford Boeing formats.
HYPAMAS is benchmarked against KLU on a Linux system with an Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz, which has 6 physical cores (12 logical threads with Hyper-Threading), and 32GB RAM. The test matrices come from the SuiteSparse Matrix Collection (formerly the University of Florida Sparse Matrix Collection). HYPAMAS is a cache-friendly application that performs computationally intensive work with fine-tuned floating-point operations, so Hyper-Threading may degrade performance: the CPU resources are already heavily utilized, and the logical processors compete for cache access [13]. Our benchmarks therefore use at most 6 threads rather than 12.
The scalability of parallel refactorization is much better, which is attributable to the explicit elimination tree used in refactorization. The following two figures plot the performance improvement factor over KLU in the factorization and refactorization phases respectively, each including the solve phase. Note that the GMRES time includes both ILU preconditioning time and iteration time.
OS: Linux
CPU: AVX instruction set or higher
gcc: 8.4
- 👋 Hi, I’m Penguin.
- 👀 I’m interested in high-performance computation, especially in matrix solvers.
- 🌱 I’m currently learning direct & iterative matrix solvers.
- 💞️ I’m looking to collaborate on a nice team.
- 📫 You can contact me by email if there is any issue or any bug.