HYPAMAS: Hybrid Parallel Matrix Solver for Linear Systems

HYPAMAS is thread-safe, high-performance, robust software for solving large sparse nonsymmetric linear systems of equations A*X=b, using either a direct or an iterative method on shared-memory machines with multi-core processors. HYPAMAS is implemented in pure C, using POSIX threads for parallelization and hand-coded BLAS for efficient numerical computation, so it is straightforward to use HYPAMAS without linking against other software packages.

In the direct method, HYPAMAS provides three numerical kernels for LU factorization:

  1. sparse left-looking LU factorization based on sparse BLAS-1.
  2. sparse left-looking LU factorization based on sparse BLAS-1 and standard dense BLAS-2.
  3. sparse left-right looking LU factorization based on sparse BLAS-1, standard dense BLAS-2, and BLAS-3.

In the direct method, HYPAMAS can use either sequential or parallel forward elimination and backward substitution. To suit Newton-Raphson iteration, HYPAMAS also supports re-factorization, which computes A=LU without numerical pivoting permutation.

In the iterative method, HYPAMAS uses incomplete LU (ILU) factorization as a right preconditioner for the generalized minimal residual method (GMRES). It supports two ILU variants:

  1. sparse left-looking ILU factorization with threshold dropping (ILUT), based on sparse BLAS-1 and standard dense BLAS-2.
  2. sparse left-looking ILU factorization with threshold dropping and partial pivoting (ILUTP), based on sparse BLAS-1 and standard dense BLAS-2.

For now, GMRES has only a sequential implementation, while the ILU factorization is parallelized.

HYPAMAS has the following distinctive features:

  1. Automatic thread control.
  2. Adaptive algorithm selection.
  3. Advanced acceleration technology.
  4. Accurate and attractive preconditioner.

Details:

  1. HYPAMAS can automatically decide whether to use parallel computation. This feature can be turned off via the parameter iparm[kIparmAutoParallelOff], in which case HYPAMAS runs with the given number of threads.
  2. Forward elimination (Ly=b) and backward substitution (Ux=y) perform far fewer floating-point operations than a numerical LU factorization, so parallelizing the triangular solves is not guaranteed to improve performance. Sequential triangular solves are therefore always recommended.
  3. HYPAMAS only supports sparse matrices A stored in compressed sparse row (CSR) format. If the given matrix is stored in compressed sparse column (CSC) format, HYPAMAS should solve A^T*x=b instead of A*x=b. This option is controlled by the parameter iparm[kIparmSolveTranspose].

Benchmark:

From the top-level directory of HYPAMAS, type:

  1. cd demo
  2. make
  3. ./benchmark rajat19.mtx 6

This series of commands solves the matrix rajat19.mtx by LU factorization with the number of threads set to 6.
The benchmark test set can be downloaded from the SuiteSparse Matrix Collection website [12]. HYPAMAS is deliberately devised for matrices arising from Newton-Raphson iteration, e.g., the circuit simulation problems typical of SPICE-like simulators. Note that HYPAMAS currently supports only the Matrix Market exchange format, not the MATLAB or Rutherford-Boeing formats.

HYPAMAS is benchmarked against KLU on a Linux system with an Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz (6 physical cores, with Hyper-Threading yielding 12 logical threads) and 32GB RAM. The test matrices come from the SuiteSparse Matrix Collection (formerly the University of Florida Sparse Matrix Collection). HYPAMAS is a cache-friendly application that performs computationally intensive, fine-tuned floating-point work; Hyper-Threading may degrade its performance because the CPU resources are already highly utilized and the logical processors compete for cache access [13]. The benchmarks therefore use at most 6 threads instead of 12.

The scalability of parallel re-factorization is much better, which is attributable to the explicit elimination tree used for scheduling re-factorization. The following two figures plot the performance improvement factor over KLU in the factorization and re-factorization phases, respectively, both including the solving phase. Note that the GMRES time includes both ILU preconditioning and iteration.
[Figures: factorization & solution | refactorization & solution]

Requirements:

OS: Linux
CPU: AVX instruction set or higher
gcc: 8.4

References:

[1] J. R. Gilbert and T. Peierls. Sparse partial pivoting in time proportional to arithmetic operations. SIAM J. Scientific and Statistical Computing, 9:862-874, 1988.
[2] S. C. Eisenstat and J. W. H. Liu. Exploiting structural symmetry in a sparse partial pivoting code. SIAM J. Scientific and Statistical Computing, 14:253-257, 1993.
[3] Xiaoye S. Li. Sparse Gaussian elimination on high performance computers. EECS Department, University of California, Berkeley, 1996.
[4] T. A. Davis and E. P. Natarajan. Algorithm 907: KLU, a direct sparse solver for circuit simulation problems. ACM Trans. Mathematical Software. 37(3), 36:1-36:17, 2010.
[5] X. Chen, Y. Wang and H. Yang. NICSLU: An adaptive sparse matrix solver for parallel circuit simulation. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems. 32(2), 261-274, 2013.
[6] I. S. Duff and J. Koster. On algorithms for permuting large entries to the diagonal of a sparse matrix. SIAM J. Matrix Analysis and Applications. 22(4), 973-996, 2000.
[7] P. R. Amestoy, T. A. Davis and I. S. Duff. Algorithm 837: AMD, an approximate minimum degree ordering algorithm. ACM Trans. Mathematical Software. 30(3), 381-388, 2004.
[8] Y. Saad. ILUT: A dual threshold incomplete LU factorization. Numerical Linear Algebra with Applications, 1(4), 387–402, 1994.
[9] Xiaoye S. Li and M. Shao. A supernodal approach to incomplete LU factorization with partial pivoting. ACM Trans. Mathematical Software. 37(4), 43:1-43:20, 2011.
[10] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Scientific and Statistical Computing, 7(3), 856-869, 1986.
[11] Y. Saad. Iterative methods for sparse linear systems. 2nd Edition. SIAM. 2003.
[12] T. A. Davis and Y. Hu. The University of Florida Sparse Matrix Collection. ACM Trans. Mathematical Software. 38(1) 1:1-1:25, 2011.
[13] T. Leng, R. Ali, J. Hsieh, V. Mashayekhi and R. Rooholamini. An empirical study of Hyper-Threading in high performance computing clusters. Technical Report, 2002.

Authors:

  • 👋 Hi, I’m Penguin.
  • 👀 I’m interested in high-performance computation, especially in matrix solvers.
  • 🌱 I’m currently learning direct & iterative matrix solvers.
  • 💞️ I’m looking to collaborate with a nice team.
  • 📫 You can contact me by email about any issue or bug.
