Performance of parallel Cholesky factorization algorithms using BLAS
Scribed by Glenn R. Luecke; Jae Heon Yun; Philip W. Smith
- Publisher: Springer US
- Year: 1992
- Language: English
- File size: 854 KB
- Volume: 6
- Category: Article
- ISSN: 0920-8542
Synopsis
This paper considers four parallel Cholesky factorization algorithms, including SPOTRF from the February 1992 release of LAPACK, each of which calls parallel Level 2 or Level 3 BLAS, or both. A fifth parallel Cholesky algorithm that calls serial Level 3 BLAS is also described. The efficiency of these five algorithms on the CRAY-2, CRAY Y-MP/832, Hitachi Data Systems EX 80, and IBM 3090-600J is evaluated and compared with a vendor-optimized parallel Cholesky factorization algorithm. The fifth algorithm, which calls serial Level 3 BLAS, provided the best performance of all algorithms that called BLAS routines. In fact, this algorithm outperformed the Cray-optimized libsci routine (SPOTRF) by 13-44%, depending on the problem size and the number of processors used.
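For readers unfamiliar with how a Cholesky factorization is organized around Level 3 BLAS-style operations, the following is a minimal sketch of one common blocked (right-looking) arrangement. It is not one of the five algorithms from the paper, which the synopsis does not spell out. The function name `blocked_cholesky` and the block-size parameter `nb` are hypothetical, and plain Python loops stand in for the BLAS calls (a triangular solve in the role of STRSM, a symmetric rank-k update in the role of SSYRK).

```python
import math

def blocked_cholesky(A, nb=2):
    """Return lower-triangular L with A = L * L^T, processing nb columns at a time.

    Illustrative sketch only: the loops marked (2) and (3) are where a real
    implementation would call Level 3 BLAS routines instead.
    """
    n = len(A)
    a = [[float(x) for x in row] for row in A]  # work on a copy
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # (1) Unblocked factorization of the diagonal block a[k0:k1, k0:k1].
        for j in range(k0, k1):
            d = a[j][j] - sum(a[j][k] ** 2 for k in range(k0, j))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            a[j][j] = math.sqrt(d)
            for i in range(j + 1, k1):
                s = sum(a[i][k] * a[j][k] for k in range(k0, j))
                a[i][j] = (a[i][j] - s) / a[j][j]
        # (2) Triangular solve for the panel below the diagonal block
        #     (the role STRSM plays in a BLAS-based code).
        for i in range(k1, n):
            for j in range(k0, k1):
                s = sum(a[i][k] * a[j][k] for k in range(k0, j))
                a[i][j] = (a[i][j] - s) / a[j][j]
        # (3) Symmetric update of the trailing lower triangle
        #     (the role SSYRK plays in a BLAS-based code).
        for i in range(k1, n):
            for j in range(k1, i + 1):
                a[i][j] -= sum(a[i][k] * a[j][k] for k in range(k0, k1))
    # Zero the strict upper triangle so only L is returned.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a

A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
L = blocked_cholesky(A, nb=2)
```

For this small test matrix the factor is `[[2, 0, 0], [6, 1, 0], [-8, 5, 3]]`. In a blocked code of this kind most of the arithmetic falls in steps (2) and (3), which is why the choice between parallel and serial BLAS for those steps can matter for performance.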
SIMILAR VOLUMES
Rapid computation of the QR factorization of a matrix is fundamental to many scientific and engineering problems. The paper presents a family of algorithms parameterized by the number of processors available P, arithmetic grain aggregation parameters g_1, g_2, ..., g_p, and communication grain aggrega
In this paper we study various implementations of Cholesky factorization on SIMD architectures. A submatrix algorithm is implemented on the MasPar MP-2 using both block and torus-wrap data mappings. Both LL^T and LDL^T (square root free) implementations of the algorithm are investigated. The executi