Load-Balanced Sparse Matrix–Vector Multiplication on Parallel Computers
✍ Authors: Sorin G. Nastea; Ophir Frieder; Tarek El-Ghazawi
- Publisher: Elsevier Science
- Year: 1997
- Language: English
- File size: 560 KB
- Volume: 46
- Category: Article
- ISSN: 0743-7315
✦ Synopsis
We consider the load-balanced multiplication of a large sparse matrix by a long sequence of vectors on parallel computers. We propose a method that combines fast load balancing with efficient message-passing techniques to alleviate both computational and inter-node communication challenges. The performance of the proposed method is evaluated on benchmark matrices as well as on synthetically generated ones, and compared with prior work. We show that our approach yields a tangible improvement over prior work, particularly for very sparse and skewed matrices. Moreover, we show that the I/O overhead for this problem can be efficiently amortized through I/O latency hiding and overall load balancing.
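The core idea in the synopsis, balancing work across processors when nonzeros are skewed toward a few rows, can be illustrated with a minimal sketch. This is not the paper's exact scheme; the `partition_rows` and `spmv_block` helpers below are hypothetical names for an assumed strategy of splitting a CSR matrix into contiguous row blocks with roughly equal nonzero counts, so that each worker performs a similar amount of multiply–add work per vector.

```python
# Illustrative sketch (not the paper's method): load-balance sparse
# matrix-vector multiplication by assigning contiguous row blocks with
# roughly equal nonzero counts, then compute each block's slice of y = A x.

def partition_rows(row_ptr, nworkers):
    """Split rows into nworkers contiguous blocks of ~equal nonzeros.

    row_ptr is the CSR row-pointer array, so row_ptr[r] is the cumulative
    nonzero count before row r; we cut where it crosses each worker's share.
    """
    nnz = row_ptr[-1]
    target = nnz / nworkers
    bounds = [0]
    for w in range(1, nworkers):
        goal = w * target
        r = bounds[-1]
        while r < len(row_ptr) - 1 and row_ptr[r + 1] <= goal:
            r += 1
        bounds.append(r)
    bounds.append(len(row_ptr) - 1)
    return [(bounds[i], bounds[i + 1]) for i in range(nworkers)]

def spmv_block(row_ptr, col_idx, vals, x, lo, hi):
    """CSR sparse matrix-vector product restricted to rows [lo, hi)."""
    y = [0.0] * (hi - lo)
    for r in range(lo, hi):
        s = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            s += vals[k] * x[col_idx[k]]
        y[r - lo] = s
    return y

# A 4x4 skewed matrix: row 0 holds half of the 6 nonzeros.
row_ptr = [0, 3, 4, 5, 6]
col_idx = [0, 1, 3, 1, 2, 3]
vals    = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0, 1.0]

blocks = partition_rows(row_ptr, 2)   # -> [(0, 1), (1, 4)]: 3 nnz each
y = []
for lo, hi in blocks:                 # each iteration models one worker
    y.extend(spmv_block(row_ptr, col_idx, vals, x, lo, hi))
print(blocks, y)                      # -> [(0, 1), (1, 4)] [6.0, 4.0, 5.0, 6.0]
```

Note how a naive even split by row count (two rows per worker) would give one worker 4 nonzeros and the other 2, while the nonzero-balanced split gives each worker 3; on genuinely skewed matrices the gap is far larger, which is the imbalance the paper's method targets.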
📜 SIMILAR VOLUMES
The matrix-vector product kernel can represent most of the computation in a gradient iterative solver. Thus, an efficient solver requires that the matrix-vector product kernel be fast. We show that standard approaches with Fortran or C may not deliver good performance and present a strategy involving…
We present a new fast and scalable matrix multiplication algorithm called DIMMA (distribution-independent matrix multiplication algorithm) for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas: it uses a modified pipelined communication…