SUMMA: scalable universal matrix multiplication algorithm
- Authors: Van De Geijn, R. A.; Watts, J.
- Publisher: John Wiley and Sons
- Year: 1997
- Language: English
- File size: 341 KB
- Volume: 9
- Category: Article
- ISSN: 1040-3108
Synopsis
In this paper we give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less work space. MPI implementations are given, as are performance results on the Intel Paragon system.
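The formulation SUMMA builds on can be illustrated sequentially: the product C = AB is accumulated as a sum of outer products of column panels of A with row panels of B, which the parallel algorithm then distributes by broadcasting each panel along the rows and columns of a process grid. A minimal single-process sketch of that panel loop (the function name and `panel` width are illustrative, not from the paper):

```python
import numpy as np

def summa_like_multiply(A, B, panel=2):
    """Compute C = A @ B as a sum of rank-`panel` updates.

    Each iteration multiplies a column panel of A by the matching
    row panel of B -- the sequential skeleton of SUMMA, where these
    panels would instead be broadcast across a 2-D process grid.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for p in range(0, k, panel):
        # Rank-`panel` update: (m x panel) times (panel x n).
        C += A[:, p:p + panel] @ B[p:p + panel, :]
    return C

# Usage: agrees with a direct matrix product.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 3))
print(np.allclose(summa_like_multiply(A, B), A @ B))
```

In the distributed version, each rank-`panel` update costs only two broadcasts (one per grid dimension), which is what makes the method both simple and scalable.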
SIMILAR VOLUMES
The purpose of this paper is to present an algorithm for matrix multiplication based on a formula discovered by Pan [7]. For matrices of order up to 10 000, the nearly optimum tuning of the algorithm results in a rather clear non-recursive one- or two-level structure with the operation count comparab
We present a new fast and scalable matrix multiplication algorithm called DIMMA (distribution-independent matrix multiplication algorithm) for block-cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas: it uses a modified pipelined communicatio
In self-consistent field (SCF) calculations, the construction of the Fock matrix is the most time-consuming step. The Fock matrix construction may formally be seen as a matrix-vector multiplication, where the matrix is the supermatrix, Tikl, and the vector is the first-order density matrix, yi. This form
We present a new application of the recursive T-matrix algorithm to calculate the scattered field from a single or multiple metallic cylinders of arbitrary shapes. Using the equivalence theorem, each metallic object is replaced with small metallic cylinders along its perimeter; then scattered fields
In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism, so-called mixed-parallelism, on the Strassen and Winograd matrix multiplication algorithms. This work takes place in the context of Grid computing and, in particular, in the Client-Agent(s)