SUMMA: scalable universal matrix multiplication algorithm
- Authors: Van De Geijn, R. A.; Watts, J.
- Publisher: John Wiley and Sons
- Year: 1997
- Language: English
- File size: 341 KB
- Volume: 9
- Category: Article
- ISSN: 1040-3108
Synopsis
In this paper we give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less work space. MPI implementations are given, as are performance results on the Intel Paragon system.
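The formulation SUMMA builds on can be illustrated sequentially: the product C = AB is accumulated as a sum of outer products of column panels of A with row panels of B, which the parallel algorithm then distributes by broadcasting each panel along the rows and columns of a process grid. A minimal single-process sketch of that panel loop (the function name and `panel` width are illustrative, not from the paper):

```python
import numpy as np

def summa_like_multiply(A, B, panel=2):
    """Compute C = A @ B as a sum of rank-`panel` updates.

    Each iteration multiplies a column panel of A by the matching
    row panel of B -- the sequential skeleton of SUMMA, where these
    panels would instead be broadcast across a 2-D process grid.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for p in range(0, k, panel):
        # Rank-`panel` update: (m x panel) times (panel x n).
        C += A[:, p:p + panel] @ B[p:p + panel, :]
    return C

# Usage: agrees with a direct matrix product.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 3))
print(np.allclose(summa_like_multiply(A, B), A @ B))
```

In the distributed version, each rank-`panel` update costs only two broadcasts (one per grid dimension), which is what makes the method both simple and scalable.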
SIMILAR VOLUMES
The purpose of this paper is to present an algorithm for matrix multiplication based on a formula discovered by Pan [7]. For matrices of order up to 10 000, the nearly optimum tuning of the algorithm results in a rather clear non-recursive one- or two-level structure with the operation count comparab
We present a new fast and scalable matrix multiplication algorithm called DIMMA (distribution-independent matrix multiplication algorithm) for block-cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas: it uses a modified pipelined communicatio
In self-consistent field (SCF) calculations, the construction of the Fock matrix is the most time-consuming step. The Fock matrix construction may formally be seen as a matrix-vector multiplication, where the matrix is the supermatrix, Tikl, and the vector is the first-order density matrix, yi. This form
We present a new application of the recursive T-matrix algorithm to calculate the scattered field from a single or multiple metallic cylinders of arbitrary shapes. Using the equivalence theorem, each metallic object is replaced with small metallic cylinders along its perimeter; then scattered fields
In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism, so-called mixed-parallelism, on the Strassen and Winograd matrix multiplication algorithms. This work takes place in the context of Grid computing and, in particular, in the Client-Agent(s)