
SUMMA: scalable universal matrix multiplication algorithm

✍ Scribed by Van De Geijn, R. A.; Watts, J.


Publisher: John Wiley and Sons
Year: 1997
Language: English
File size: 341 KB
Volume: 9
Category: Article
ISSN: 1040-3108


✦ Synopsis


In the paper we give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less work space. MPI implementations are given, as are performance results on the Intel Paragon system.


📜 SIMILAR VOLUMES


Scalable Parallel Matrix Multiplication
✍ Keqin Li 📂 Article 📅 2001 🏛 Elsevier Science 🌐 English ⚖ 392 KB

Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α ≤ 3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α/log N processors. Such a parallel

A practical algorithm for faster matrix
✍ Igor Kaporin 📂 Article 📅 1999 🏛 John Wiley and Sons 🌐 English ⚖ 69 KB

The purpose of this paper is to present an algorithm for matrix multiplication based on a formula discovered by Pan [7]. For matrices of order up to 10 000, the nearly optimum tuning of the algorithm results in a rather clear non-recursive one- or two-level structure with the operation count comparab

A new parallel matrix multiplication alg
✍ Choi, Jaeyoung 📂 Article 📅 1998 🏛 John Wiley and Sons 🌐 English ⚖ 139 KB

We present a new fast and scalable matrix multiplication algorithm called DIMMA (distribution-independent matrix multiplication algorithm) for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas; it uses a modified pipelined communicatio

A submatrix algorithm for the matrix-vec
✍ Roland Lindh; Per-Årke Malmquist 📂 Article 📅 1989 🏛 John Wiley and Sons 🌐 English ⚖ 179 KB

In self-consistent field (SCF) calculations the construction of the Fock matrix is the most time-consuming step. The Fock matrix construction may formally be seen as a matrix-vector multiplication, where the matrix is the supermatrix, Tikl, and the vector is the first-order density matrix, yi. This form

Recursive T-matrix algorithm for multipl
✍ Adnan Şahin; Eric L. Miller 📂 Article 📅 1997 🏛 John Wiley and Sons 🌐 English ⚖ 205 KB

We present a new application of the recursive T-matrix algorithm to calculate the scattered field from a single or multiple metallic cylinders of arbitrary shapes. Using the equivalence theorem, each metallic object is replaced with small metallic cylinders along its perimeter; then scattered fields

Impact of mixed-parallelism on parallel
✍ F. Desprez; F. Suter 📂 Article 📅 2004 🏛 John Wiley and Sons 🌐 English ⚖ 277 KB

In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism, so-called mixed-parallelism, on the Strassen and Winograd matrix multiplication algorithms. This work takes place in the context of Grid computing and, in particular, in the Client–Agent(s)