✦ LIBER ✦

Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers

✍ Scribed by Keqin Li

Publisher: Elsevier Science
Year: 2001
Tongue: English
Weight: 392 KB
Volume: 61
Category: Article
ISSN: 0743-7315
DOI: 10.1006/jpdc.2001.1768

✦ Synopsis

Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity $O(N^\alpha)$, where $2 < \alpha \le 3$. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in $O(\log N)$ time by using $N^\alpha/\log N$ processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all $1 \le p \le N^\alpha/\log N$, multiplying two $N \times N$ matrices can be performed by a DMPC with $p$ processors in $O(N^\alpha/p)$ time, i.e., linear speedup and cost optimality can be achieved in the range $[1..N^\alpha/\log N]$. This unifies all known algorithms for matrix multiplication on DMPC, standard or nonstandard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. For instance, for all $1 \le p \le N^\alpha/\log N$, multiplying two $N \times N$ matrices can be performed by $p$ processors connected by a hypercubic network in $O(N^\alpha/p + (N^2/p^{2/\alpha})(\log p)^{2(\alpha-1)/\alpha})$ time, which implies that if $p = O(N^\alpha/(\log N)^{2(\alpha-1)/(\alpha-2)})$, linear speedup can be achieved. Such a parallelization is highly scalable. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically.
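The bounds quoted in the synopsis can be explored numerically. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, all hidden constants are dropped, and logarithms are assumed to be base 2. It evaluates the two terms of the hypercubic-network bound so that one can see where the communication term starts to dominate the computation term $N^\alpha/p$.

```python
import math


def dmpc_time(n, alpha, p):
    """Leading-order DMPC bound N^alpha / p (constant factors dropped)."""
    return n ** alpha / p


def hypercube_terms(n, alpha, p):
    """Return the two terms of the hypercubic-network bound.

    compute: N^alpha / p
    comm:    (N^2 / p^(2/alpha)) * (log p)^(2(alpha-1)/alpha)
    Base-2 logarithms and unit constants are assumptions of this sketch.
    """
    compute = n ** alpha / p
    comm = (n ** 2 / p ** (2 / alpha)) * math.log2(p) ** (2 * (alpha - 1) / alpha)
    return compute, comm


def linear_speedup_limit(n, alpha):
    """Order of the largest p for linear speedup:
    N^alpha / (log N)^(2(alpha-1)/(alpha-2)), constants dropped."""
    k = 2 * (alpha - 1) / (alpha - 2)
    return n ** alpha / math.log2(n) ** k


if __name__ == "__main__":
    n, alpha = 1024, 3  # alpha = 3 corresponds to the standard algorithm
    print("limit on p:", linear_speedup_limit(n, alpha))
    for p in (4096, 2 ** 20, 2 ** 30):
        compute, comm = hypercube_terms(n, alpha, p)
        print(p, compute, comm, "comm dominates" if comm > compute else "compute dominates")
```

Because constants are ignored, the crossover point printed here only indicates the order of magnitude at which linear speedup is lost. Other exponents can be tried by changing `alpha`, for example `math.log2(7)` for Strassen's algorithm.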

📜 SIMILAR VOLUMES

A new parallel matrix multiplication algorithm on distributed-memory concurrent computers

✍ Choi, Jaeyoung 📂 Article 📅 1998 🏛 John Wiley and Sons 🌐 English ⚖ 139 KB

We present a new fast and scalable matrix multiplication algorithm called DIMMA (distribution-independent matrix multiplication algorithm) for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas; it uses a modified pipelined communicatio

Evaluating recursive filters on distributed memory parallel computers

✍ Stpiczyński, Przemysław 📂 Article 📅 2006 🏛 John Wiley and Sons 🌐 English ⚖ 112 KB

Parallel operation of CartaBlanca on shared and distributed memory computers

✍ N. T. Padial-Collins; W. B. VanderHeyden; D. Z. Zhang; E. D. Dendy; D. Livescu 📂 Article 📅 2003 🏛 John Wiley and Sons 🌐 English ⚖ 219 KB

Parallel implementation of a ray tracing algorithm for distributed memory parallel computers

✍ Lee, Tong-Yee; Raghavendra, C. S.; Nicholas, John B. 📂 Article 📅 1997 🏛 John Wiley and Sons 🌐 English ⚖ 145 KB

Ray tracing is a well known technique to generate life-like images. Unfortunately, ray tracing complex scenes can require large amounts of CPU time and memory storage. Distributed memory parallel computers with large memory capacities and high processing speeds are ideal candidates to perform ray tr

Numerical simulation of fluid dynamic problems on distributed memory parallel computers

✍ Pirozzi, Maria A. 📂 Article 📅 1997 🏛 John Wiley and Sons 🌐 English ⚖ 91 KB

This paper describes the parallel implementation of a numerical model for the simulation of problems from fluid dynamics on distributed memory multiprocessors. The basic procedure is to apply a fully explicit upwind finite difference approximation on a staggered grid. A theoretical time complexity a

Lagged Fibonacci Random Number Generators for Distributed Memory Parallel Computers

✍ Srinivas Aluru 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 192 KB

To parallelize applications that require the use of random numbers, an efficient and good quality parallel random number generator is required. In this paper, we study the parallelization of lagged Fibonacci generators for distributed memory parallel computers. Two popular ways of generating a rando