
Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers

✍ Scribed by Keqin Li


Publisher: Elsevier Science
Year: 2001
Tongue: English
Weight: 392 KB
Volume: 61
Category: Article
ISSN: 0743-7315


✦ Synopsis


Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity $O(N^\alpha)$, where $2 < \alpha \le 3$. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in $O(\log N)$ time by using $N^\alpha/\log N$ processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all $1 \le p \le N^\alpha/\log N$, multiplying two $N \times N$ matrices can be performed by a DMPC with $p$ processors in $O(N^\alpha/p)$ time, i.e., linear speedup and cost optimality can be achieved in the range $[1..N^\alpha/\log N]$. This unifies all known algorithms for matrix multiplication on DMPC, standard or nonstandard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. For instance, for all $1 \le p \le N^\alpha/\log N$, multiplying two $N \times N$ matrices can be performed by $p$ processors connected by a hypercubic network in $O(N^\alpha/p + (N^2/p^{2/\alpha})(\log p)^{2(\alpha-1)/\alpha})$ time, which implies that if $p = O(N^\alpha/(\log N)^{2(\alpha-1)/(\alpha-2)})$, linear speedup can be achieved. Such a parallelization is highly scalable. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically.
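
The hypercubic-network bound in the synopsis has two parts: a computation term, N^alpha / p, and a communication term, (N^2 / p^(2/alpha)) * (log p)^(2(alpha-1)/alpha). Linear speedup holds as long as the communication term does not dominate, which is where the stated limit p = O(N^alpha / (log N)^(2(alpha-1)/(alpha-2))) comes from. The Python sketch below only evaluates those two expressions for illustration. It assumes all hidden constants equal 1 and uses base-2 logarithms, neither of which is stated in the synopsis, and the function names are hypothetical.

```python
import math


def hypercube_time_terms(n, p, alpha):
    """Return (compute_term, comm_term) of the hypercubic bound.

    Assumption: hidden big-O constants are taken as 1 and log is base 2.
    """
    compute = n ** alpha / p
    comm = (n ** 2 / p ** (2 / alpha)) * math.log2(p) ** (2 * (alpha - 1) / alpha)
    return compute, comm


def linear_speedup_threshold(n, alpha):
    """Evaluate N^alpha / (log N)^(2(alpha-1)/(alpha-2)), constants ignored.

    Requires alpha > 2, as in the synopsis (2 < alpha <= 3).
    """
    return n ** alpha / math.log2(n) ** (2 * (alpha - 1) / (alpha - 2))


if __name__ == "__main__":
    n, alpha = 1024, 3.0
    print("threshold on p:", linear_speedup_threshold(n, alpha))
    for p in (2 ** 10, 2 ** 27):
        compute, comm = hypercube_time_terms(n, p, alpha)
        print(f"p={p}: compute={compute:.1f}, comm={comm:.1f}")
```

With N = 1024 and alpha = 3, the computation term dominates at p = 2^10 (well below the threshold) and the communication term dominates at p = 2^27 (well above it), consistent with the claimed range for linear speedup.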


📜 SIMILAR VOLUMES


A new parallel matrix multiplication alg
✍ Choi, Jaeyoung 📂 Article 📅 1998 🏛 John Wiley and Sons 🌐 English ⚖ 139 KB 👁 3 views

We present a new fast and scalable matrix multiplication algorithm called DIMMA (distribution-independent matrix multiplication algorithm) for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas; it uses a modified pipelined communicatio

Parallel implementation of a ray tracing
✍ Lee, Tong-Yee; Raghavendra, C. S.; Nicholas, John B. 📂 Article 📅 1997 🏛 John Wiley and Sons 🌐 English ⚖ 145 KB 👁 3 views

Ray tracing is a well known technique to generate life-like images. Unfortunately, ray tracing complex scenes can require large amounts of CPU time and memory storage. Distributed memory parallel computers with large memory capacities and high processing speeds are ideal candidates to perform ray tr

Numerical simulation of fluid dynamic pr
✍ Pirozzi, Maria A. 📂 Article 📅 1997 🏛 John Wiley and Sons 🌐 English ⚖ 91 KB 👁 3 views

This paper describes the parallel implementation of a numerical model for the simulation of problems from fluid dynamics on distributed memory multiprocessors. The basic procedure is to apply a fully explicit upwind finite difference approximation on a staggered grid. A theoretical time complexity a

Lagged Fibonacci Random Number Generator
✍ Srinivas Aluru 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 192 KB

To parallelize applications that require the use of random numbers, an efficient and good quality parallel random number generator is required. In this paper, we study the parallelization of lagged Fibonacci generators for distributed memory parallel computers. Two popular ways of generating a rando