On the Scalability of Asynchronous Parallel Computations
Authors: D.C. Marinescu; J.R. Rice
- Publisher: Elsevier Science
- Year: 1994
- Language: English
- File size: 753 KB
- Volume: 22
- Category: Article
- ISSN: 0743-7315
Synopsis
This paper investigates the time lost in a parallel computation due to sequential and duplicated work, communication, and blocking, and proposes characterizations of parallel algorithms based upon communication complexity and a blocking model. It discusses the impact of the processor architecture upon the measured speedup and shows that a large speedup may be due to an inefficient sequential computation rather than to an efficient parallel computation. A model of parallel computation that accounts for sequential and duplicated work, communication and control, and blocking is presented. It parametrizes scalability using three functions of the number \(P\) of processors: \(E(P)\), the number of communication events; \(f(P)\), the fraction of sequential and duplicated work plus the algorithmic blocking; and \(I(P)\), the instruction execution rate. The characteristic function \(E(P)\) is the most important, since in many computations \(f(P)\) and \(I(P)\) are nearly constant. The model is used to predict the asymptotic behavior, the maximum speedup, and the optimal number of processors. A 3-D FFT algorithm and a Chebyshev iterative algorithm illustrate the concepts introduced. © 1994 Academic Press, Inc.
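The synopsis does not reproduce the paper's exact equations, but the qualitative behavior it describes can be illustrated with an Amdahl-style sketch: total time combines a sequential/duplicated fraction (playing the role of \(f(P)\), assumed constant), a parallelizable part, and a per-event communication cost charged against \(E(P)\). The constants `T1`, `t_comm`, and the linear growth of communication events below are illustrative assumptions, not values from the paper.

```python
# Illustrative Amdahl-style sketch only; the paper's actual model is richer.
# f       : fraction of sequential and duplicated work (f(P), held constant)
# events  : E(P), number of communication events (assumed linear in P here)
# t_comm  : assumed cost per communication event (hypothetical constant)
def predicted_speedup(P, T1=1000.0, f=0.05, t_comm=0.1, events=lambda P: P):
    """Speedup T(1)/T(P) under the assumed model."""
    T_P = T1 * (f + (1.0 - f) / P) + t_comm * events(P)
    return T1 / T_P

# Because the communication term grows with P, the predicted speedup rises,
# peaks at some optimal processor count, and then declines -- the asymptotic
# behavior the model is used to predict.
speedups = [predicted_speedup(P) for P in (1, 8, 64, 512)]
best_P = max(range(1, 1025), key=predicted_speedup)
```

Under these assumptions the speedup is bounded above by \(1/f\), and sweeping `P` locates the optimal number of processors, mirroring the quantities the model predicts.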
Similar volumes
An asynchronous parallel algorithm is developed in this paper. It is based on the newly developed sequential pseudo-excitation method for structural non-stationary random responses, used with the high precision direct (HPD-S) integration method. Examples run on a Transtech Paramid MIMD parallel comp
The paper will present a performance analysis of a 4-processor asynchronous parallel computer based on Texas Instruments 990/10 minicomputers, recently commissioned at Loughborough University. A description of the implementation of parallel computing on the system, which was originally 4 independent
This paper describes a framework for parallel computing in a locally confined, scalable computing cluster (SCC) using Java agents. The framework consists of a programming model with agents and asynchronous invocations, and a scheme for adaptive placement of multiple agents in an SCC. Our scheme is g
A two-dimensional \((h, p)\) finite element scheme for distributed parallel computation is developed. The approach is based on an element-by-element domain decomposition and is implemented on the nCUBE2 system. Example problems are used to demonstrate performance of the algorithm for a range of \((h