We present three parallel implementations of the Karatsuba algorithm for long integer multiplication on a distributed memory architecture and discuss the experimental results obtained on a Paragon computer. The first two implementations have both time complexity O(n) on n^(log_2 3) processors, but pre
Performance analysis of explicit group parallel algorithms for distributed memory multicomputer
Scribed by Kok Fu Ng; Norhashidah Hj. Mohd Ali
- Publisher
- Elsevier Science
- Year
- 2008
- Language
- English
- File size
- 663 KB
- Volume
- 34
- Category
- Article
- ISSN
- 0167-8191
Synopsis
Since their introduction, the four-point explicit group (EG) and explicit decoupled group (EDG) methods for solving elliptic PDEs have been implemented on various parallel computing architectures, such as shared memory parallel computers and distributed computer systems. However, none of these implementations included a detailed performance analysis of the algorithms. In this paper we develop performance models for these explicit group methods and present a detailed study of their hypothetical implementation on two distributed memory multicomputers with different computation speeds and communication bandwidths. Detailed performance analysis based on these models predicted that the methods would perform differently on the two clusters. This was confirmed by experiments performed on the two distinct clusters. Theoretical analysis and experimental results indicate that both explicit group methods are scalable with respect to the number of processors and the problem size.
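To make the idea of a performance model concrete, the sketch below uses a generic latency-bandwidth cost model for an iterative grid solver. It is not the model developed in the paper. The strip decomposition, the function names `predicted_time` and `speedup`, and every parameter value (`flops_per_point`, `bytes_per_value`, the machine constants) are assumptions chosen for illustration only.

```python
def predicted_time(n, p, iters, t_flop, latency, bandwidth,
                   flops_per_point=10, bytes_per_value=8):
    """Estimate wall time (seconds) for `iters` sweeps over an n x n grid.

    Assumed setup (hypothetical, not taken from the paper): the grid is
    split into p horizontal strips, and each strip exchanges one boundary
    row of n values with up to two neighbours per sweep.
    """
    # Computation is divided evenly among the p processors.
    compute = iters * (n * n / p) * flops_per_point * t_flop
    # A single processor has no neighbours to exchange data with.
    neighbours = 0 if p == 1 else 2
    # Each message costs a fixed latency plus size / bandwidth.
    message = latency + n * bytes_per_value / bandwidth
    comm = iters * neighbours * message
    return compute + comm


def speedup(n, p, **machine):
    """Predicted speedup on p processors relative to one processor."""
    return predicted_time(n, 1, **machine) / predicted_time(n, p, **machine)


if __name__ == "__main__":
    # Two hypothetical clusters with the same CPU speed but different networks.
    slow_net = dict(iters=100, t_flop=1e-9, latency=1e-4, bandwidth=1e7)
    fast_net = dict(iters=100, t_flop=1e-9, latency=1e-6, bandwidth=1e9)
    for p in (1, 2, 4, 8):
        print(p, round(speedup(1024, p, **slow_net), 2),
              round(speedup(1024, p, **fast_net), 2))
```

Under these assumed constants, the same algorithm is predicted to speed up less on the cluster with the slower network, which is the kind of difference between machines that a model of this shape is meant to expose.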
SIMILAR VOLUMES
Several space sharing policies have been proposed for distributed-memory multicomputer systems. We consider adaptive space sharing policies, as these policies provide a better performance than fixed and static policies by taking system load and user requirements into account. In this paper we propos
Ray tracing is a well known technique to generate life-like images. Unfortunately, ray tracing complex scenes can require large amounts of CPU time and memory storage. Distributed memory parallel computers with large memory capacities and high processing speeds are ideal candidates to perform ray tr
We develop and experiment with a new parallel algorithm to approximate the maximum weight cut in a weighted undirected graph. Our implementation starts with the recent (serial) algorithm of Goemans and Williamson for this problem. We consider several different versions of this algorithm, varying the