We present a new fast and scalable matrix multiplication algorithm called DIMMA (distribution-independent matrix multiplication algorithm) for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas; it uses a modified pipelined communicatio
Parallel implementation of a ray tracing algorithm for distributed memory parallel computers
Scribed by Lee, Tong-Yee; Raghavendra, C. S.; Nicholas, John B.
- Publisher
- John Wiley and Sons
- Year
- 1997
- Tongue
- English
- Weight
- 145 KB
- Volume
- 9
- Category
- Article
- ISSN
- 1040-3108
Synopsis
Ray tracing is a well-known technique for generating lifelike images. Unfortunately, ray tracing complex scenes can require large amounts of CPU time and memory. Distributed memory parallel computers with large memory capacities and high processing speeds are ideal candidates for ray tracing. However, the computational cost of rendering pixels and the patterns of data access cannot be predicted until runtime. To parallelize such an application efficiently on distributed memory parallel computers, the issues of database distribution, dynamic data management and dynamic load balancing must be addressed. In this paper, we present a parallel implementation of a ray tracing algorithm on the Intel Delta parallel computer. In our database distribution, a small fraction of the database is duplicated on each processor, while the remaining part is evenly distributed among groups of processors. In the system, there are multiple copies of the entire database in the memory of groups of processors. Dynamic data management is achieved by an ALRU cache scheme that exploits image coherence to reduce data movement when ray tracing consecutive pixels. We balance load among processors by distributing subimages to processors in a global fashion based on previous workload requests. The success of our implementation depends crucially on a number of parameters, which are experimentally evaluated.
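The caching idea in the synopsis can be illustrated with a minimal sketch. It assumes a plain least-recently-used (LRU) policy, not the paper's ALRU scheme, whose details are not given here. The class `LRUObjectCache`, the `fetch_remote` callback and the `simulate` helper are hypothetical names introduced only for illustration; the callback stands in for a message to the processor that owns a scene object.

```python
from collections import OrderedDict


class LRUObjectCache:
    """Fixed-capacity cache of remotely owned scene objects (plain LRU, an assumption)."""

    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote  # hypothetical callback: object id -> object data
        self.entries = OrderedDict()      # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, object_id):
        if object_id in self.entries:
            self.entries.move_to_end(object_id)  # mark as most recently used
            self.hits += 1
            return self.entries[object_id]
        self.misses += 1
        data = self.fetch_remote(object_id)      # stands in for a remote fetch
        self.entries[object_id] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict the least recently used entry
        return data


def simulate(pixel_object_ids, capacity):
    """Trace pixels in order; each pixel lists the object ids its rays touch."""
    cache = LRUObjectCache(capacity, fetch_remote=lambda oid: ("object", oid))
    for object_ids in pixel_object_ids:
        for oid in object_ids:
            cache.get(oid)
    return cache.hits, cache.misses


# Neighboring pixels tend to touch the same objects (image coherence).
coherent = [[1, 2], [1, 2], [2, 3], [2, 3]]
# The same work in an order that alternates between unrelated regions.
scattered = [[1, 2], [3, 4], [1, 2], [3, 4]]

print(simulate(coherent, capacity=2))
print(simulate(scattered, capacity=2))
```

In this toy trace the coherent ordering needs fewer remote fetches than the scattered one with the same cache size, which is the effect the synopsis attributes to exploiting image coherence across consecutive pixels.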
SIMILAR VOLUMES
This paper describes the parallel implementation of a numerical model for the simulation of problems from fluid dynamics on distributed memory multiprocessors. The basic procedure is to apply a fully explicit upwind finite difference approximation on a staggered grid. A theoretical time complexity a
This paper presents two parallel realizations of sparse distributed memory (SDM) on a tree-shaped computer. The original model of SDM is introduced in terms of generalized computer memory and artificial neural networks (ANNs). For parallelization purposes, addressing, storage and retrieval operation
An efficient algorithm for implementing the finite-element time-domain (FETD) method on parallel computers is presented. An unconditionally stable implicit FETD algorithm is combined with the finite-element tearing and interconnecting (FETI) method. This domain decomposition algorithm converges
In recent years several implementations of molecular dynamics (MD) codes have been reported on multiple instruction multiple data (MIMD) machines. However, very few implementations of MD codes on single instruction multiple data (SIMD) machines have been reported. The difficulty in using pair