An increasing number of scientific programs exhibit two forms of parallelism, often in a nested fashion. At the outer level, the application comprises coarse-grained task parallelism, with dependencies between tasks reflected by an acyclic graph. At the inner level, each node of the graph is a data-
Optimal Use of Mixed Task and Data Parallelism for Pipelined Computations
โ Scribed by Jaspal Subhlok; Gary Vondran
- Publisher
- Elsevier Science
- Year
- 2000
- Tongue
- English
- Weight
- 360 KB
- Volume
- 60
- Category
- Article
- ISSN
- 0743-7315
No coin nor oath required. For personal study only.
โฆ Synopsis
This paper addresses optimal mapping of parallel programs composed of a chain of data parallel tasks onto the processors of a parallel system. The input to the programs is a stream of data sets, each of which is processed in order by the chain of tasks. This computation structure, also referred to as a data parallel pipeline, is common in several application domains, including digital signal processing, image processing, and computer vision. The parameters of the performance for such stream processing are latency (the time to process an individual data set) and throughput (the aggregate rate at which data sets are processed). These two criteria are distinct since multiple data sets can be pipelined or processed in parallel. The central contribution of this research is a new algorithm to determine a processor mapping for a chain of tasks that optimizes latency in the presence of a throughput constraint. We also discuss how this algorithm can be applied to solve the converse problem of optimizing throughput with a latency constraint. The problem formulation uses a general and realistic model of intertask communication and addresses the entire problem of mapping, which includes clustering tasks into modules, assigning of processors to modules, and possible replicating of modules. The main algorithms are based on dynamic programming and their execution time complexity is polynomial in the number of processors and tasks. The entire framework is implemented as an automatic mapping tool in the Fx parallelizing compiler for a dialect of High Performance Fortran.
๐ SIMILAR VOLUMES
In this paper we discuss the runtime support required for the parallelization of unstructured data-parallel applications on nonuniform and adaptive environments. We describe several optimization techniques for fast remapping of data and for reducing the amount of communications between machines when
Arrays are mapped to processors through a two-step process-alignment followed by distribution-in data-parallel languages such as High Performance Fortran. This process of mapping creates disjoint pieces of the array that are locally owned by each processor. An HPF compiler that generates code for ar
Data from a litter matched tumorigenesis experiment arc analysed using a gencralised h e a r mixed model (GLMM) approach to the analysis of clustered survival data in which there is a dependence of failure time observations withii the same litter. Maximum likelihood (ML.) and residual maximum likeli
The past decade has seen explosive growth in database technology and the amount of data collected. Advances in data collection, use of bar codes in commercial outlets, and the computerization of business transactions have flooded us with lots of data. We have an unprecedented opportunity to analyze