In this paper we discuss the runtime support required for the parallelization of unstructured data-parallel applications on nonuniform and adaptive environments. We describe several optimization techniques for fast remapping of data and for reducing the amount of communications between machines when
A Comparison of Implementation Strategies for Nonuniform Data-Parallel Computations
β Scribed by Salvatore Orlando; Raffaele Perego
- Publisher
- Elsevier Science
- Year
- 1998
- Tongue
- English
- Weight
- 348 KB
- Volume
- 52
- Category
- Article
- ISSN
- 0743-7315
No coin nor oath required. For personal study only.
β¦ Synopsis
Data-parallel languages allow programmers to easily express parallel computations by means of high-level constructs. To reduce overheads, the compiler partitions the computations among the processors at compile-time, on the basis of the static data distribution suggested by the programmer. When execution costs are nonuniform and unpredictable, some processors may be assigned more work than others. Workload imbalance can be mitigated by cyclically distributing data and associated computations or by employing adaptive strategies which build a more balanced schedule at run-time, on the basis of the actual execution costs. This paper discusses static and hybrid (static+dynamic) scheduling strategies which can be used to balance the workloads derived from the execution of nonuniform parallel loops. A multidimensional flame simulation kernel has been used to evaluate different implementation strategies on a Cray T3E. We fed the benchmark code with synthetic input data sets built on the basis of a load imbalance model and we report and compare the results obtained.
π SIMILAR VOLUMES
Ray tracing is a well known technique to generate life-like images. Unfortunately, ray tracing complex scenes can require large amounts of CPU time and memory storage. Distributed memory parallel computers with large memory capacities and high processing speeds are ideal candidates to perform ray tr
Although logic languages, due to their nnn-declarative nature, are widely proclaimed to be conducive in theory to parallel implementation, in fact there appears to be insufficient practical evidence to stimulate further developments in this field. The paper puts forward various complications which a