I present global design principles for the implementation of High Energy Physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of High Energy Physics tasks is identified with granularity varying from a few times 10
Termination detection in data-driven parallel computations/applications
โ Scribed by Ashfaq A. Khokhar; Susanne E. Hambrusch; Erturk Kocalar
- Publisher
- Elsevier Science
- Year
- 2003
- Tongue
- English
- Weight
- 233 KB
- Volume
- 63
- Category
- Article
- ISSN
- 0743-7315
No coin nor oath required. For personal study only.
โฆ Synopsis
High-performance computing applications with data-driven communication and computation characteristics require synchronization routines in the form of eureka, barrier, or termination synchronization. In this paper, we consider termination synchronization for two different execution models, the AP and the APS model. In the AP model, processors are either active or passive and a passive processor can be made active by another active processor. In the APS model, processors can also be in a server state. A passive processor entering the server state does not become active again. In addition, a server processor cannot change the status of other processors. We describe and analyze solutions for both models and present experimental work highlighting the differences between the models. We show that in almost all situations the use of an AP algorithm to detect termination in an APS environment will result in loss of performance. Our experimental work on the Cray T3E provides insight into where and why this performance loss occurs.
๐ SIMILAR VOLUMES
One central problem in the execution of parallel nested loops with non-ane bounds is the precise scanning (i.e., enumeration) of the points in their iteration space and the detection of their termination. Scanning schemes have been proposed for both shared-memory and distributed-memory implementatio
In this paper we discuss the runtime support required for the parallelization of unstructured data-parallel applications on nonuniform and adaptive environments. We describe several optimization techniques for fast remapping of data and for reducing the amount of communications between machines when
In this paper we present a decentralized remapping method for data parallel applications on distributed memory multiprocessors. The method uses a generalized dimension exchange (GDE) algorithm periodically during the execution of an application to balance (remap) the system's workload. We implemente