A key measure of the performance of a distributed memory parallel program is the communication overhead. On most current parallel systems, sending data from a local to a remote processor still takes one or two orders of magnitude longer than the time to access data on a local processor. The behavior …
Communication Optimizations for Parallel C Programs
By Yingchun Zhu; Laurie J. Hendren
- Publisher
- Elsevier Science
- Year
- 1999
- Language
- English
- File size
- 661 KB
- Volume
- 58
- Category
- Article
- ISSN
- 0743-7315
Free to read; no payment or registration required. For personal study only.
Synopsis
This paper presents algorithms for reducing the communication overhead of parallel C programs that use dynamically allocated data structures. The framework consists of an analysis phase called possible-placement analysis and a transformation phase called communication selection. The fundamental idea of possible-placement analysis is to find all possible points for inserting remote memory operations: remote reads are propagated upwards, whereas remote writes are propagated downwards. Based on the results of the possible-placement analysis, the communication selection transformation selects the "best" place for inserting the communication and determines whether pipelining or blocking of communication should be performed. The framework has been implemented in the EARTH-McCAT optimizing C compiler, and experimental results are presented for five pointer-intensive benchmarks running on the EARTH-MANNA distributed-memory parallel processor. These experiments show that the communication optimization can provide performance improvements of up to 160% over the unoptimized benchmarks.
SIMILAR VOLUMES
Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We demonstrate a storage scheme for an array \(A\) affinely aligned to a template that is distributed across \(p\) processors with …
We present compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space. Any type of code motion on explicitly parallel programs requires a new kind of analysis to ensure that operations reordered on one processor cannot be observed by another.