𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Communication Optimizations for Parallel C Programs

โœ Scribed by Yingchun Zhu; Laurie J. Hendren


Publisher
Elsevier Science
Year
1999
Tongue
English
Weight
661 KB
Volume
58
Category
Article
ISSN
0743-7315


✦ Synopsis


This paper presents algorithms for reducing the communication overhead of parallel C programs that use dynamically allocated data structures. The framework consists of an analysis phase called possible-placement analysis and a transformation phase called communication selection. The fundamental idea of possible-placement analysis is to find all possible points for insertion of remote memory operations. Remote reads are propagated upwards, whereas remote writes are propagated downwards. Based on the results of the possible-placement analysis, the communication selection transformation selects the "best" place for inserting the communication and determines whether pipelining or blocking of communication should be performed. The framework has been implemented in the EARTH-McCAT optimizing C compiler, and experimental results are presented for five pointer-intensive benchmarks running on the EARTH-MANNA distributed-memory parallel processor. These experiments show that the communication optimization can provide performance improvements of up to 16% over the unoptimized benchmarks.


📜 SIMILAR VOLUMES


Compile-Time Estimation of Communication
โœ Thomas Fahringer ๐Ÿ“‚ Article ๐Ÿ“… 1996 ๐Ÿ› Elsevier Science ๐ŸŒ English โš– 416 KB

A key measure of the performance of a distributed memory parallel program is the communication overhead. On most current parallel systems, sending data from a local to a remote processor still takes one or two orders of magnitude longer than the time to access data on a local processor. …

Generating Local Addresses and Communication Sets
✍ S. Chatterjee; J.R. Gilbert; F.J.E. Long; R. Schreiber; S.H. Teng 📂 Article 📅 1995 🏛 Elsevier Science 🌐 English ⚖ 988 KB

Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We demonstrate a storage scheme for an array \(A\) affinely aligned to a template that is distributed across \(p\) processors …

Analyses and Optimizations for Shared Address Space Programs
✍ Arvind Krishnamurthy; Katherine Yelick 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 423 KB

We present compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space. Any type of code motion on explicitly parallel programs requires a new kind of analysis to ensure that operations reordered on one processor cannot be observed by another.