Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines
By S.K.S. Gupta; S.D. Kaushik; C.-H. Huang; P. Sadayappan
- Publisher
- Elsevier Science
- Year
- 1996
- Language
- English
- File size
- 502 KB
- Volume
- 32
- Category
- Article
- ISSN
- 0743-7315
Synopsis
Array expressions express the data-parallelism in data-parallel languages such as High Performance Fortran (HPF). Array expressions involve array sections, which consist of array elements from a lower index to an upper index at a fixed stride. To generate high-performance target code, compilers for distributed-memory machines must produce efficient code for array statements involving distributed arrays.
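As a concrete illustration (a Python sketch under assumed names, not code from the article), an array section with lower index l, upper index u, and stride s denotes the element indices l, l+s, l+2s, ... up to u:

```python
def section_indices(l, u, s):
    """Indices selected by the array section l:u:s
    (inclusive bounds, fixed stride s > 0)."""
    return list(range(l, u + 1, s))

# The section 2:10:3 selects elements 2, 5, and 8.
print(section_indices(2, 10, 3))  # [2, 5, 8]
```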
Compilation of array statements with distributed arrays for parallel execution requires partitioning the computation among the processors. Many compilers use the owner-computes rule, which partitions the computation so that the processor owning an array element performs the computation that modifies it. The computation performed on a processor may involve array elements resident on other processors. In the generated code, all the nonlocal data needed by a processor is fetched into temporary arrays in the processor's local memory using interprocessor communication. To reduce the communication overhead, each processor first determines all the data it needs to send to and receive from other processors and then performs the required communication. This reduces the communication overhead by aggregating all the data movements from one processor to another into a single message. After communication, each processor performs its computation in its local address space.
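The owner-computes rule can be sketched as follows (a hypothetical Python fragment; the function names and the block distribution are assumptions for illustration). Under a block distribution of an n-element array over P processors, each processor owns one contiguous block and executes exactly the assignments to the elements it owns:

```python
import math

def block_owner(i, n, p_count):
    """Processor owning global index i under a block distribution
    (block size b = ceil(n / p_count))."""
    b = math.ceil(n / p_count)
    return i // b

def owned_indices(p, n, p_count):
    """Global indices owned by processor p; by the owner-computes
    rule, p performs the assignments A(i) for exactly these i."""
    b = math.ceil(n / p_count)
    return list(range(p * b, min((p + 1) * b, n)))

# n = 10, P = 3: block size 4, so processor 2 owns only indices 8, 9.
print(owned_indices(2, 10, 3))  # [8, 9]
```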
The overhead of performing the data movement lies in determining the following data index sets and processor sets for each processor p:
β’ Send processor set of processor p: set of processors to which p has to send data.
β’ Send data index set of processor p to processor q: indices of the array elements which are resident on p but are needed by q.
β’ Receive processor set of processor p: set of processors from which p has to receive data.
β’ Receive data index set of processor p from processor q: indices of the array elements which are needed by p but are resident on q.
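For a simple shifted assignment these sets can be computed by enumeration. The sketch below is a hypothetical Python fragment (the statement A(i) = B(i + shift) and the block distribution are assumptions for illustration); it builds each processor's receive data index sets, and the send sets are the same tables viewed from the other side:

```python
import math

def receive_sets(n, p_count, shift):
    """For A(i) = B(i + shift), 0 <= i < n - shift, with A and B
    block-distributed over p_count processors, return
    recv[p][q] = indices of B that processor p must receive from q.
    The send data index set of q to p is exactly recv[p][q]; the
    receive/send processor sets are the keys of these tables."""
    b = math.ceil(n / p_count)
    owner = lambda i: i // b
    recv = {}
    for i in range(n - shift):            # owner(i) computes A(i)
        p, q = owner(i), owner(i + shift)
        if p != q:                        # operand B(i + shift) is nonlocal
            recv.setdefault(p, {}).setdefault(q, []).append(i + shift)
    return recv

# n = 8, P = 2, shift = 2: processor 0 needs B(4), B(5) from processor 1.
print(receive_sets(8, 2, 2))  # {0: {1: [4, 5]}}
```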
Closed-form characterization of these sets would reduce the overhead of packing data into messages on the sending processor and unpacking data on the receiving processor. If the arrays have only block or cyclic distributions, then the data index sets and the processor sets can be characterized in closed form.
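For block and cyclic distributions the characterization is indeed simple. The Python sketch below assumes a shifted assignment A(i) = B(i + shift) and uses hypothetical names (the formulas follow from the block bounds, not from the article): the owner functions have closed forms, and under a block distribution the receive data index set of processor p is just the first shift elements of the next block:

```python
def cyclic_owner(i, p_count):
    """Cyclic distribution: element i resides on processor i mod P."""
    return i % p_count

def block_recv_closed_form(p, n, p_count, shift):
    """Closed form for the receive data index set of processor p for
    A(i) = B(i + shift) under a block distribution (assumes
    0 < shift <= block size): p receives the first `shift` elements
    of the block owned by processor p + 1."""
    b = -(-n // p_count)          # ceil(n / p_count): block size
    lo = (p + 1) * b              # first index of the next block
    return list(range(lo, min(lo + shift, n)))

# e.g. n = 8, P = 2, shift = 2: processor 0 receives B(4) and B(5);
# the last processor receives nothing.
print(block_recv_closed_form(0, 8, 2, 2))  # [4, 5]
print(block_recv_closed_form(1, 8, 2, 2))  # []
```

Because the sets are regular ranges, each processor can pack and unpack its messages with strided copies instead of element-by-element index tests.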
SIMILAR VOLUMES
Distributed-memory message-passing machines deliver scalable performance but are difficult to program. Shared-memory machines, on the other hand, are easier to program, but obtaining scalable performance with a large number of processors is difficult. Recently, scalable machines based on logically shar
program. The need for high-performance I/O is so significant that almost all present-generation parallel computers, such as the Paragon, SP-2, and nCUBE-2, provide some kind of hardware and software support for parallel I/O [dRC94]. Data parallel languages such as High Performance Fortran (HPF) [