
Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines

✍ Scribed by S.K.S. Gupta; S.D. Kaushik; C.-H. Huang; P. Sadayappan


Publisher
Elsevier Science
Year
1996
Tongue
English
Weight
502 KB
Volume
32
Category
Article
ISSN
0743-7315


✦ Synopsis


Data-parallel languages such as High Performance Fortran (HPF) provide array statements as a natural way to express data-parallelism. Array expressions involve array sections, which consist of the array elements from a lower index to an upper index at a fixed stride. To generate high-performance target code, compilers for distributed-memory machines must produce efficient code for array statements involving distributed arrays.
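As an illustration of the array-section notation described above, here is a minimal Python sketch (not from the paper; Fortran-style inclusive bounds are assumed):

```python
def section_indices(l, u, s):
    """Indices selected by the section l:u:s, i.e. l, l+s, l+2s, ... <= u,
    as in a Fortran 90 array section A(l:u:s) with inclusive bounds."""
    return list(range(l, u + 1, s))

# A(2:10:3) selects elements 2, 5, 8.
print(section_indices(2, 10, 3))  # [2, 5, 8]
```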

Compiling array statements over distributed arrays for parallel execution requires partitioning the computation among the processors. Many compilers use the owner-computes rule, which partitions the computation so that the processor owning an array element performs the computation that modifies it. The computation performed on a processor may involve array elements resident on other processors. In the generated code, all nonlocal data needed by a processor are fetched into temporary arrays in the processor's local memory using interprocessor communication. To reduce the communication overhead, each processor first determines all the data it needs to send to and receive from other processors and then performs the required communication; this aggregates all the data movements from one processor to another into a single message. After communication, each processor performs the computation in its local address space.
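The owner-computes partitioning and per-source message aggregation can be sketched as follows. This is an illustrative simulation, not the paper's algorithm: the statement A(i) = B(i-1) is assumed, with both arrays block-distributed over P processors, and the names (N, P, owner, messages_for) are hypothetical.

```python
N, P = 16, 4          # array size and processor count (illustrative)
BLOCK = N // P        # block size of the assumed block distribution

def owner(i):
    """Processor that owns element i under the block distribution."""
    return i // BLOCK

def messages_for(p):
    """Aggregate, per source processor, the B-indices that processor p
    must receive to execute its share of A(i) = B(i-1), i = 1..N-1,
    under the owner-computes rule (p computes A(i) iff p owns A(i))."""
    recv = {}
    for i in range(1, N):
        if owner(i) == p and owner(i - 1) != p:   # B(i-1) is nonlocal
            recv.setdefault(owner(i - 1), []).append(i - 1)
    return recv

# Processor 1 owns A(4..7); it needs only B(3), owned by processor 0,
# so all its nonlocal data arrives in one aggregated message.
print(messages_for(1))  # {0: [3]}
```

Collecting the indices per source processor before communicating is what allows each processor pair to exchange a single message rather than one message per element.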

The overhead of performing the data movement consists of determining the following data index sets and processor sets for each processor p:

• Send processor set of processor p: set of processors to which p has to send data.

• Send data index set of processor p to processor q: indices of the array elements which are resident on p but are needed by q.

• Receive processor set of processor p: set of processors from which p has to receive data.

• Receive data index set of processor p from processor q: indices of the array elements which are needed by p but are resident on q.

A closed-form characterization of these sets would reduce the overhead of packing data into messages on the sending processor and unpacking data at the receiving processor. If the arrays have only block or cyclic distributions, then the data index sets and the processor sets can be characterized in closed form.
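A simplified instance of such a closed-form characterization, for a block distribution, can be sketched as follows. The function below enumerates the elements of a section l:u:s that fall in processor q's block of size B without scanning the whole section; it is an illustrative sketch under these assumptions, not the paper's exact formulation.

```python
import math

def local_section(l, u, s, q, B):
    """Elements of the section l:u:s owned by processor q under a
    block distribution with block size B (q owns [q*B, (q+1)*B - 1])."""
    lo, hi = max(l, q * B), min(u, (q + 1) * B - 1)
    if lo > hi:
        return []
    # Closed form: first section element >= lo, aligned to l modulo s.
    first = l + math.ceil((lo - l) / s) * s
    return list(range(first, hi + 1, s))

# Section 1:14:3 (elements 1, 4, 7, 10, 13) over 4 blocks of size 4:
print([local_section(1, 14, 3, q, 4) for q in range(4)])
# [[1], [4, 7], [10], [13]]
```

The send and receive data index sets are intersections of such per-processor sets for the left- and right-hand-side sections, so a closed form here avoids enumerating and testing every element when packing and unpacking messages.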


📜 SIMILAR VOLUMES


Efficient Index Set Generation for Compi
✍ S.D. Kaushik; C.-H. Huang; P. Sadayappan 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 316 KB

programming environment which allows annotation of single address space programs with distribution directives specifying the mapping of arrays to processors of a distributed-memory machine. The compiler is responsible for partitioning the arrays and generating SPMD message-passing node code for the a

Compiler Algorithms for Optimizing Local
✍ M. Kandemir; J. Ramanujam; A. Choudhary 📂 Article 📅 2000 🏛 Elsevier Science 🌐 English ⚖ 608 KB

Distributed-memory message-passing machines deliver scalable performance but are difficult to program. Shared-memory machines, on the other hand, are easier to program but obtaining scalable performance with a large number of processors is difficult. Recently, scalable machines based on logically shar

Compilation and Communication Strategies
✍ Rajesh Bordawekar; Alok Choudhary; J. Ramanujam 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 518 KB

program. The need for high performance I/O is so significant that almost all the present generation parallel computers such as the Paragon, SP-2, and nCUBE2 provide some kind of hardware and software support for parallel I/O [dRC94]. Data parallel languages such as High Performance Fortran (HPF) [