Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines
By S.K.S. Gupta; S.D. Kaushik; C.-H. Huang; P. Sadayappan
- Publisher
- Elsevier Science
- Year
- 1996
- Language
- English
- File size
- 502 KB
- Volume
- 32
- Category
- Article
- ISSN
- 0743-7315
Synopsis
Array expressions express the data-parallelism in data-parallel languages such as High Performance Fortran (HPF). Array expressions involve array sections, which consist of array elements from a lower index to an upper index at a fixed stride. To generate high-performance target code, compilers for distributed-memory machines must produce efficient code for array statements involving distributed arrays.
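As a concrete illustration (a Python sketch under assumed names, not code from the article), an array section with lower index l, upper index u, and stride s denotes the element indices l, l+s, l+2s, ... up to u:

```python
def section_indices(l, u, s):
    """Indices selected by the array section l:u:s
    (inclusive bounds, fixed stride s > 0)."""
    return list(range(l, u + 1, s))

# The section 2:10:3 selects elements 2, 5, and 8.
print(section_indices(2, 10, 3))  # [2, 5, 8]
```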
Compilation of array statements with distributed arrays for parallel execution requires partitioning the computation among the processors. Many compilers use the owner-computes rule, which partitions the computation so that the processor owning an array element performs the computation that modifies it. The computation performed on a processor may involve array elements resident on other processors. In the generated code, all the nonlocal data needed by a processor is fetched into temporary arrays in the processor's local memory using interprocessor communication. To reduce the communication overhead, each processor first determines all the data it needs to send to and receive from other processors and then performs the required communication. This reduces the communication overhead by aggregating all the data movements from one processor to another into a single message. After communication, each processor performs its computation in its local address space.
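The owner-computes rule can be sketched as follows (a hypothetical Python fragment; the function names and the block distribution are assumptions for illustration). Under a block distribution of an n-element array over P processors, each processor owns one contiguous block and executes exactly the assignments to the elements it owns:

```python
import math

def block_owner(i, n, p_count):
    """Processor owning global index i under a block distribution
    (block size b = ceil(n / p_count))."""
    b = math.ceil(n / p_count)
    return i // b

def owned_indices(p, n, p_count):
    """Global indices owned by processor p; by the owner-computes
    rule, p performs the assignments A(i) for exactly these i."""
    b = math.ceil(n / p_count)
    return list(range(p * b, min((p + 1) * b, n)))

# n = 10, P = 3: block size 4, so processor 2 owns only indices 8, 9.
print(owned_indices(2, 10, 3))  # [8, 9]
```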
The overhead of performing the data movement lies in determining the following data index sets and processor sets for each processor p:
β’ Send processor set of processor p: set of processors to which p has to send data.
β’ Send data index set of processor p to processor q: indices of the array elements which are resident on p but are needed by q.
β’ Receive processor set of processor p: set of processors from which p has to receive data.
β’ Receive data index set of processor p from processor q: indices of the array elements which are needed by p but are resident on q.
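For a simple shifted assignment these sets can be computed by enumeration. The sketch below is a hypothetical Python fragment (the statement A(i) = B(i + shift) and the block distribution are assumptions for illustration); it builds each processor's receive data index sets, and the send sets are the same tables viewed from the other side:

```python
import math

def receive_sets(n, p_count, shift):
    """For A(i) = B(i + shift), 0 <= i < n - shift, with A and B
    block-distributed over p_count processors, return
    recv[p][q] = indices of B that processor p must receive from q.
    The send data index set of q to p is exactly recv[p][q]; the
    receive/send processor sets are the keys of these tables."""
    b = math.ceil(n / p_count)
    owner = lambda i: i // b
    recv = {}
    for i in range(n - shift):            # owner(i) computes A(i)
        p, q = owner(i), owner(i + shift)
        if p != q:                        # operand B(i + shift) is nonlocal
            recv.setdefault(p, {}).setdefault(q, []).append(i + shift)
    return recv

# n = 8, P = 2, shift = 2: processor 0 needs B(4), B(5) from processor 1.
print(receive_sets(8, 2, 2))  # {0: {1: [4, 5]}}
```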
Closed-form characterization of these sets would reduce the overhead of packing data into messages on the sending processor and unpacking data on the receiving processor. If the arrays have only block or cyclic distributions, then the data index sets and the processor sets can be characterized in closed form.
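For block and cyclic distributions the characterization is indeed simple. The Python sketch below assumes a shifted assignment A(i) = B(i + shift) and uses hypothetical names (the formulas follow from the block bounds, not from the article): the owner functions have closed forms, and under a block distribution the receive data index set of processor p is just the first shift elements of the next block:

```python
def cyclic_owner(i, p_count):
    """Cyclic distribution: element i resides on processor i mod P."""
    return i % p_count

def block_recv_closed_form(p, n, p_count, shift):
    """Closed form for the receive data index set of processor p for
    A(i) = B(i + shift) under a block distribution (assumes
    0 < shift <= block size): p receives the first `shift` elements
    of the block owned by processor p + 1."""
    b = -(-n // p_count)          # ceil(n / p_count): block size
    lo = (p + 1) * b              # first index of the next block
    return list(range(lo, min(lo + shift, n)))

# e.g. n = 8, P = 2, shift = 2: processor 0 receives B(4) and B(5);
# the last processor receives nothing.
print(block_recv_closed_form(0, 8, 2, 2))  # [4, 5]
print(block_recv_closed_form(1, 8, 2, 2))  # []
```

Because the sets are regular ranges, each processor can pack and unpack its messages with strided copies instead of element-by-element index tests.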
SIMILAR VOLUMES
Distributed-memory message-passing machines deliver scalable performance but are difficult to program. Shared-memory machines, on the other hand, are easier to program, but obtaining scalable performance with a large number of processors is difficult. Recently, scalable machines based on logically shar
program. The need for high-performance I/O is so significant that almost all present-generation parallel computers, such as the Paragon, SP-2, and nCUBE-2, provide some kind of hardware and software support for parallel I/O [dRC94]. Data parallel languages such as High Performance Fortran (HPF) [