Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed-Memory Machines
By M. Kandemir; J. Ramanujam; A. Choudhary
- Publisher
- Elsevier Science
- Year
- 2000
- Language
- English
- Size
- 608 KB
- Volume
- 60
- Category
- Article
- ISSN
- 0743-7315
Synopsis
Distributed-memory message-passing machines deliver scalable performance but are difficult to program. Shared-memory machines, on the other hand, are easier to program, but obtaining scalable performance with a large number of processors is difficult. Recently, scalable machines based on logically shared, physically distributed memory have been designed and implemented. While some performance issues, such as parallelism and locality, are common to different parallel architectures, issues such as data distribution are unique to specific architectures. One of the most important challenges compiler writers face is the design of compilation techniques that work well on a variety of architectures. In this paper, we propose an algorithm that can be employed by optimizing compilers for different types of parallel architectures. Our optimization algorithm does the following: (1) transforms loop nests such that, where possible, the iterations of the outermost loops can be run in parallel across processors; (2) optimizes memory locality by carefully distributing each array across processors; (3) optimizes interprocessor communication using message vectorization whenever possible; and (4) optimizes cache locality by assigning an appropriate storage layout to each array and by transforming the iteration space. Depending on the …
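As a rough illustration of what steps (1), (2), and (4) mean in practice, here is a minimal C sketch — not the paper's actual algorithm — showing how loop interchange moves a dependence-free loop outermost so it can run in parallel while matching C's row-major layout for cache locality. The array names, sizes, and the OpenMP pragma are illustrative assumptions.

```c
#include <stdio.h>

#define M 4
#define N 8

double a[M][N], b[M][N];

int main(void) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) { a[i][j] = 0.0; b[i][j] = i + j; }

    /* Original nest: the outer j loop carries a dependence
       (a[i][j] reads a[i][j-1]), so its iterations cannot run in parallel:
           for (j = 1; j < N; j++)
               for (i = 0; i < M; i++)
                   a[i][j] = a[i][j-1] + b[i][j];
       Interchanging the loops puts the dependence-free i loop outermost
       (step 1). Each processor then owns whole rows of a (step 2), and a
       row is contiguous under C's row-major layout, so each processor
       streams through its own cache lines (step 4). */
    #pragma omp parallel for            /* compile with -fopenmp */
    for (int i = 0; i < M; i++)
        for (int j = 1; j < N; j++)
            a[i][j] = a[i][j - 1] + b[i][j];

    printf("a[%d][%d] = %.1f\n", M - 1, N - 1, a[M - 1][N - 1]); /* 49.0 */
    return 0;
}
```

On a distributed-memory target, step (3) would apply wherever a processor needs remote data: rather than sending one element per inner iteration, the compiler hoists the communication out of the loop and sends the whole needed block as a single vectorized message.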
SIMILAR VOLUMES
Many parallel algorithms and library routines for computer vision and image processing (CVIP) tasks on distributed-memory multiprocessors are available. The typical image distribution may use column-, row-, or block-based mapping. Integrating a set of library routines for a CVIP application requires …
The problem of designing efficient parallel algorithms for summing and prefix summing for certain classes of the LogP model is studied. We present optimal algorithms for summing and show that any optimal summing algorithm must have a certain inherent structure. Moreover, we present optimal or near-optimal … (a generic prefix-scan sketch appears after this list).
… program. The need for high-performance I/O is so significant that almost all present-generation parallel computers, such as the Paragon, SP-2, and nCUBE-2, provide some kind of hardware and software support for parallel I/O [dRC94]. Data-parallel languages such as High Performance Fortran (HPF) …
A parallel algorithm for four-index transformation and MP2 energy evaluation for distributed-memory parallel MIMD machines is presented. The underlying serial algorithm for the present parallel effort is the four-index transform. The scheme works through parallelization over AO integrals and …
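For the prefix-summing entry above, a generic log-depth inclusive scan (Hillis–Steele) conveys the basic idea; this plain C sketch shows the classic technique, not the LogP-optimal algorithms that paper derives.

```c
#include <stdio.h>
#include <string.h>

#define N 8

int main(void) {
    int x[N] = {3, 1, 4, 1, 5, 9, 2, 6};
    int tmp[N];

    /* Hillis-Steele inclusive scan: log2(N) rounds. In the round with
       offset d, every element adds the value d positions to its left;
       on a parallel machine each round is one fully parallel step. */
    for (int d = 1; d < N; d *= 2) {
        memcpy(tmp, x, sizeof x);
        for (int i = d; i < N; i++)
            x[i] = tmp[i] + tmp[i - d];
    }

    for (int i = 0; i < N; i++)
        printf("%d ", x[i]);            /* 3 4 8 9 14 23 25 31 */
    printf("\n");
    return 0;
}
```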