<p>Distributed-memory multiprocessing systems (DMS), such as Intel's hypercubes, the Paragon, Thinking Machine's CM-5, and the Meiko Computing Surface, have rapidly gained user acceptance and promise to deliver the computing power required to solve the grand challenge problems of Science and Enginee
Automatic Performance Prediction of Parallel Programs
β Scribed by Thomas Fahringer (auth.)
- Publisher
- Springer US
- Year
- 1996
- Tongue
- English
- Leaves
- 278
- Edition
- 1
- Category
- Library
No coin nor oath required. For personal study only.
β¦ Synopsis
Automatic Performance Prediction of Parallel Programs presents a unified approach to the problem of automatically estimating the performance of parallel computer programs. The author focuses primarily on distributed memory multiprocessor systems, although large portions of the analysis can be applied to shared memory architectures as well.
The author introduces a novel and very practical approach for predicting some of the most important performance parameters of parallel programs, including work distribution, number of transfers, amount of data transferred, network contention, transfer time, computation time and number of cache misses. This approach is based on advanced compiler analysis that carefully examines loop iteration spaces, procedure calls, array subscript expressions, communication patterns, data distributions and optimizing code transformations at the program level; and the most important machine specific parameters including cache characteristics, communication network indices, and benchmark data for computational operations at the machine level.
The material has been fully implemented as part of P3T, which is an integrated automatic performance estimator of the Vienna Fortran Compilation System (VFCS), a state-of-the-art parallelizing compiler for Fortran77, Vienna Fortran and a subset of High Performance Fortran (HPF) programs.
A large number of experiments using realistic HPF and Vienna Fortran code examples demonstrate highly accurate performance estimates, and the ability of the described performance prediction approach to successfully guide both programmer and compiler in parallelizing and optimizing parallel programs.
A graphical user interface is described and displayed that visualizes each program source line together with the corresponding parameter values. P3T uses color-coded performance visualization to immediately identify hot spots in the parallel program. Performance data can be filtered and displayed at various levels of detail. Colors displayed by the graphical user interface are visualized in greyscale.
Automatic Performance Prediction of Parallel Programs also includes coverage of fundamental problems of automatic parallelization for distributed memory multicomputers, a description of the basic parallelization strategy and a large variety of optimizing code transformations as included under VFCS.
β¦ Table of Contents
Front Matter....Pages i-xix
Introduction....Pages 1-13
Model....Pages 15-45
Sequential Program Parameters....Pages 47-71
Parallel Program Parameters....Pages 73-189
Experiments....Pages 191-214
Related Work....Pages 215-225
Conclusions....Pages 227-233
Back Matter....Pages 235-271
β¦ Subjects
Processor Architectures
π SIMILAR VOLUMES
<p><em>Performance Evaluation, Prediction and Visualization in Parallel</em><em>Systems</em> presents a comprehensive and systematic discussion of theoretics, methods, techniques and tools for performance evaluation, prediction and visualization of parallel systems. Chapter 1 gives a short overview
<p><span>This textbook focuses on practical parallel C++ programming at the graduate student level. In particular, it shows the APIs and related language features in the C++ 17 and C++ 20 standards, covering both single node and distributed systems. It shows that with the parallel features in the C+
<p><span>This textbook focuses on practical parallel C++ programming at the graduate student level. In particular, it shows the APIs and related language features in the C++ 17 and C++ 20 standards, covering both single node and distributed systems. It shows that with the parallel features in the C+