A performance debugging tool for high performance Fortran programs
— By Suzuoka, Takashi; Subhlok, Jaspal; Gross, Thomas
- Publisher
- John Wiley and Sons
- Year
- 1997
- Language
- English
- File size
- 276 KB
- Volume
- 9
- Category
- Article
- ISSN
- 1040-3108
Synopsis
Parallel languages allow the programmer to express parallelism at a high level. The management of parallelism and the generation of interprocessor communication is left to the compiler and the runtime system. This approach to parallel programming is particularly attractive if a suitable widely accepted parallel language is available. High Performance Fortran (HPF) has emerged as the first popular machine independent parallel language, and remarkable progress has been made towards compiling HPF efficiently. However, the performance of HPF programs is often poor and unpredictable, and obtaining adequate performance is a major stumbling block that must be overcome if HPF is to gain widespread acceptance. The programmer is often in the dark about how to improve the performance of an HPF program since poor performance can be attributed to a variety of reasons, including poor choice of algorithm, limited use of parallelism, or an inefficient data mapping.
This paper presents a profiling tool that allows the programmer to identify the regions of the program that execute inefficiently, and to focus on the potential causes of poor performance. The central idea is to distinguish the code that is executing efficiently from the code that is executing poorly. Efficient code uses all processors of a parallel system to make progress, while inefficient code causes processors to wait, execute replicated code, idle, communicate, or perform compiler bookkeeping. We designate the latter code as non-scalable, since adding more processors generally does not lead to improved performance for such code. By analogy, the former code is called scalable. The tool presented here separates a program into scalable and non-scalable components and identifies the causes of non-scalability of different components. We show that compiler information is the key to dividing the execution times into logical categories that are meaningful to the programmer. We present the design and implementation of a profiler that is integrated with Fx, a compiler for a variant of HPF. The paper includes two examples that demonstrate how the data reported by the profiler are used to identify and resolve performance bugs in parallel programs.
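The core idea of the profiler can be sketched in a few lines: attribute each slice of execution time to a category, then split the total into a scalable part (all processors making progress) and a non-scalable part (waiting, replicated work, idling, communication, compiler bookkeeping). The snippet below is an illustrative sketch of that bookkeeping, not the Fx profiler itself; the category names and the `summarize` helper are assumptions made for the example.

```python
# Sketch (hypothetical, not the Fx profiler): split per-region profile time
# into the scalable / non-scalable components described in the synopsis.
from collections import defaultdict

# Time in these categories does not shrink as processors are added,
# so the paper classifies it as non-scalable.
NON_SCALABLE = {"wait", "replicated", "idle", "communication", "bookkeeping"}

def summarize(samples):
    """samples: iterable of (region, category, seconds) tuples.

    Returns per-region totals keyed by (region, 'scalable'|'non_scalable').
    """
    totals = defaultdict(float)
    for region, category, seconds in samples:
        bucket = "non_scalable" if category in NON_SCALABLE else "scalable"
        totals[(region, bucket)] += seconds
    return dict(totals)

samples = [
    ("loop_1", "compute", 4.0),        # parallel computation: scalable
    ("loop_1", "communication", 1.0),  # interprocessor messages: non-scalable
    ("setup", "replicated", 0.5),      # work repeated on every processor
]
print(summarize(samples))
```

Reports in this shape let the programmer see at a glance which regions are dominated by non-scalable time and therefore will not benefit from more processors.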
SIMILAR VOLUMES
An analytical overview of the state of the art, open problems, and future trends in heterogeneous parallel and distributed computing. This book provides an overview of the ongoing academic research, development, and uses of heterogeneous parallel and distributed computing in the context of scientifi
Our previous experience with an off-line Java optimizer has shown that some traditional algorithms used in compilers are too slow for a JIT compiler. In this paper we propose and implement faster ways of performing analyses needed for our optimizations. For instance, we have replaced reaching defini
We present jCITE, a performance tuning tool for scientific applications. By combining the static information produced by the compiler with the profile data from real program execution, jCITE can be used to quickly understand the performance bottlenecks. The compiler information allows great understa
We consider the implementation of a frontal code for the solution of large sparse unsymmetric linear systems on a high-performance computer where data must be in the cache before arithmetic operations can be performed on it. In particular, we show how we can modify the frontal solution algorithm to