A performance debugging tool for high performance Fortran programs
— By Suzuoka, Takashi; Subhlok, Jaspal; Gross, Thomas
- Publisher
- John Wiley and Sons
- Year
- 1997
- Language
- English
- File size
- 276 KB
- Volume
- 9
- Category
- Article
- ISSN
- 1040-3108
Synopsis
Parallel languages allow the programmer to express parallelism at a high level. The management of parallelism and the generation of interprocessor communication is left to the compiler and the runtime system. This approach to parallel programming is particularly attractive if a suitable widely accepted parallel language is available. High Performance Fortran (HPF) has emerged as the first popular machine independent parallel language, and remarkable progress has been made towards compiling HPF efficiently. However, the performance of HPF programs is often poor and unpredictable, and obtaining adequate performance is a major stumbling block that must be overcome if HPF is to gain widespread acceptance. The programmer is often in the dark about how to improve the performance of an HPF program since poor performance can be attributed to a variety of reasons, including poor choice of algorithm, limited use of parallelism, or an inefficient data mapping.
This paper presents a profiling tool that allows the programmer to identify the regions of the program that execute inefficiently, and to focus on the potential causes of poor performance. The central idea is to distinguish the code that is executing efficiently from the code that is executing poorly. Efficient code uses all processors of a parallel system to make progress, while inefficient code causes processors to wait, execute replicated code, idle, communicate, or perform compiler bookkeeping. We designate the latter code as non-scalable, since adding more processors generally does not lead to improved performance for such code. By analogy, the former code is called scalable. The tool presented here separates a program into scalable and non-scalable components and identifies the causes of non-scalability of different components. We show that compiler information is the key to dividing the execution times into logical categories that are meaningful to the programmer. We present the design and implementation of a profiler that is integrated with Fx, a compiler for a variant of HPF. The paper includes two examples that demonstrate how the data reported by the profiler are used to identify and resolve performance bugs in parallel programs.
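The core idea of the profiler can be sketched in a few lines: attribute each slice of execution time to a category, then split the total into a scalable part (all processors making progress) and a non-scalable part (waiting, replicated work, idling, communication, compiler bookkeeping). The snippet below is an illustrative sketch of that bookkeeping, not the Fx profiler itself; the category names and the `summarize` helper are assumptions made for the example.

```python
# Sketch (hypothetical, not the Fx profiler): split per-region profile time
# into the scalable / non-scalable components described in the synopsis.
from collections import defaultdict

# Time in these categories does not shrink as processors are added,
# so the paper classifies it as non-scalable.
NON_SCALABLE = {"wait", "replicated", "idle", "communication", "bookkeeping"}

def summarize(samples):
    """samples: iterable of (region, category, seconds) tuples.

    Returns per-region totals keyed by (region, 'scalable'|'non_scalable').
    """
    totals = defaultdict(float)
    for region, category, seconds in samples:
        bucket = "non_scalable" if category in NON_SCALABLE else "scalable"
        totals[(region, bucket)] += seconds
    return dict(totals)

samples = [
    ("loop_1", "compute", 4.0),        # parallel computation: scalable
    ("loop_1", "communication", 1.0),  # interprocessor messages: non-scalable
    ("setup", "replicated", 0.5),      # work repeated on every processor
]
print(summarize(samples))
```

Reports in this shape let the programmer see at a glance which regions are dominated by non-scalable time and therefore will not benefit from more processors.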
SIMILAR VOLUMES
An analytical overview of the state of the art, open problems, and future trends in heterogeneous parallel and distributed computing. This book provides an overview of the ongoing academic research, development, and uses of heterogeneous parallel and distributed computing in the context of scientifi
Our previous experience with an off-line Java optimizer has shown that some traditional algorithms used in compilers are too slow for a JIT compiler. In this paper we propose and implement faster ways of performing analyses needed for our optimizations. For instance, we have replaced reaching defini
We present jCITE, a performance tuning tool for scientific applications. By combining the static information produced by the compiler with the profile data from real program execution, jCITE can be used to quickly understand the performance bottlenecks. The compiler information allows great understa
We consider the implementation of a frontal code for the solution of large sparse unsymmetric linear systems on a high-performance computer where data must be in the cache before arithmetic operations can be performed on it. In particular, we show how we can modify the frontal solution algorithm to