This paper proposes a new method for the problem of minimizing the execution time of nested for-loops using a tiling transformation. In our approach, we are interested not only in tile size and shape according to the required communication to computation ratio, but also in overall completion time. W
Tiling on systems with communication/computation overlap
Scribed by Calland, Pierre-Yves; Dongarra, Jack; Robert, Yves
- Book ID: 101219470
- Publisher: John Wiley and Sons
- Year: 1999
- Language: English
- File size: 123 KB
- Volume: 11
- Category: Article
- ISSN: 1040-3108
Synopsis
In the framework of fully permutable loops, tiling is a compiler technique (also known as 'loop blocking') that has been extensively studied as a source-to-source program transformation. Little work has been devoted to the mapping and scheduling of the tiles onto physical parallel processors. We present several new results in the context of limited computational resources and assuming communication-computation overlap. In particular, under some reasonable assumptions, we derive the optimal mapping and scheduling of tiles to physical processors.
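As a rough illustration of what tiling means as a source-to-source transformation, the sketch below rewrites a two-deep loop nest into a tiled (blocked) form. It is not taken from the article: the recurrence, the function names `sweep_untiled` and `sweep_tiled`, and the parameters `tile_i` and `tile_j` are all assumptions chosen for the example. The nest is fully permutable because each point depends only on its upper and left neighbours, so rectangular tiles can be executed in row-major tile order without violating any dependence. The sketch does not attempt the mapping and scheduling of tiles onto processors that the synopsis describes.

```python
def sweep_untiled(n, m):
    """Reference loop nest: each point depends on its upper and left neighbour."""
    # Row 0 and column 0 hold the boundary value 1 (an illustrative choice).
    a = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def sweep_tiled(n, m, tile_i, tile_j):
    """Same computation, iterated tile by tile (rectangular tiles)."""
    a = [[1] * (m + 1) for _ in range(n + 1)]
    # Outer loops enumerate tile origins; inner loops scan one tile.
    for ii in range(1, n + 1, tile_i):
        for jj in range(1, m + 1, tile_j):
            # min(...) clips partial tiles at the edge of the iteration space.
            for i in range(ii, min(ii + tile_i, n + 1)):
                for j in range(jj, min(jj + tile_j, m + 1)):
                    a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


if __name__ == "__main__":
    # The tiled nest should produce exactly the same array as the original.
    print(sweep_untiled(7, 5) == sweep_tiled(7, 5, 3, 2))
```

In this sketch each tile is an atomic block of work; in a parallel setting, the data crossing a tile's lower and right edges is what would have to be communicated to neighbouring tiles.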
SIMILAR VOLUMES
However, the speedup achieved through parallelism is often lower in modern systems. It is no surprise, then, that developers of compilers for data-parallel languages have hypothesized the importance of optimizations that overlap communications with computations in order to reduce execution times and
We outline the essential features of a Linux PC cluster which is now being developed at National Taiwan University, and discuss how to optimize its hardware and software for lattice QCD with overlap Dirac quarks. At present, the cluster constitutes of 30 nodes, with each node consisting of one Penti