Run-Time Techniques for Exploiting Irregular Task Parallelism on Distributed Memory Architectures
✍ Scribed by Cong Fu; Tao Yang
- Publisher
- Elsevier Science
- Year
- 1997
- Tongue
- English
- Weight
- 372 KB
- Volume
- 42
- Category
- Article
- ISSN
- 0743-7315
No coin nor oath required. For personal study only.
✦ Synopsis
Automatic scheduling for directed acyclic graphs (DAG) and its applications for coarse-grained irregular problems such as large n-body simulation have been studied in the literature. However, solving irregular problems with mixed granularities such as sparse matrix factorization is challenging since it requires efficient run-time support to execute a DAG schedule. In this paper, we investigate run-time optimization techniques for executing general asynchronous DAG schedules on distributed memory machines and discuss an approach for exploiting parallelism from commuting operations in the DAG model. Our solution tightly integrates the run-time scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying. We present a consistency model incorporating the above optimizations, and take advantage of task dependence properties to ensure the correctness of execution. We demonstrate the applications of this scheme in sparse matrix factorizations and triangular equation solving for which actual speedups are difficult to obtain. We provide a detailed experimental study on Meiko CS-2 to show that the automatically scheduled code has achieved good performance for these difficult problems, and the run-time overhead is small compared to total execution times.