✦ LIBER ✦

Models and Scheduling Algorithms for Mixed Data and Task Parallel Programs

✍ Scribed by Soumen Chakrabarti; James Demmel; Katherine Yelick

Publisher: Elsevier Science
Year: 1997
Tongue: English
Weight: 400 KB
Volume: 47
Category: Article
ISSN: 0743-7315
DOI: 10.1006/jpdc.1997.1413

No coin nor oath required. For personal study only.

✦ Synopsis

An increasing number of scientific programs exhibit two forms of parallelism, often in a nested fashion. At the outer level, the application comprises coarse-grained task parallelism, with dependencies between tasks reflected by an acyclic graph. At the inner level, each node of the graph is a data-parallel operation on arrays. Designers of languages, compilers, and runtime systems are building mechanisms to support such applications by providing processor groups and array remapping capabilities. In this paper we explore how to supplement these mechanisms with policy. What properties of an application, its data size, and the parallel machine determine the maximum potential gains from using both kinds of parallelism? It turns out that large gains can be expected only for specific task graph structures. For such applications, what are practical and effective ways to allocate processors to the nodes of the task graph? In principle one could solve the NP-complete problem of finding the best possible allocation of arbitrary processor subsets to nodes in the task graph. Instead of this, our analysis and simulations show that a simple switched scheduling paradigm, which alternates between pure task and pure data parallelism, provides nearly optimal performance for the task graphs considered here. Furthermore, our scheme is much simpler to implement, has less overhead than the optimal allocation, and would be attractive even if the optimal allocation was free to compute. To evaluate switching in real applications, we implemented a switching task scheduler in the parallel numerical library ScaLAPACK and used it in a nonsymmetric eigenvalue program. Even for fairly large input sizes, the efficiency improves by factors of 1.5 on the Intel Paragon and 2.5 on the IBM SP-2. The remapping and scheduling overhead is negligible, between 0.5 and 5%.

📜 SIMILAR VOLUMES

Optimal Use of Mixed Task and Data Paral

Optimal Use of Mixed Task and Data Parallelism for Pipelined Computations

✍ Jaspal Subhlok; Gary Vondran 📂 Article 📅 2000 🏛 Elsevier Science 🌐 English ⚖ 360 KB

This paper addresses optimal mapping of parallel programs composed of a chain of data parallel tasks onto the processors of a parallel system. The input to the programs is a stream of data sets, each of which is processed in order by the chain of tasks. This computation structure, also referred to a

Special Issue on Data Parallel Algorithm

Special Issue on Data Parallel Algorithms and Programming: Guest Editor′s Introduction

✍ J. Jaja; P.Y. Wang 📂 Article 📅 1994 🏛 Elsevier Science 🌐 English ⚖ 323 KB

Optimizing layer-based scheduling algori

Optimizing layer-based scheduling algorithms for parallel tasks with dependencies

✍ R. Kunis; G. Rünger 📂 Article 📅 2010 🏛 John Wiley and Sons 🌐 English ⚖ 485 KB

Compiling data-parallel programs for clu

Compiling data-parallel programs for clusters of SMPs

✍ Siegfried Benkner; Thomas Brandes 📂 Article 📅 2004 🏛 John Wiley and Sons 🌐 English ⚖ 233 KB

A Border-based Coordination Language for

A Border-based Coordination Language for Integrating Task and Data Parallelism

✍ Manuel Dı́az; Bartolomé Rubio; Enrique Soler; José M. Troya 📂 Article 📅 2002 🏛 Elsevier Science 🌐 English ⚖ 448 KB

This paper presents BCL, a border-based coordination language focused on the solution of numerical applications. Our approach provides a simple parallelism model. Coordination and computational aspects are clearly separated. The former are established using the coordination language and the latter a

Parallel proximal-point algorithms for m

Parallel proximal-point algorithms for mixed finite element models of flow in the subsurface

✍ Alduncin, Gonzalo ;Vera-Guzmán, Norberto 📂 Article 📅 2003 🏛 John Wiley and Sons 🌐 English ⚖ 421 KB 👁 1 views

## Abstract Parallel proximal‐point algorithms for mixed finite element models of flow in the subsurface are presented. The applied methodology corresponds to operator splitting and non‐overlapping domain decomposition methods, combined with resolvent or proximation characterizations and proximal‐p