๐”– Bobbio Scriptorium
โœฆ   LIBER   โœฆ

Models and Scheduling Algorithms for Mixed Data and Task Parallel Programs

โœ Scribed by Soumen Chakrabarti; James Demmel; Katherine Yelick


Publisher
Elsevier Science
Year
1997
Tongue
English
Weight
400 KB
Volume
47
Category
Article
ISSN
0743-7315

No coin nor oath required. For personal study only.

โœฆ Synopsis


An increasing number of scientific programs exhibit two forms of parallelism, often in a nested fashion. At the outer level, the application comprises coarse-grained task parallelism, with dependencies between tasks reflected by an acyclic graph. At the inner level, each node of the graph is a data-parallel operation on arrays. Designers of languages, compilers, and runtime systems are building mechanisms to support such applications by providing processor groups and array remapping capabilities. In this paper we explore how to supplement these mechanisms with policy. What properties of an application, its data size, and the parallel machine determine the maximum potential gains from using both kinds of parallelism? It turns out that large gains can be expected only for specific task graph structures. For such applications, what are practical and effective ways to allocate processors to the nodes of the task graph? In principle one could solve the NP-complete problem of finding the best possible allocation of arbitrary processor subsets to nodes in the task graph. Instead of this, our analysis and simulations show that a simple switched scheduling paradigm, which alternates between pure task and pure data parallelism, provides nearly optimal performance for the task graphs considered here. Furthermore, our scheme is much simpler to implement, has less overhead than the optimal allocation, and would be attractive even if the optimal allocation was free to compute. To evaluate switching in real applications, we implemented a switching task scheduler in the parallel numerical library ScaLAPACK and used it in a nonsymmetric eigenvalue program. Even for fairly large input sizes, the efficiency improves by factors of 1.5 on the Intel Paragon and 2.5 on the IBM SP-2. The remapping and scheduling overhead is negligible, between 0.5 and 5%.


๐Ÿ“œ SIMILAR VOLUMES


Optimal Use of Mixed Task and Data Paral
โœ Jaspal Subhlok; Gary Vondran ๐Ÿ“‚ Article ๐Ÿ“… 2000 ๐Ÿ› Elsevier Science ๐ŸŒ English โš– 360 KB

This paper addresses optimal mapping of parallel programs composed of a chain of data parallel tasks onto the processors of a parallel system. The input to the programs is a stream of data sets, each of which is processed in order by the chain of tasks. This computation structure, also referred to a

A Border-based Coordination Language for
โœ Manuel Dฤฑฬaz; Bartolomรฉ Rubio; Enrique Soler; Josรฉ M. Troya ๐Ÿ“‚ Article ๐Ÿ“… 2002 ๐Ÿ› Elsevier Science ๐ŸŒ English โš– 448 KB

This paper presents BCL, a border-based coordination language focused on the solution of numerical applications. Our approach provides a simple parallelism model. Coordination and computational aspects are clearly separated. The former are established using the coordination language and the latter a

Parallel proximal-point algorithms for m
โœ Alduncin, Gonzalo ;Vera-Guzmรกn, Norberto ๐Ÿ“‚ Article ๐Ÿ“… 2003 ๐Ÿ› John Wiley and Sons ๐ŸŒ English โš– 421 KB ๐Ÿ‘ 1 views

## Abstract Parallel proximalโ€point algorithms for mixed finite element models of flow in the subsurface are presented. The applied methodology corresponds to operator splitting and nonโ€overlapping domain decomposition methods, combined with resolvent or proximation characterizations and proximalโ€p