Evaluation of Collective I/O Implementations on Parallel Architectures
By Phillip M. Dickens; Rajeev Thakur
- Publisher
- Elsevier Science
- Year
- 2001
- Language
- English
- Weight
- 269 KB
- Volume
- 61
- Category
- Article
- ISSN
- 0743-7315
Synopsis
In this paper, we evaluate the impact on performance of various implementation techniques for collective I/O operations, and we do so across four important parallel architectures. We show that a naive implementation of collective I/O does not result in significant performance gains for any of the architectures, but that an optimized implementation does provide excellent performance across all of the platforms under study. Furthermore, we demonstrate that there exists a single implementation strategy that provides the best performance for all four computational platforms. Next, we evaluate implementation techniques for thread-based collective I/O operations. We show that the most obvious implementation technique, which is to spawn a thread to execute the whole collective I/O operation in the background, frequently provides the worst performance, often performing much worse than just executing the collective I/O routine entirely in the foreground. To improve performance, we explore an alternate approach where part of the collective I/O operation is performed in the background, and part is performed in the foreground. We demonstrate that this implementation technique can provide significant performance gains, offering up to a 50% improvement over implementations that do not attempt to overlap collective I/O and computation.
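The "optimized implementation" contrasted with the naive one here is typically a two-phase scheme: processes first exchange data so that each one holds a contiguous region of the file, and each then issues a single large write instead of many small noncontiguous ones. A minimal single-process Python sketch of that redistribution step under an assumed round-robin data distribution (the function name, data layout, and in-memory "file" are illustrative, not taken from the paper):

```python
# Two-phase collective I/O sketch: P processes each hold every P-th
# element of a file (an interleaved, round-robin distribution).
# Phase 1 redistributes the data so each process owns one contiguous
# block; phase 2 then needs only one large write per process.

def two_phase_write(per_process_data, out):
    """per_process_data[p][i] holds file element p + i*P (interleaved)."""
    P = len(per_process_data)              # number of simulated processes
    n = len(per_process_data[0])           # elements held per process
    block = (P * n) // P                   # contiguous block per aggregator

    # Phase 1: "communication" phase -- collect the contiguous block
    # that aggregator p is responsible for. Here this is plain index
    # arithmetic; on a real machine it is an all-to-all exchange.
    blocks = []
    for p in range(P):
        blk = []
        for off in range(p * block, (p + 1) * block):
            src, idx = off % P, off // P   # owner and local index of offset
            blk.append(per_process_data[src][idx])
        blocks.append(blk)

    # Phase 2: I/O phase -- each aggregator issues one large,
    # contiguous write instead of n small strided writes.
    for p in range(P):
        out[p * block:(p + 1) * block] = blocks[p]
    return out

# Three "processes", four elements each, interleaved by rank.
data = [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
result = two_phase_write(data, [None] * 12)
```

After the exchange, `result` is the file in order (0 through 11), written in three contiguous blocks rather than twelve scattered elements; the thread-based variant the abstract describes would run phase 1 in a background thread while the application computes, keeping only phase 2 in the foreground.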
SIMILAR VOLUMES
We present the design, implementation, and evaluation of a runtime system based on collective I/O techniques for irregular applications. The design is motivated by the requirements of a large number of science and engineering applications including teraflops applications, where the data must be reor…
Two different numerical solutions of the two-component kinetic collection equation were implemented on parallel computers. The parallelization approach included domain decomposition and MPI commands for communications. Four different parallel codes were tested. A dynamic decomposition based on an oc…