𝔖 Bobbio Scriptorium
✦   LIBER   ✦

High-performance scientific data management system

✍ Scribed by Jaechun No; Rajeev Thakur; Alok Choudhary


Publisher: Elsevier Science
Year: 2003
Tongue: English
Weight: 299 KB
Volume: 63
Category: Article
ISSN: 0743-7315


✦ Synopsis


Many scientific applications have large I/O requirements, in terms of both the size of data and the number of files or data sets. Managing, storing, efficiently accessing, and analyzing this data is an extremely challenging task. Traditionally, two different solutions have been used for this task: file I/O and databases. File I/O can provide high performance but is tedious to use with large numbers of files and large, complex data sets. Databases can be convenient, flexible, and powerful but do not perform or scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that combines the good features of both file I/O and databases. SDM provides a high-level application programming interface to the user and, internally, uses a parallel file system to store the actual data (using various I/O optimizations available in MPI-IO) and a database to store application-related metadata. To support I/O in irregular applications, SDM makes extensive use of MPI-IO's noncontiguous collective I/O functions. Moreover, SDM uses the concept of a history file to reduce the cost of index distribution by using the metadata stored in the database. We describe the design and implementation of SDM and present performance results with two regular applications, ASTRO3D and an Euler solver, and with two irregular applications, a CFD code called FUN3D and a Rayleigh-Taylor instability code.
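The synopsis describes a split between bulk data kept in files and metadata kept in a database, plus a history mechanism that avoids recomputing index distributions. The following is a minimal, hypothetical Python sketch of that general idea, not SDM's actual API. It assumes `sqlite3` as a stand-in for SDM's metadata database, a single local binary file in place of a parallel file system accessed through MPI-IO, and a trivial block partition in place of a real mesh partitioner. All names (`ToyDataManager`, `write`, `read`, `distribution`, the table schemas) are invented for illustration.

```python
import json
import os
import sqlite3
import struct
import tempfile


class ToyDataManager:
    """Hypothetical sketch: bulk data in a flat file, metadata in a database.

    sqlite3 stands in for SDM's metadata database, and one local binary
    file stands in for a parallel file system accessed via MPI-IO.
    """

    def __init__(self, directory):
        self.data_path = os.path.join(directory, "data.bin")
        self.db = sqlite3.connect(os.path.join(directory, "meta.db"))
        # Where each named data set lives inside the data file.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS datasets "
            "(name TEXT PRIMARY KEY, n_values INTEGER, byte_offset INTEGER)"
        )
        # Assumed, simplified "history": index distributions already
        # computed, keyed by mesh name and process count.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS history "
            "(mesh TEXT, nprocs INTEGER, parts TEXT, PRIMARY KEY (mesh, nprocs))"
        )
        self.db.commit()

    def write(self, name, values):
        """Append float64 values to the data file and record their location."""
        exists = os.path.exists(self.data_path)
        offset = os.path.getsize(self.data_path) if exists else 0
        with open(self.data_path, "ab") as f:
            f.write(struct.pack(f"<{len(values)}d", *values))
        # Rewriting a name re-points it at the new copy; old bytes stay unused.
        self.db.execute(
            "INSERT OR REPLACE INTO datasets VALUES (?, ?, ?)",
            (name, len(values), offset),
        )
        self.db.commit()

    def read(self, name):
        """Look up a data set's location in the database, then read the file."""
        row = self.db.execute(
            "SELECT n_values, byte_offset FROM datasets WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        count, offset = row
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            return list(struct.unpack(f"<{count}d", f.read(8 * count)))

    def distribution(self, mesh, nnodes, nprocs):
        """Return (owning process of each node, whether it came from history)."""
        row = self.db.execute(
            "SELECT parts FROM history WHERE mesh = ? AND nprocs = ?",
            (mesh, nprocs),
        ).fetchone()
        if row is not None:
            return json.loads(row[0]), True
        # Placeholder partitioner: contiguous blocks of node indices.
        parts = [i * nprocs // nnodes for i in range(nnodes)]
        self.db.execute(
            "INSERT INTO history VALUES (?, ?, ?)",
            (mesh, nprocs, json.dumps(parts)),
        )
        self.db.commit()
        return parts, False

    def close(self):
        self.db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        dm = ToyDataManager(d)
        dm.write("pressure", [1.0, 2.0, 3.0])
        dm.write("density", [0.5, 0.25])
        print(dm.read("density"))                  # [0.5, 0.25]
        print(dm.distribution("mesh-A", 6, 2))     # ([0, 0, 0, 1, 1, 1], False)
        print(dm.distribution("mesh-A", 6, 2)[1])  # True
        dm.close()
```

Keeping metadata in a database lets the sketch locate a data set by name with a query instead of encoding that information in file names. On a second call with the same mesh and process count, `distribution` returns the stored result rather than recomputing it, which is the kind of saving the history concept appears to target.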


📜 SIMILAR VOLUMES


Managing data and performance
✍ David M. Cannon; Joseph H. Godwin; Stephen R. Goldberg 📂 Article 📅 2009 🏛 John Wiley and Sons 🌐 English ⚖ 58 KB

The books selected for review address how to effectively manage data and performance. In Data Driven, Redman provides advice on managing and unlocking the value of an organization's data. Cokins, in Performance Management, provides an overview of multiple performance measurement methodologies and li

Managing Multiple Communication Methods
✍ Ian Foster; Jonathan Geisler; Carl Kesselman; Steven Tuecke 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 352 KB

Modern networked computing environments and applications often require, or can benefit from, the use of multiple communication substrates, transport mechanisms, and protocols, chosen according to where communication is directed, what is communicated, or when communication is performed. We propose tech

Managing scientific data for long-term a
✍ Melissa H. Cragin; W. John MacMullen; Jillian Wallis; Ann Zimmerman; Anna Gold 📂 Article 📅 2007 🏛 Wiley (John Wiley & Sons) 🌐 English ⚖ 33 KB

Preservation of data for long‐term use will require data management strategies that include curation and preservation planning and implementation. While data management and curatorial activities have been an integral part of some scientific domains for years (see for example, high energ