The books selected for review address how to effectively manage data and performance. In Data Driven, Redman provides advice on managing and unlocking the value of an organization's data. Cokins, in Performance Management, provides an overview of multiple performance measurement methodologies and li
High-performance scientific data management system
Scribed by Jaechun No; Rajeev Thakur; Alok Choudhary
- Publisher
- Elsevier Science
- Year
- 2003
- Language
- English
- File size
- 299 KB
- Volume
- 63
- Category
- Article
- ISSN
- 0743-7315
Synopsis
Many scientific applications have large I/O requirements, in terms of both the size of data and the number of files or data sets. Management, storage, efficient access, and analysis of this data present an extremely challenging task. Traditionally, two different solutions have been used for this task: file I/O or databases. File I/O can provide high performance but is tedious to use with large numbers of files and large and complex data sets. Databases can be convenient, flexible, and powerful but do not perform and scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that combines the good features of both file I/O and databases. SDM provides a high-level application programming interface to the user and, internally, uses a parallel file system to store real data (using various I/O optimizations available in MPI-IO) and a database to store application-related metadata. To support I/O in irregular applications, SDM makes extensive use of MPI-IO's noncontiguous collective I/O functions. Moreover, SDM uses the concept of a history file to reduce the cost of index distribution, using the metadata stored in the database. We describe the design and implementation of SDM and present performance results with two regular applications, ASTRO3D and an Euler solver, and with two irregular applications, a CFD code called FUN3D and a Rayleigh-Taylor instability code.
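The split the synopsis describes, raw data in files and descriptive metadata in a database, can be illustrated with a minimal single-process sketch. This is not SDM's API and does not use MPI-IO or a parallel file system: it assumes Python's standard `sqlite3` module as a stand-in for the metadata database and an ordinary binary file as a stand-in for the data store. The function names (`open_catalog`, `write_dataset`, `read_dataset`) and the table layout are hypothetical, chosen only for illustration.

```python
import os
import sqlite3
import struct


def open_catalog(db_path=":memory:"):
    """Open the metadata database (hypothetical schema, not SDM's)."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS datasets ("
        "name TEXT PRIMARY KEY, "
        "file_offset INTEGER NOT NULL, "
        "count INTEGER NOT NULL)"
    )
    return db


def write_dataset(db, data_path, name, values):
    """Append raw doubles to the data file and record their location."""
    payload = struct.pack(f"<{len(values)}d", *values)
    with open(data_path, "ab") as f:
        # Seek explicitly so the recorded offset is the end of the file.
        offset = f.seek(0, os.SEEK_END)
        f.write(payload)
    # Only the metadata (name, offset, element count) goes to the database.
    db.execute(
        "INSERT INTO datasets (name, file_offset, count) VALUES (?, ?, ?)",
        (name, offset, len(values)),
    )
    db.commit()


def read_dataset(db, data_path, name):
    """Look up a data set's location in the database, then read the file."""
    row = db.execute(
        "SELECT file_offset, count FROM datasets WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    offset, count = row
    with open(data_path, "rb") as f:
        f.seek(offset)
        raw = f.read(count * 8)  # 8 bytes per little-endian double
    return list(struct.unpack(f"<{count}d", raw))
```

A caller would write several named data sets into one file and later fetch any of them by name, without tracking byte offsets itself. In a system like the one described, the file access would instead go through collective parallel I/O, which this sketch does not attempt to model.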
SIMILAR VOLUMES
Modern networked computing environments and applications often require, or can benefit from, the use of multiple communication substrates, transport mechanisms, and protocols, chosen according to where communication is directed, what is communicated, or when communication is performed. We propose tech
Preservation of data for long-term use will require data management strategies that include curation and preservation planning and implementation. While data management and curatorial activities have been an integral part of some scientific domains for years (see for example, high energ