Parallel Array Object I/O Support on Distributed Environments
✍ Scribed by Jenq Kuen Lee; Ing-Kuen Tsaur; San-Yih Hwang
- Publisher
- Elsevier Science
- Year
- 1997
- Language
- English
- File size
- 482 KB
- Volume
- 40
- Category
- Article
- ISSN
- 0743-7315
✦ Synopsis
… processor is associated with a local disk, and information exchanged between the disks of different processors must go through an interconnection network.
Recent research in parallel programming languages has concentrated on specifying regular, dimension-wise data distribution patterns for arrays on parallel machines with distributed memories. Languages supporting distributed arrays and data distribution include F90-D [14], HPF [18], and parallel C++ languages [4,19]. The data distribution concepts and the SPMD programming model deliver scalable performance on parallel machines for the computational part of a program. They do not, however, solve the problem of the I/O part. The lack of parallel I/O support creates two problems. First, I/O is executed serially, which results in a performance bottleneck according to Amdahl's law [1]. Second, the array is distributed among different processors, but because distribution is not supported at the file level, each processor must read the whole data set from disk into a temporary buffer in memory and then assign the data to the part of the distributed array it owns. This burdens programmers with maintaining two sets of data structures and results in extra storage use and extra program code.
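The second problem can be made concrete with a small sketch. It is not the authors' library: the function names, the flat file of 8-byte doubles, and the one-dimensional BLOCK distribution are all assumptions chosen for illustration, and the "processors" are simulated by a loop over ranks. It contrasts a processor that reads the whole file and then selects its part with one that seeks directly to the block it owns.

```python
import os
import struct
import tempfile

# Assumption: the array is stored as a flat file of little-endian 8-byte doubles.
ITEM = struct.Struct("<d")


def block_bounds(n, nprocs, rank):
    """Half-open index range [lo, hi) owned by `rank` under a 1-D BLOCK distribution."""
    chunk = (n + nprocs - 1) // nprocs  # ceiling division
    lo = min(rank * chunk, n)
    hi = min(lo + chunk, n)
    return lo, hi


def read_whole_then_select(path, n, nprocs, rank):
    """No file-level distribution: read every element, then keep the owned part."""
    with open(path, "rb") as f:
        data = f.read()  # temporary buffer holding the entire array
    everything = [ITEM.unpack_from(data, i * ITEM.size)[0] for i in range(n)]
    lo, hi = block_bounds(n, nprocs, rank)
    return everything[lo:hi], len(data)


def read_local_block(path, n, nprocs, rank):
    """Distribution-aware read: seek to the owned block and read only those bytes."""
    lo, hi = block_bounds(n, nprocs, rank)
    with open(path, "rb") as f:
        f.seek(lo * ITEM.size)
        data = f.read((hi - lo) * ITEM.size)
    local = [ITEM.unpack_from(data, i * ITEM.size)[0] for i in range(hi - lo)]
    return local, len(data)


def demo(n=10, nprocs=4):
    """Write a small array to a temporary file and read it back both ways per rank."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            for i in range(n):
                f.write(ITEM.pack(float(i)))
        results = []
        for rank in range(nprocs):  # ranks are simulated sequentially here
            whole, bytes_whole = read_whole_then_select(path, n, nprocs, rank)
            local, bytes_local = read_local_block(path, n, nprocs, rank)
            results.append((whole, local, bytes_whole, bytes_local))
        return results
    finally:
        os.remove(path)
```

Both functions return the same local elements for a given rank, but the first transfers the full file for every processor while the second transfers only the owned block. With 10 elements on 4 simulated processors, each whole-file read moves 80 bytes, whereas the block reads move 24, 24, 24, and 8 bytes.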
In this paper, we extend the concept of array distributions in HPF [18] and parallel C++ languages [4,19,24] from the memory level to the file level. There are three key elements in this work. First, we support parallel file objects with random access. Each array object in a parallel I/O unit has a unique name. A parallel I/O unit can be either a file or a pipe in the conventional Unix sense, but it is now a parallel file or pipe and must be accessed through the parallel file operators provided by our libraries. (For example, we now need to use pcat to cat the names and contents of all array objects in a parallel I/O unit.) Array objects in a parallel I/O unit are no longer accessed in sequential order but by object name. Giving each object in secondary storage a unique name helps us track the access patterns to a particular object and allows an efficient implementation of parallel array object I/O. Second, when objects are read and/or written by multiple applications using different distributions, we provide an interactive environment in which programmers specify the inter-application I/O dependence, represented as a graph. We further provide a novel