A Compiler-Directed Approach to Network Latency Reduction for Distributed Shared Memory Multiprocessors
✍ Authors: Sibabrata Ray; Hong Jiang; Qing Yang
- Publisher
- Elsevier Science
- Year
- 1996
- Language
- English
- File size
- 254 KB
- Volume
- 38
- Category
- Article
- ISSN
- 0743-7315
✦ Synopsis
Traditionally, memory access latency has been reduced either by designing better caches in the case of shared memory architectures, and/or by devising improved routing disciplines in the case of distributed memory architectures, so as to reduce the expected access time for a variable. Extensive work has been done on both cache design and message routing.
In this paper a new shared-data approach is taken to attack the problem. We consider a parallel/distributed computing environment in which there is no cache coherence mechanism in the traditional sense, and some or most of the communication takes place in the form of sharing certain variables. In the absence of cache coherence or any other hardware mechanism to enforce data consistency, multiple copies of a shared variable cannot exist in the system except when the variable is read-only.

Several multiprocessors and multicomputers do not incorporate any cache coherence mechanism; examples include Hector of the University of Toronto, the BBN Butterfly, Intel's Paragon, and the nCUBE. Alliant's CAMPUS/800 provides caches inside clusters of processors, but there are no intercluster caches. One notable advantage of such systems is that they avoid the network-latency and hardware-cost overheads associated with coherence protocols. In [6], Klaiber and Levy showed that for certain classes of parallel applications, including scientific applications and iterative algorithms, cache-coherent machines do much worse than both NUMA (nonuniform memory access) and message-passing architectures.

Further, in recent years the notion of a workstation cluster, or network of workstations (NOW for short), has become popular because of its low cost and its potential for higher utilization of existing workstations that would otherwise be idle most of the time. A NOW is a loosely coupled environment in which homogeneous and/or heterogeneous workstations are interconnected through a local area network. It would clearly be costly, even if feasible, to incorporate caches in such an environment.

The communication needs of a parallel/distributed system are served by one of the following two methods: (1) sharing variables and (2) passing messages.
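The two communication methods named above can be sketched in a few lines. This is an illustrative example only, not the paper's compiler technique; the worker functions and values are hypothetical, and threads on a single machine stand in for the processors of a distributed system.

```python
# Minimal sketch of the two communication styles the synopsis names:
# (1) shared variables and (2) message passing. All names here are
# illustrative assumptions, not from the paper.
import threading
import queue

# (1) Shared-variable communication: workers accumulate into a single
# shared variable; a lock enforces the single-writable-copy discipline
# (no hardware coherence, so only one mutable copy exists).
total = 0
lock = threading.Lock()

def shared_worker(value):
    global total
    with lock:              # serialize access to the one shared copy
        total += value

# (2) Message passing: workers never touch shared state directly;
# they communicate by sending their results through a queue.
results = queue.Queue()

def message_worker(value):
    results.put(value)      # communicate by sending a message

def run(worker, values):
    # Barrier-style synchronization: start all workers, then wait for all.
    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run(shared_worker, [1, 2, 3, 4])
run(message_worker, [1, 2, 3, 4])
received = sum(results.get() for _ in range(4))
print(total, received)      # both styles compute the same sum
```

Either style yields the same result; what differs is where the synchronization cost falls, which is exactly the latency the paper targets.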
The communication requirement arising from synchronization of critical sections (e.g., monitors, semaphores), synchronization of parallel programming constructs (e.g., fork/join, parallel loops, barrier synchronization) or data redistributions falls under the first category. Further, data sharing is inevitable in