Architectures and message-passing algorithms for cluster computing: Design and performance
- Authors
- Edward K. Blum; Xin Wang; Patrick Leung
- Publisher
- Elsevier Science
- Year
- 2000
- Language
- English
- File size
- 239 KB
- Volume
- 26
- Category
- Article
- ISSN
- 0167-8191
Synopsis
This paper considers the architecture of clusters and related message-passing (MP) software algorithms and their effect on performance (speedup and efficiency) of cluster computing (CC). We present new architectures for multi-segment Ethernet clusters and new MP algorithms which fit these architectures. The multiple segments (e.g. commodity hubs) connect commodity processor nodes so as to allow MP to be highly parallelized by avoiding network contention and collisions in many applications where the all-gather and other collective operations are central. We analyze all-gather in some detail, and present new network topologies and new MP algorithms to minimize latency. The new topologies are based on a design, called two-by-four nets (2 × 4 nets), by Compbionics. An integrated MP software system, called Reduced Overhead Cluster Communication (ROCC), which embodies the MP algorithms is also described. In brief, 2 × 4 nets are networks of "supernodes", called 2 × 4's, each having 4 processors on 2 segments, with segments usually being Ethernet hubs. The supernodes are typically connected to form rings or tori of supernodes. We present actual test results and supporting analyses to demonstrate that 2 × 4 nets with the ROCC MP software are faster than many existing clusters and generally less costly.
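The synopsis centers on the all-gather collective over ring-connected nodes. As a minimal illustration (not taken from the paper, and ignoring segment contention entirely), the classic ring all-gather can be simulated as below; `ring_all_gather` is a hypothetical helper name, and each of the P processes forwards one block per step to its ring neighbour until all hold every block after P − 1 steps:

```python
def ring_all_gather(blocks):
    """Simulate all-gather on a ring of len(blocks) processes.

    blocks[i] is the data block initially held by process i.
    Returns a list where result[i] is process i's final buffer.
    """
    p = len(blocks)
    # Each process starts with only its own block in the matching slot.
    buffers = [[None] * p for _ in range(p)]
    for i in range(p):
        buffers[i][i] = blocks[i]

    # In step s, process i forwards the block it obtained s steps ago
    # (block index (i - s) mod p) to its ring neighbour (i + 1) mod p.
    for s in range(p - 1):
        for i in range(p):
            idx = (i - s) % p
            buffers[(i + 1) % p][idx] = blocks[idx]

    return buffers

result = ring_all_gather(["a", "b", "c", "d"])
# After p - 1 = 3 steps, every process holds all four blocks.
```

Each process sends and receives exactly one block per step, so the P − 1 steps pipeline cleanly on a ring; the paper's contribution, per the synopsis, is arranging segments so these transfers proceed without Ethernet collisions.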
SIMILAR VOLUMES
This paper discusses a multithreaded software architecture for message-passing interface (MPI) software specification. The architecture is thread-safe, allows for concurrent communication over several communications media (multifabric communication), efficiently utilizes available hardware concurrency…
The finite difference time domain method (FDTD) solves Maxwell's equations by employing numerically and storage intensive computation to map the electric and magnetic fields within a finite volume as an explicit function of time. Distributed computation, using heterogeneous networks of computers, is a c…
Today's massively parallel machines are typically message-passing systems consisting of hundreds or thousands of processors. Implementing parallel applications efficiently in this environment is a challenging task, and poor parallel design decisions can be expensive to correct. Tools and techniques…
This paper provides a survey of both architectural and algorithmic aspects of solving problems using parallel processors with ring, torus and hypercube interconnection.