Fully Scalable Fault-Tolerant Simulations for BSP and CGM
โ Scribed by Sung-Ryul Kim; Kunsoo Park
- Publisher
- Elsevier Science
- Year
- 2000
- Tongue
- English
- Weight
- 392 KB
- Volume
- 60
- Category
- Article
- ISSN
- 0743-7315
No coin nor oath required. For personal study only.
โฆ Synopsis
In this paper we consider general simulations of algorithms designed for fully operational BSP and CGM machines on machines with faulty processors. The BSP (or CGM) machine is a parallel multicomputer consisting of p processors for which a memory of n words is evenly distributed and each processor can send and receive at most h messages in a superstep. The faults are deterministic (i.e., worst-case distributions of faults are considered) and static (i.e., they do not change in the course of computation). We assume that a constant fraction of processors are faulty.
We present two fault-tolerant simulation techniques for BSP and CGM:
-
A deterministic simulation that achieves O(1) slowdown for local computations and O((log h p) 2 ) slowdown for communications per superstep, provided that a preprocessing is done that requires O((log h p) 2 ) supersteps and linear (in h) computation per processor in each superstep.
-
A randomized simulation that achieves O(1) slowdown for local computations and O(log h p) slowdown for communications per superstep with high probability, after the same (deterministic) preprocessing as above.
Our results are fully scalable over all values of p from 3(1) to 3(n). Furthermore, our results imply that if p n = for 0<=<1 and h=3((nร p) $ ) for 0<$ 1 (which hold in almost all practical BSP and CGM computations), algorithms can be made resilient to a constant fraction of processor faults without any asymptotic slowdown.
๐ SIMILAR VOLUMES
## I/O CONTROLLERS The study was aimed at developing tools for improving the reliability of computer I/O controllers. This has remained neglected for a long time and only in the past few years have researchers started working on this problem.