Design, implementation and evaluation of ICARE: an efficient recoverable DSM
✍ Scribed by A.-M. Kermarrec; C. Morin; M. Banâtre
- Publisher
- John Wiley and Sons
- Year
- 1998
- Tongue
- English
- Weight
- 348 KB
- Volume
- 28
- Category
- Article
- ISSN
- 0038-0644
✦ Synopsis
With the increasing throughput of local area networks, Networks of Workstations (NOWs) that provide a Distributed Shared Memory (DSM) have become a convenient and cheaper alternative to parallel architectures for running parallel scientific applications. However, because such systems are made up of a large number of components, the probability of a failure cannot be neglected, especially for long-running applications. This paper presents the design, implementation and performance evaluation of ICARE, a page-based recoverable DSM implemented on top of an ATM-based NOW running the CHORUS microkernel. ICARE relies on a Backward Error Recovery (BER) mechanism and combines efficiency with high availability. Storing checkpoints in volatile memory keeps the cost of fault tolerance low and makes it possible to exploit the symbiotic relationship between the data replication already performed by DSM systems and the replication needed for fault tolerance. Furthermore, ICARE efficiently implements transparent process rollback recovery. Performance evaluations of the ICARE prototype, which implements the proposed algorithms, demonstrate its efficiency.
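The synopsis does not spell out ICARE's protocol. As a rough illustration of its central idea, that a checkpoint kept only in volatile memory survives a single node crash if it is replicated on at least two nodes, here is a minimal Python sketch under simplifying assumptions: each page has one current copy, each checkpointed page is held by its owner and by the next live node in a ring, and a crash rolls every page back to the last checkpoint. The class and method names (`ToyRecoverableDSM`, `checkpoint`, `crash_and_rollback`) are hypothetical; the sketch ignores coherence messages, process state, and the reuse of existing DSM replicas as recovery copies that the synopsis alludes to.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One workstation; all of its memory is volatile and lost if it crashes."""
    node_id: int
    alive: bool = True
    current: dict[int, bytes] = field(default_factory=dict)   # working page copies
    recovery: dict[int, bytes] = field(default_factory=dict)  # checkpointed page copies


class ToyRecoverableDSM:
    """Hypothetical toy model, not ICARE's actual protocol: page-based DSM
    whose checkpoints live in the volatile memory of two distinct nodes."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.owner: dict[int, int] = {}  # page number -> node holding the current copy

    def _live(self) -> list[Node]:
        return [n for n in self.nodes if n.alive]

    def write(self, node_id: int, page: int, data: bytes) -> None:
        """Write a page; the writer becomes the sole holder of the current copy."""
        previous = self.owner.get(page)
        if previous is not None and previous != node_id:
            self.nodes[previous].current.pop(page, None)  # invalidate the old copy
        self.nodes[node_id].current[page] = data
        self.owner[page] = node_id

    def read(self, page: int) -> bytes:
        return self.nodes[self.owner[page]].current[page]

    def checkpoint(self) -> None:
        """Store each page on its owner and on one other live node, then commit."""
        live = self._live()
        if len(live) < 2:
            raise RuntimeError("a replicated checkpoint needs two live nodes")
        ids = [n.node_id for n in live]
        staged: dict[int, dict[int, bytes]] = {i: {} for i in ids}
        for page, holder in self.owner.items():
            data = self.nodes[holder].current[page]
            backup = ids[(ids.index(holder) + 1) % len(ids)]  # next live node in a ring
            staged[holder][page] = data
            staged[backup][page] = data
        # Replace the old checkpoint only once the new one is fully staged.
        for n in live:
            n.recovery = staged[n.node_id]

    def crash_and_rollback(self, failed_id: int) -> None:
        """Simulate a node crash, then roll every page back to its checkpointed value."""
        failed = self.nodes[failed_id]
        failed.alive = False
        failed.current.clear()
        failed.recovery.clear()
        self.owner.clear()
        for n in self._live():
            n.current.clear()  # discard all work done since the last checkpoint
        for n in self._live():
            for page, data in n.recovery.items():
                if page not in self.owner:  # first surviving copy becomes current
                    n.current[page] = data
                    self.owner[page] = n.node_id
        self.checkpoint()  # re-establish two recovery copies on the survivors


dsm = ToyRecoverableDSM(num_nodes=3)
dsm.write(0, page=7, data=b"v1")
dsm.checkpoint()                  # page 7 is now in the recovery memory of nodes 0 and 1
dsm.write(0, page=7, data=b"v2")  # update made after the checkpoint
dsm.crash_and_rollback(0)         # node 0 fails before the next checkpoint
print(dsm.read(7))                # b'v1', recovered from node 1's memory
```

Staging the new checkpoint before overwriting the old one is a deliberate choice in this toy model: the previous consistent checkpoint is never partially destroyed. A real distributed implementation would need a coordinated commit across nodes to give the same guarantee.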
📜 SIMILAR VOLUMES
Many computer applications today require some form of distributed computing to allow different software components to communicate. Several different commercial products now exist based on the Common Object Request Broker Architecture (CORBA) of the Object Management Group. The use of such tools, how