Application controlled checkpointing coordination for fault-tolerant distributed computing systems
Scribed by Taesoon Park; Heon Y. Yeom
- Publisher
- Elsevier Science
- Year
- 2000
- Language
- English
- File size
- 826 KB
- Volume
- 26
- Category
- Article
- ISSN
- 0167-8191
Synopsis
To provide fault tolerance for distributed systems, checkpointing has been widely used, and much research has been done to reduce the overhead of checkpointing coordination. In this paper, we present a new checkpointing coordination scheme in which the application controls the coordination activity by utilizing the communication pattern of the application program. Unlike previous solutions, which do not utilize the communication pattern of cooperating processes, the proposed scheme can reduce the coordination effort as well as the number of checkpoints that are forced to be taken. Extensive simulations have been performed to evaluate the proposed scheme, and we conclude that it significantly reduces the coordination overhead compared with the existing loose coordination scheme.
SIMILAR VOLUMES
A distributed fault tolerant middleware system called ARTEMIS (Advanced Reliable disTributed Environment MIddleware System) was developed for the purpose of building fault tolerant systems without modifying either the source code or the binary code of application programs in open systems. In ARTEMIS