𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Optimal checkpointing interval for two-level recovery schemes

โœ Scribed by Kenichiro Naruse; Shizuka Umemura; Sayori Nakagawa


Publisher: Elsevier Science
Year: 2006
Tongue: English
Weight: 294 KB
Volume: 51
Category: Article
ISSN: 0898-1221


✦ Synopsis


It is important to design computer systems that can tolerate some failures. This paper proposes two-level recovery schemes that use soft checkpoints (SC) and hard checkpoints (HC) to recover from failures. An SC is less reliable than an HC but incurs less overhead, so SCs are placed between HCs to reduce the overhead of the process. The total expected overhead of one cycle from HC to HC is obtained using Markov renewal processes, and the optimal interval that minimizes it is computed. A numerical example shows that a two-level recovery scheme can achieve good performance. © 2006 Elsevier Ltd. All rights reserved.
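The synopsis does not give the paper's formulas, so the following is only a rough sketch of the kind of trade-off being optimized, under a hypothetical first-order model that is not the authors' Markov renewal analysis. All names (`overhead_rate`, `optimal_interval`, `best_scheme`) and parameters (SC cost `c_s`, HC cost `c_h`, failure rate `lam`, fraction `p` of failures recoverable from an SC) are assumptions made for illustration. In this toy model, one HC cycle holds `n` intervals of length `t`, and the overhead rate has the form a/t + b·t, so each candidate `n` has a closed-form optimal interval t* = sqrt(a/b).

```python
import math

# Hypothetical first-order model, NOT the paper's Markov renewal analysis.


def overhead_rate(n, t, c_s, c_h, lam, p):
    """Overhead per unit of useful work in a toy two-level scheme.

    One HC-to-HC cycle holds n intervals of length t: n - 1 soft checkpoints
    (cost c_s each) and one hard checkpoint (cost c_h). Failures arrive at
    rate lam; a fraction p is recoverable from the latest SC (about t/2 of
    work lost), the rest roll back to the latest HC (about n*t/2 lost).
    """
    checkpoint_cost = ((n - 1) * c_s + c_h) / (n * t)
    rollback_loss = lam * (p * t / 2 + (1 - p) * n * t / 2)
    return checkpoint_cost + rollback_loss


def optimal_interval(n, c_s, c_h, lam, p):
    """Minimize a/t + b*t over t in closed form; returns (t_star, min_rate)."""
    a = ((n - 1) * c_s + c_h) / n
    b = lam * (p + (1 - p) * n) / 2
    return math.sqrt(a / b), 2 * math.sqrt(a * b)


def best_scheme(c_s, c_h, lam, p, n_max=50):
    """Try n = 1..n_max intervals per HC cycle; keep the lowest overhead rate."""
    best = None
    for n in range(1, n_max + 1):
        t_star, rate = optimal_interval(n, c_s, c_h, lam, p)
        if best is None or rate < best[2]:
            best = (n, t_star, rate)
    return best


# Illustrative parameters only: cheap SCs that recover 90% of failures.
n, t_star, rate = best_scheme(c_s=0.1, c_h=1.0, lam=0.01, p=0.9)
print(n, round(t_star, 3), round(rate, 4))  # prints: 9 4.714 0.0849
```

As a sanity check on the toy model, with `n = 1` (no soft checkpoints) the closed form reduces to sqrt(2·c_h/lam), Young's classic first-order checkpoint interval; with `p = 0` soft checkpoints never help, and `best_scheme` returns `n = 1`.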


📜 SIMILAR VOLUMES


Optimal checkpointing intervals for a do
โœ S. Nakagawa; S. Fukumoto; N. Ishii ๐Ÿ“‚ Article ๐Ÿ“… 2003 ๐Ÿ› Elsevier Science ๐ŸŒ English โš– 336 KB

This paper considers checkpointing intervals for a double modular redundancy (DMR) system with signatures: a signature maps the original space into a much smaller space and represents the state of each processor. The execution time of a task is divided equally into n intervals, and at the en

Optimization models for recovery block s
โœ Oded Berman; U.Dinesh Kumar ๐Ÿ“‚ Article ๐Ÿ“… 1999 ๐Ÿ› Elsevier Science ๐ŸŒ English โš– 889 KB

This paper presents optimization models for fault-tolerant software that select a set of versions for a given program. The objective is to maximize the reliability of the software subject to a budget constraint. Optimization models are developed for two block recovery schemes: (1) independent rec

Optimal Recovery Schemes for High-Availa
โœ Lars Lundberg; Charlie Svahnberg ๐Ÿ“‚ Article ๐Ÿ“… 2001 ๐Ÿ› Elsevier Science ๐ŸŒ English โš– 368 KB

Clusters and distributed systems offer two important advantages, viz. fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers break down, the load on these compute