Fault tolerance in the execution of remote jobs on idling workstations
✍ Scribed by Yang, Cui-Qing; Qu, Yaoshuang
- Publisher: John Wiley and Sons
- Year: 1995
- Language: English
- File size: 824 KB
- Volume: 7
- Category: Article
- ISSN: 1040-3108
✦ Synopsis
Many workstation-based distributed systems allow programs to be executed on remote idle machines to make effective use of system resources. Usually, the control policies in these systems force a remote job to be discontinued when local jobs arrive, in order to guarantee the autonomy of individual workstations. One special concern in the design of such systems is therefore the fault tolerance of remote job execution. In this paper we discuss two control policies for workstation-based distributed systems, a checkpointing policy and a non-checkpointing policy, both of which support fault-tolerant execution of remote jobs on idling workstations. The reliability and mean turnaround time of remote job execution are analyzed for both control policies. The optimal time interval between checkpoints in the checkpointing policy is formulated in terms of the given reliability and overhead of the system. In addition, several sample results derived from these analyses are compared with the output of corresponding simulation programs. Some observations on the fault-tolerant features of each control policy are then presented as guidelines for the future development of such workstation-based distributed systems.
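The trade-off between the two policies can be illustrated with a small Monte Carlo sketch. This is not the paper's model: it assumes that interruptions by local jobs arrive as a Poisson process, that an interrupted job resumes immediately (no waiting for the workstation to become idle again), and that each checkpoint costs a fixed overhead. The names `simulate_turnaround`, `mean_turnaround`, `failure_rate`, `checkpoint_interval`, and `overhead` are hypothetical.

```python
import random


def simulate_turnaround(work, failure_rate, checkpoint_interval=None,
                        overhead=0.0, rng=None):
    """Simulate one remote job and return its turnaround time.

    Assumptions (not taken from the paper): interruptions are Poisson with
    rate `failure_rate`; an interrupted job restarts at once; every
    checkpoint, including the one after the final segment, costs `overhead`.
    With checkpoint_interval=None the job restarts from scratch each time.
    """
    rng = rng or random.Random()
    checkpointing = checkpoint_interval is not None
    segment = checkpoint_interval if checkpointing else work
    elapsed = 0.0
    done = 0.0
    while done < work:
        length = min(segment, work - done)
        cost = length + (overhead if checkpointing else 0.0)
        # Time until the next interruption; the exponential distribution is
        # memoryless, so drawing a fresh value per attempt is valid.
        if failure_rate > 0:
            time_to_failure = rng.expovariate(failure_rate)
        else:
            time_to_failure = float("inf")
        if time_to_failure >= cost:
            elapsed += cost            # segment (and checkpoint) completed
            done += length
        else:
            elapsed += time_to_failure  # work since last checkpoint is lost
    return elapsed


def mean_turnaround(runs, seed, **kwargs):
    """Average turnaround time over `runs` independent simulated jobs."""
    rng = random.Random(seed)
    return sum(simulate_turnaround(rng=rng, **kwargs) for _ in range(runs)) / runs


if __name__ == "__main__":
    base = dict(work=10.0, failure_rate=0.3)
    print("no checkpointing:", mean_turnaround(2000, 1, **base))
    for interval in (0.5, 1.0, 2.0, 5.0):
        t = mean_turnaround(2000, 1, checkpoint_interval=interval,
                            overhead=0.05, **base)
        print(f"interval {interval}: {t}")
```

Sweeping `checkpoint_interval` as in the usage loop shows the qualitative effect the synopsis describes: very short intervals pay too much overhead, very long ones lose too much work per interruption, and an intermediate interval minimizes the mean turnaround time under these assumptions.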