𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Fault tolerance in the execution of remote jobs on idling workstations

✍ Scribed by Yang, Cui-Qing; Qu, Yaoshuang


Publisher: John Wiley and Sons
Year: 1995
Tongue: English
Weight: 824 KB
Volume: 7
Category: Article
ISSN: 1040-3108

✦ Synopsis


Many workstation-based distributed systems allow programs to be executed on remote idling machines to make effective use of system resources. To guarantee the autonomy of individual workstations, the control policies in these systems usually force a remote job to be discontinued when local jobs arrive. One special concern in the design of such systems is therefore the fault tolerance of remote job execution. In this paper we discuss two control policies for workstation-based distributed systems, a checkpointing policy and a non-checkpointing policy, both of which support fault-tolerant execution of remote jobs on idling workstations. The reliability and mean turnaround time of remote job execution are analyzed for both control policies. The optimal time interval between checkpoints in the checkpointing policy is formulated from the given reliability and overhead of the system. In addition, several sample results derived from these analyses are compared with the outcome of corresponding simulation programs. Some observations on the fault-tolerant features of each control policy are then presented as guidelines for the future development of such workstation-based distributed systems.
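
The synopsis does not reproduce the paper's formulas, so the following Python sketch is only an illustration of the trade-off it describes, not the authors' model. It assumes that local jobs interrupt a remote job as a Poisson process with rate `lam`, that an interrupted job restarts immediately from its last checkpoint (or from the beginning under the non-checkpointing policy), and that every segment of work pays a fixed checkpoint overhead `C`. All function names and parameter values are hypothetical. Under these assumptions, the mean time to finish `s` units of uninterrupted work is `(exp(lam * s) - 1) / lam`, which the sketch compares with a small simulation.

```python
import math
import random


def split_into_segments(T, tau, C):
    """Split a job of length T into segments of work tau, each followed by
    a checkpoint costing C (assumption: the last, shorter segment pays C too)."""
    full = int(T // tau)
    rest = T - full * tau
    segs = [tau + C] * full
    if rest > 1e-12:
        segs.append(rest + C)
    return segs


def expected_segment_time(s, lam):
    """Mean time to complete s units without interruption when interruptions
    are Poisson with rate lam and a restart costs nothing extra."""
    return math.expm1(lam * s) / lam


def mean_turnaround(segs, lam):
    """Analytical mean turnaround time: segments are independent retries."""
    return sum(expected_segment_time(s, lam) for s in segs)


def simulate_turnaround(segs, lam, rng):
    """One simulated run; an interrupted segment is retried from its start."""
    elapsed = 0.0
    for s in segs:
        while True:
            t = rng.expovariate(lam)  # time until the next local job arrives
            if t >= s:
                elapsed += s          # segment (and its checkpoint) completed
                break
            elapsed += t              # work since the last checkpoint is lost
    return elapsed


def best_interval(T, lam, C, candidates):
    """Grid search for the checkpoint interval with the smallest mean turnaround."""
    return min(candidates,
               key=lambda tau: mean_turnaround(split_into_segments(T, tau, C), lam))


if __name__ == "__main__":
    T, lam, C = 100.0, 0.02, 1.0     # illustrative values only
    rng = random.Random(0)
    no_cp = [T]                      # non-checkpointing: one segment, no overhead
    cp = split_into_segments(T, 10.0, C)
    for name, segs in (("non-checkpointing", no_cp), ("checkpointing", cp)):
        sim = sum(simulate_turnaround(segs, lam, rng) for _ in range(20000)) / 20000
        print(name, "analytical:", mean_turnaround(segs, lam), "simulated:", sim)
    print("best interval on grid:", best_interval(T, lam, C, range(1, 101)))
```

A short interval wastes time on checkpoint overhead, while a long one loses more work per interruption; the grid search simply picks the candidate interval that balances the two under the stated assumptions.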