Units of Computation in Fault-Tolerant Distributed Systems

✍ Scribed by Mohan Ahuja; Shivakant Mishra


Publisher
Elsevier Science
Year
1997
Tongue
English
Weight
493 KB
Volume
40
Category
Article
ISSN
0743-7315


✦ Synopsis


We develop a framework that helps in understanding fault-tolerant distributed systems and so aids in designing them. We illustrate its use in application areas such as checkpointing and recovery, phase termination detection, stable property detection, implementation of membership protocols, debugging, and programming language design. We define a unit of computation, which we call a molecule. A molecule has a well-defined interface with other molecules. The smallest such unit, an indivisible molecule, is termed an atom. We show that any execution of a fault-tolerant distributed computation can be seen as an execution of molecules and atoms in a partial order. This view provides insight into the computation, particularly for a fault-tolerant system, where it is important to guarantee that a unit of computation is executed either completely or not at all, and where system designers need to reason about the states that result after such units execute. Molecules are essentially a generalization of atomic actions.


📜 SIMILAR VOLUMES


Distributed replication mechanism for bu
✍ Hideaki Hirayama; Toshio Shirakihara; Tatsunori Kanai 📂 Article 📅 2000 🏛 John Wiley and Sons 🌐 English ⚖ 161 KB

A distributed fault tolerant middleware system called ARTEMIS (Advanced Reliable disTributed Environment MIddleware System) was developed for the purpose of building fault tolerant systems without modifying either the source code or the binary code of application programs in open systems. In ARTEMIS

Fault-tolerant control of process system
✍ Nael H. El-Farra; Adiwinata Gani; Panagiotis D. Christofides 📂 Article 📅 2005 🏛 American Institute of Chemical Engineers 🌐 English ⚖ 364 KB 👁 2 views

A methodology for the design of fault‐tolerant control systems for chemical plants with distributed interconnected processing units is presented. Bringing together tools from Lyapunov‐based nonlinear control and hybrid systems theory, the approach is based on a hierarchical architecture