✦ LIBER ✦

Units of Computation in Fault-Tolerant Distributed Systems

✍ Scribed by Mohan Ahuja; Shivakant Mishra

Publisher: Elsevier Science
Year: 1997
Tongue: English
Weight: 493 KB
Volume: 40
Category: Article
ISSN: 0743-7315
DOI: 10.1006/jpdc.1996.1277

✦ Synopsis

We develop a framework that helps in understanding a fault-tolerant distributed system and so aids in designing such systems. We illustrate the uses of this framework in application areas such as checkpointing and recovery, phase termination detection, stable property detection, implementing membership protocols, debugging, and design of programming languages. We define a unit of computation and refer to it as a molecule. A molecule has a well-defined interface with other molecules. The smallest such unit, an indivisible molecule, is termed an atom. We show that any execution of a fault-tolerant distributed computation can be seen as an execution of molecules/atoms in a partial order. Such a view provides insight into the computation, particularly for a fault-tolerant system, where it is important to guarantee that a unit of computation is executed either completely or not at all, and where system designers need to reason about the states after execution of such units. Molecules are essentially a generalization of atomic actions.

📜 SIMILAR VOLUMES

Performance and effectiveness trade-off for checkpointing in fault-tolerant distributed systems

✍ Panagiotis Katsaros; Lefteris Angelis; Constantine Lazos 📂 Article 📅 2006 🏛 John Wiley and Sons 🌐 English ⚖ 337 KB

Integrated fault-detection and fault-tolerant control of process systems

✍ Prashant Mhaskar; Adiwinata Gani; Nael H. El-Farra; Charles McFall; Panagiotis D 📂 Article 📅 2006 🏛 American Institute of Chemical Engineers 🌐 English ⚖ 351 KB

Distributed replication mechanism for building fault tolerant system with distributed checkpoint mechanism

✍ Hideaki Hirayama; Toshio Shirakihara; Tatsunori Kanai 📂 Article 📅 2000 🏛 John Wiley and Sons 🌐 English ⚖ 161 KB

A distributed fault tolerant middleware system called ARTEMIS (Advanced Reliable disTributed Environment MIddleware System) was developed for the purpose of building fault tolerant systems without modifying either the source code or the binary code of application programs in open systems. In ARTEMIS

A decentralized and fault-tolerant Desktop Grid system for distributed applications

✍ Heithem Abbes; Christophe Cérin; Mohamed Jemni 📂 Article 📅 2009 🏛 John Wiley and Sons 🌐 English ⚖ 179 KB

Fault-tolerant control of process systems using communication networks

✍ Nael H. El-Farra; Adiwinata Gani; Panagiotis D. Christofides 📂 Article 📅 2005 🏛 American Institute of Chemical Engineers 🌐 English ⚖ 364 KB

A methodology for the design of fault‐tolerant control systems for chemical plants with distributed interconnected processing units is presented. Bringing together tools from Lyapunov‐based nonlinear control and hybrid systems theory, the approach is based on a hierarchical architecture

Cost-effective designs of fault-tolerant access networks in communication systems

✍ Xujin Chen; Bo Chen 📂 Article 📅 2009 🏛 John Wiley and Sons 🌐 English ⚖ 167 KB