Units of Computation in Fault-Tolerant Distributed Systems
Written by Mohan Ahuja; Shivakant Mishra
- Publisher: Elsevier Science
- Year: 1997
- Language: English
- File size: 493 KB
- Volume: 40
- Category: Article
- ISSN: 0743-7315
Synopsis
We develop a framework for understanding fault-tolerant distributed systems, which in turn aids in designing them. We illustrate its use in application areas such as checkpointing and recovery, phase termination detection, stable property detection, implementation of membership protocols, debugging, and programming language design. We define a unit of computation, called a molecule, which has a well-defined interface with other molecules. The smallest such unit, an indivisible molecule, is termed an atom. We show that any execution of a fault-tolerant distributed computation can be viewed as an execution of molecules and atoms in a partial order. This view gives insight into the computation, particularly in a fault-tolerant system, where it is important to guarantee that a unit of computation is executed either completely or not at all, and where system designers need to reason about the states reached after such units execute. Molecules are essentially a generalization of atomic actions.
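The all-or-nothing requirement and the partial-order view described in the synopsis can be illustrated with a small sketch. This is not the authors' formal model, which the synopsis does not give. It assumes, purely for illustration, that a molecule is a named set of atoms and that the partial order is given as a map from each molecule to the molecules that must precede it. The names `molecules`, `precedes`, `is_all_or_nothing`, and `one_linearization`, as well as the example atoms, are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumption: a molecule is modeled as a named set of atoms (indivisible steps).
molecules = {
    "checkpoint": {"save_state", "log_marker"},
    "send_update": {"pack", "transmit"},
}

# Assumption: the partial order maps each molecule to its required predecessors.
precedes = {
    "checkpoint": set(),
    "send_update": {"checkpoint"},
}


def is_all_or_nothing(executed_atoms, molecules):
    """Return True if every molecule is either fully executed or untouched."""
    for atoms in molecules.values():
        done = atoms & executed_atoms
        if done and done != atoms:
            return False  # a molecule was only partially executed
    return True


def one_linearization(precedes):
    """Return one total order of molecules consistent with the partial order."""
    return list(TopologicalSorter(precedes).static_order())
```

Under these assumptions, a state in which only `save_state` has run fails the check because the `checkpoint` molecule is partially executed, while a state with both of its atoms, or with none, passes.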
SIMILAR VOLUMES
A distributed fault tolerant middleware system called ARTEMIS (Advanced Reliable disTributed Environment MIddleware System) was developed for the purpose of building fault tolerant systems without modifying either the source code or the binary code of application programs in open systems. In ARTEMIS
A methodology for the design of fault-tolerant control systems for chemical plants with distributed interconnected processing units is presented. Bringing together tools from Lyapunov-based nonlinear control and hybrid systems theory, the approach is based on a hierarchical architecture