
Probabilistic system-level fault diagnostic algorithms for multiprocessors

✍ Scribed by Tamás Bartha; Endre Selényi


Publisher: Elsevier Science
Year: 1997
Tongue: English
Weight: 914 KB
Volume: 22
Category: Article
ISSN: 0167-8191


✦ Synopsis


Massively parallel computers (MPCs) introduce new requirements for system-level fault diagnosis, such as handling a huge number of processing elements in a heterogeneous system. They also have specific attributes, such as regular topology and low local complexity. Traditional deterministic methods of system-level diagnosis did not consider these issues. This paper presents a new approach, called local information diagnosis, that exploits the characteristics of massively parallel systems. The paper defines the diagnostic model, which is based on generalized test invalidation to handle inhomogeneity in multiprocessors.

Five effective probabilistic diagnostic algorithms using the proposed method are also given, and their space and time complexity are estimated.


📜 SIMILAR VOLUMES


Performance evaluation of Two-level Sche
✍ Yukio Ohishi; Keizo Saisho; Akira Fukuda 📂 Article 📅 1998 🏛 John Wiley and Sons 🌐 English ⚖ 210 KB

In this article, we simulate and evaluate various Two-level Scheduling algorithms for cluster-based NUMA (Non-Uniform Memory Access) multiprocessors. Two-level Scheduling is a kind of space partitioning scheduling. We evaluate the following variations: (1) Cluster-free Algorithm and (2) Cluster-limite

Scheduling Algorithms with Fault Detecti
✍ K. Mahesh; G. Manimaran; C. Siva Ram Murthy; Arun K. Somani 📂 Article 📅 1998 🏛 Elsevier Science 🌐 English ⚖ 444 KB

Several schemes for detecting and locating faulty processors through self-diagnosis in multiprocessor systems have been discussed in the past. These schemes attempt to start multiple copies (versions) of the tasks on available idle processors simultaneously and compare the results generated by the c

Optimal adaptive fault diagnosis for sim
✍ Kranakis, Evangelos; Pelc, Andrzej; Spatharis, Anthony 📂 Article 📅 1999 🏛 John Wiley and Sons 🌐 English ⚖ 122 KB

We studied adaptive system-level fault diagnosis for multiprocessor systems. Processors can test each other and future tests can be selected on the basis of previous test results. Fault-free testers always give correct test results, while faulty testers are completely unreliable. The aim of diagnosi

A Fault-Tolerance Model for Multiprocess
✍ Sheng-Tzong Cheng; Chia-Mei Chen; Satish K. Tripathi 📂 Article 📅 2000 🏛 Elsevier Science 🌐 English ⚖ 298 KB

System reliability is an important aspect of real-time systems, because the result of a real-time application may be valid only if the application functions correctly and its timing constraints are satisfied. There are two kinds of faults, hardware and software faults, and the paper considers hardwa