✦ LIBER ✦

The ganglia distributed monitoring system: design, implementation, and experience

✍ Scribed by Matthew L Massie; Brent N Chun; David E Culler

Publisher: Elsevier Science
Year: 2004
Tongue: English
Weight: 356 KB
Volume: 30
Category: Article
ISSN: 0167-8191
DOI: 10.1016/j.parco.2004.04.001

No coin nor oath required. For personal study only.

✦ Synopsis

Ganglia is a scalable distributed monitoring system for high performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. This paper presents the design, implementation, and evaluation of Ganglia along with experience gained through real world deployments on systems of widely varying scale, configurations, and target application domains over the last two and a half years.
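
The synopsis names two mechanisms that are easier to picture in code. The first is a multicast listen/announce protocol in which each node in a cluster hears the state of the whole cluster. The second is XDR for compact, portable transport. The Python sketch below simulates both, and it rests on several assumptions. Multicast is replaced by an in-memory loop over listeners. The message layout (host string, metric name string, XDR double), the 20-second timeout, and the rule that a listener drops a host it has not heard from within that timeout are all hypothetical. Every function and class name is illustrative. None of this is Ganglia's actual daemon code or wire format.

```python
import struct
from dataclasses import dataclass, field


def xdr_pack_string(s: str) -> bytes:
    """XDR string: 4-byte big-endian length, bytes, zero-padded to a 4-byte boundary."""
    data = s.encode("utf-8")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def xdr_unpack_string(buf: bytes, offset: int) -> tuple[str, int]:
    """Decode an XDR string at offset; return it and the offset just past its padding."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    pad = (4 - length % 4) % 4
    return buf[start:start + length].decode("utf-8"), start + length + pad


def encode_announcement(host: str, metric: str, value: float) -> bytes:
    # Hypothetical layout, not Ganglia's: host, metric name, then an XDR double.
    return xdr_pack_string(host) + xdr_pack_string(metric) + struct.pack(">d", value)


def decode_announcement(buf: bytes) -> tuple[str, str, float]:
    host, off = xdr_unpack_string(buf, 0)
    metric, off = xdr_unpack_string(buf, off)
    (value,) = struct.unpack_from(">d", buf, off)
    return host, metric, value


@dataclass
class ClusterState:
    """State held by every listener: the latest value per (host, metric)."""
    timeout: float  # assumed: seconds of silence before a host is dropped
    metrics: dict = field(default_factory=dict)     # (host, metric) -> value
    last_heard: dict = field(default_factory=dict)  # host -> time of last announcement

    def receive(self, packet: bytes, now: float) -> None:
        host, metric, value = decode_announcement(packet)
        self.metrics[(host, metric)] = value
        self.last_heard[host] = now

    def live_hosts(self, now: float) -> set:
        return {h for h, t in self.last_heard.items() if now - t <= self.timeout}

    def expire(self, now: float) -> None:
        # Forget hosts that have stopped announcing, along with their metrics.
        dead = set(self.last_heard) - self.live_hosts(now)
        for h in dead:
            del self.last_heard[h]
        self.metrics = {k: v for k, v in self.metrics.items() if k[0] not in dead}


# Simulated multicast: every announcement reaches every listener in the cluster.
listeners = [ClusterState(timeout=20.0) for _ in range(3)]


def multicast(packet: bytes, now: float) -> None:
    for node in listeners:
        node.receive(packet, now)


multicast(encode_announcement("node01", "load_one", 0.42), now=0.0)
multicast(encode_announcement("node02", "load_one", 1.50), now=5.0)
for node in listeners:
    node.expire(now=22.0)  # node01 has been silent for 22 s, beyond the 20 s timeout
print(sorted(listeners[0].live_hosts(now=22.0)))  # ['node02']
```

In this sketch every listener ends up with an identical copy of the cluster's state, so any one of them could answer a query for the whole cluster. The expiry step lets that copy recover on its own when a node stops announcing.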

📜 SIMILAR VOLUMES

Design and implementation of a monitoring system for NC machines

✍ S.V. De Carvalho; A.E.K. Sahraoui; Alexandro Serrano 📂 Article 📅 1992 🏛 Elsevier Science 🌐 English ⚖ 670 KB

DESIGN AND IMPLEMENTATION OF SENSOR-BASED TOOL-WEAR MONITORING SYSTEMS

✍ Choon Seong Leem; David A. Dornfeld 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 261 KB

This article addresses the design of sensor-based tool-wear monitoring systems and their implementation, and specifically focuses on interpretation of signals from multiple sensors in terms of tool-wear level. Keeping in mind that the absence of a well-accepted reliable methodology and the ignorance

Microdialysis implemented in the design of a system for continuous glucose monitoring

✍ Thomas Laurell 📂 Article 📅 1993 🏛 Elsevier Science 🌐 English ⚖ 362 KB

Experiences in the design and implementation of a structured real-time operating system

✍ Anthony Mark; Otto Eggenberger; Jürgen Nehmer 📂 Article 📅 1977 🏛 Elsevier Science ⚖ 506 KB

Managing the design and implementation of a Real-Time Computerized Monitoring system for a textile environment

✍ Ricky W. Robertson 📂 Article 📅 1988 🏛 Elsevier Science 🌐 English ⚖ 313 KB

The STAR fault manager for distributed operating environments. Design, implementation and performance

✍ Pierre Sens; Bertil Folliot 📂 Article 📅 1998 🏛 John Wiley and Sons 🌐 English ⚖ 225 KB

This paper presents the design, implementation and performance evaluation of a software fault manager for distributed applications. Dubbed Star, it uses the natural redundancy existing in networks of workstations to offer a high level of fault tolerance. Fault management is transparent to the suppor