
A scalable framework for cluster ensembles

✍ Scribed by Prodip Hore; Lawrence O. Hall; Dmitry B. Goldgof


Publisher: Elsevier Science
Year: 2009
Tongue: English
Weight: 556 KB
Volume: 42
Category: Article
ISSN: 0031-3203


✦ Synopsis


An ensemble of clustering solutions, or partitions, may be generated for a number of reasons. If the data set is very large, clustering may be done on disjoint subsets of tractable size. The data may also be distributed across different sites, in which case a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of the data that is comparable to that of the best existing approaches, yet they scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, against clustering all the data at once, and against a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or to clustering all the data at once, while providing very large speedups.
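The general idea of merging partitions that are represented by cluster centers can be illustrated with a small sketch. The code below is not the paper's algorithm (the synopsis does not describe the two approaches in detail); it is a generic stand-in that clusters each disjoint subset with hard k-means, pools the resulting centroids, and then clusters the pooled centroids, weighted by cluster size. The function names `kmeans` and `merge_centroid_ensemble` and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def kmeans(points, k, weights=None, iters=50, seed=0):
    """Plain Lloyd's k-means with optional point weights.

    Returns (centers, labels). Illustrative helper, not from the paper.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    weights = np.asarray(weights, dtype=float)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty center to the weighted mean of its members.
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(points[mask], axis=0, weights=weights[mask])
    # Final assignment so the labels match the returned centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)


def merge_centroid_ensemble(chunks, k, seed=0):
    """Cluster each disjoint chunk, then cluster the pooled centroids.

    Assumption: every chunk is summarised only by its k centroids and
    the number of points assigned to each of them.
    """
    pooled_centers, pooled_weights = [], []
    for i, chunk in enumerate(chunks):
        centers, labels = kmeans(chunk, k, seed=seed + i)
        counts = np.bincount(labels, minlength=k)
        keep = counts > 0  # drop centroids that ended up with no members
        pooled_centers.append(centers[keep])
        pooled_weights.append(counts[keep])
    pooled_centers = np.vstack(pooled_centers)
    pooled_weights = np.concatenate(pooled_weights)
    # Only (number of chunks * k) points are clustered in the merge step.
    final_centers, _ = kmeans(pooled_centers, k, weights=pooled_weights, seed=seed)
    return final_centers
```

The merge step only ever sees the pooled centroids, so its cost depends on the number of subsets and clusters rather than on the number of data points. This is one plausible reading of why centroid-based merging can scale to very large data sets; the actual speedups reported in the synopsis refer to the paper's own algorithms.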


📜 SIMILAR VOLUMES


Predictive weighting for cluster ensembl
✍ Christine Smyth; Danny Coomans 📂 Article 📅 2007 🏛 John Wiley and Sons 🌐 English ⚖ 414 KB

An ensemble of regression models predicts by taking a weighted average of the predictions made by individual models. Calculating the weights such that they reflect the accuracy of individual models (post processing the ensemble) has been shown to increase the ensemble's accuracy. Howeve

PARMON: a portable and scalable monitori
✍ Rajkumar Buyya 📂 Article 📅 2000 🏛 John Wiley and Sons 🌐 English ⚖ 312 KB 👁 2 views

Workstation/PC clusters have become a cost-effective solution for high performance computing. C-DAC's PARAM 10000 (or OpenFrame, internal code name) is a large cluster of high-performance workstations interconnected through low-latency and high bandwidth networks. The management and control of such

Transparently Obtaining Scalability for
✍ Yariv Aridor; Michael Factor; Avi Teperman; Tamar Eilam; Assaf Schuster 📂 Article 📅 2000 🏛 Elsevier Science 🌐 English ⚖ 469 KB

cJVM is a Java virtual machine (JVM) which provides a single system image of a traditional JVM while executing in a distributed fashion on the nodes of a cluster. cJVM virtualizes the cluster, transparently distributing the objects and threads of any pure Java application. The aim of cJVM is to obta