𝔖 Bobbio Scriptorium
✦   LIBER   ✦

A linear algebra measure of cluster quality

✍ Scribed by Mather, Laura A.


Publisher: John Wiley and Sons
Year: 2000
Tongue: English
Weight: 114 KB
Volume: 51
Category: Article
ISSN: 0002-8231

✦ Synopsis


One of the most common models in information retrieval (IR), the vector space model, represents a document set as a term-document matrix where each row corresponds to a term and each column corresponds to a document. Because of the use of matrices in IR, it is possible to apply linear algebra to this IR model. This paper describes an application of linear algebra to text clustering, namely, a metric for measuring cluster quality. The metric is based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. The metric compares the singular values of the term-document matrix to the singular values of the matrices for each of the clusters to determine the amount of overlap of the terms across clusters. Because the metric can be difficult to interpret, a standardization of the metric is defined, which specifies the number of standard deviations a clustering of a document set is from an average, random clustering of that document set. Empirical evidence shows that the standardized cluster metric correlates with clustered retrieval performance when comparing clustering algorithms or multiple parameters for the same clustering algorithm.
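The synopsis does not state the metric's formula, so the Python sketch below only illustrates the underlying idea and should not be read as Mather's definition. It relies on one fact from linear algebra: if clusters share no terms, the term-document matrix is block diagonal after reordering, and its nonzero singular values are exactly the pooled nonzero singular values of the per-cluster matrices. The names `overlap_score` and `standardized_score`, the choice of a Euclidean distance between the two spectra, and the use of label permutations as the "random clustering" baseline are all assumptions made for this example.

```python
import numpy as np


def pooled_singular_values(A, labels):
    """Singular values of each cluster's column block, pooled and sorted descending."""
    labels = np.asarray(labels)
    blocks = [np.linalg.svd(A[:, labels == c], compute_uv=False)
              for c in np.unique(labels)]
    return np.sort(np.concatenate(blocks))[::-1]


def overlap_score(A, labels):
    """Hypothetical overlap score (an assumption, not the paper's formula).

    Distance between the spectrum of the full term-document matrix and the
    pooled spectra of the cluster matrices. It is zero (up to rounding) when
    no term occurs in more than one cluster.
    """
    full = np.linalg.svd(A, compute_uv=False)
    pooled = pooled_singular_values(A, labels)
    n = max(full.size, pooled.size)
    # Pad the shorter spectrum with zeros so the two can be compared.
    full = np.pad(full, (0, n - full.size))
    pooled = np.pad(pooled, (0, n - pooled.size))
    return float(np.linalg.norm(full - pooled))


def standardized_score(A, labels, n_random=200, seed=0):
    """Standard deviations between a clustering and random clusterings.

    Random clusterings are drawn by permuting the labels, which keeps the
    cluster sizes fixed (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = overlap_score(A, labels)
    random_scores = np.array([overlap_score(A, rng.permutation(labels))
                              for _ in range(n_random)])
    spread = random_scores.std()
    if spread == 0.0:
        return 0.0  # every random clustering scored the same
    return float((observed - random_scores.mean()) / spread)


# Toy matrix: rows are terms, columns are documents.
# Documents 0-1 use only terms 0-1; documents 2-3 use only terms 2-3.
A = np.array([[1, 2, 0, 0],
              [3, 1, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 3]], dtype=float)
good = np.array([0, 0, 1, 1])   # clusters with disjoint terms
mixed = np.array([0, 1, 0, 1])  # clusters that share every term

print(overlap_score(A, good))       # close to 0
print(overlap_score(A, mixed))      # clearly positive
print(standardized_score(A, good))  # negative: less overlap than random
```

In this sketch a lower raw score means less term overlap, so a more negative standardized score indicates a clustering that separates terms better than a random one with the same cluster sizes. The sign convention in the paper itself may differ.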


📜 SIMILAR VOLUMES


Water quality monitoring using cluster a
✍ A. Manuela Gonçalves; Teresa Alpuim 📂 Article 📅 2011 🏛 John Wiley and Sons 🌐 English ⚖ 744 KB

The development of statistical methodologies based on spatial and temporal hydrological data is a very important tool in the monitoring of surface water quality in a river basin. This paper uses cluster analysis and linear models to describe hydrological space–time series of quality variables and to

A unified quality of recovery (QoR) meas
✍ P. Chołda; A. Jajszczyk; K. Wajda 📂 Article 📅 2008 🏛 John Wiley and Sons 🌐 English ⚖ 469 KB 👁 1 views

Leaf Breakdown Rates: a Measure of Water
✍ Cláudia Pascoal; Fernanda Cássio; Pedro Gomes 📂 Article 📅 2001 🏛 John Wiley and Sons 🌐 English ⚖ 149 KB 👁 1 views

The breakdown rates of Alnus glutinosa leaves and the structure of macroinvertebrate communities were used to evaluate the impact of the village of Montalegre (Portugal) on the water quality of the Cávado river. Chemical and microbial analyses of stream water indicated a high organic load in the vic

User preference: A measure of query-term
✍ Nina Wacholder; Lu Liu 📂 Article 📅 2006 🏛 John Wiley and Sons 🌐 English ⚖ 658 KB

The goal of this research is to understand what characteristics, if any, lead users engaged in interactive information seeking to prefer certain sets of query terms. Underlying this work is the assumption that query terms that information seekers prefer induce a kind of cognitive effici