The development of statistical methodologies based on spatial and temporal hydrological data is a very important tool in the monitoring of surface water quality in a river basin. This paper uses cluster analysis and linear models to describe hydrological space-time series of quality variables and to
A linear algebra measure of cluster quality
Scribed by Mather, Laura A.
- Publisher
- John Wiley and Sons
- Year
- 2000
- Language
- English
- File size
- 114 KB
- Volume
- 51
- Category
- Article
- ISSN
- 0002-8231
Synopsis
One of the most common models in information retrieval (IR), the vector space model, represents a document set as a term-document matrix where each row corresponds to a term and each column corresponds to a document. Because of the use of matrices in IR, it is possible to apply linear algebra to this IR model. This paper describes an application of linear algebra to text clustering, namely, a metric for measuring cluster quality. The metric is based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. The metric compares the singular values of the term-document matrix to the singular values of the matrices for each of the clusters to determine the amount of overlap of the terms across clusters. Because the metric can be difficult to interpret, a standardization of the metric is defined, which specifies the number of standard deviations a clustering of a document set is from an average, random clustering of that document set. Empirical evidence shows that the standardized cluster metric correlates with clustered retrieval performance when comparing clustering algorithms or multiple parameters for the same clustering algorithm.
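The idea in the synopsis can be illustrated with a small sketch. The synopsis does not give the paper's formula, so the code below uses a stand-in that is an assumption, not the author's metric: it compares the sum of singular values (the nuclear norm) of the full term-document matrix with the summed nuclear norms of the per-cluster column submatrices. When clusters share no terms, the matrix is block diagonal up to reordering and the two sums agree; term overlap makes the per-cluster sum larger. The standardization is sketched as a z-score against random relabelings of the same documents. The names `overlap_gap` and `standardized_score`, the sign convention (higher is better), and the toy matrix are all hypothetical.

```python
import numpy as np


def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()


def overlap_gap(A, labels):
    """Stand-in overlap measure (an assumption, not the paper's formula).

    A is a term-document matrix (rows = terms, columns = documents) and
    labels[j] is the cluster of document j. The result is >= 0 and is 0
    when no term is shared across clusters.
    """
    labels = np.asarray(labels)
    per_cluster = sum(nuclear_norm(A[:, labels == k]) for k in np.unique(labels))
    return per_cluster - nuclear_norm(A)


def standardized_score(A, labels, n_random=200, seed=0):
    """Standard deviations by which this clustering beats random relabelings.

    Random clusterings are drawn by permuting the labels, which keeps the
    cluster sizes fixed (a choice made here for illustration only).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = overlap_gap(A, labels)
    random_gaps = np.array(
        [overlap_gap(A, rng.permutation(labels)) for _ in range(n_random)]
    )
    # Lower gap means less term overlap, so the sign is flipped: higher is better.
    return (random_gaps.mean() - observed) / random_gaps.std()


# Toy matrix: documents 0-1 use only terms 0-1, documents 2-3 use only terms 2-3.
A = np.array(
    [
        [1.0, 2.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 2.0, 1.0],
    ]
)
good = [0, 0, 1, 1]  # clusters with disjoint terms
bad = [0, 1, 0, 1]   # clusters that mix the two term groups

gap_good = overlap_gap(A, good)
gap_bad = overlap_gap(A, bad)
z_good = standardized_score(A, good)
```

In this toy case the term-disjoint clustering has a gap of essentially zero, the mixed clustering has a strictly positive gap, and the standardized score of the disjoint clustering is positive.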
SIMILAR VOLUMES
The breakdown rates of Alnus glutinosa leaves and the structure of macroinvertebrate communities were used to evaluate the impact of the village of Montalegre (Portugal) on the water quality of the Cávado river. Chemical and microbial analyses of stream water indicated a high organic load in the vic
The goal of this research is to understand what characteristics, if any, lead users engaged in interactive information seeking to prefer certain sets of query terms. Underlying this work is the assumption that query terms that information seekers prefer induce a kind of cognitive effici