Clustering and Information Retrieval
By Weili Wu, Hui Xiong, Shashi Shekhar (auth.)
- Publisher
- Springer US
- Year
- 2004
- Language
- English
- Pages
- 331
- Series
- Network Theory and Applications 11
- Edition
- 1
- Category
- Library
Synopsis
Clustering is an important technique for discovering relatively dense sub-regions or sub-spaces of a multi-dimensional data distribution. Clustering has been used in information retrieval for many different purposes, such as query expansion, document grouping, document indexing, and visualization of search results. In this book, we address issues of clustering algorithms, evaluation methodologies, applications, and architectures for information retrieval.

The first two chapters discuss clustering algorithms. The chapter by Baeza-Yates et al. describes a clustering method for a general metric space, which is a common model of data relevant to information retrieval. The chapter by Guha, Rastogi, and Shim presents a survey as well as a detailed discussion of two clustering algorithms: CURE for numeric data and ROCK for categorical data.

Evaluation methodologies are addressed in the next two chapters. Ertoz et al. demonstrate the use of text retrieval benchmarks, such as TREC, to evaluate clustering algorithms. He et al. provide objective measures of clustering quality in their chapter.

Applications of clustering methods to information retrieval are addressed in the next four chapters. Chu et al. and Noel et al. explore feature selection using word stems, phrases, and link associations for document clustering and indexing. Wen et al. and Sung et al. discuss applications of clustering to user queries and data cleansing.

Finally, we consider the problem of designing architectures for information retrieval. Crichton, Hughes, and Kelly elaborate on the development of a scientific data system architecture for information retrieval.
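As a minimal illustration of the document-grouping use mentioned above, the sketch below clusters toy documents by taking connected components of a cosine-similarity graph built over raw term-frequency vectors. The similarity threshold, the single-link grouping rule, and the toy documents are illustrative assumptions, not a method taken from any chapter of this book.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(docs: list, threshold: float = 0.3) -> list:
    """Group documents into connected components of the similarity graph.

    Two documents end up in the same cluster whenever a chain of pairwise
    similarities above `threshold` connects them (single-link behavior).
    """
    vecs = [Counter(d.lower().split()) for d in docs]
    n = len(docs)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Example: two topical groups emerge from four short documents.
docs = ["apple banana fruit", "banana fruit smoothie",
        "stock market crash", "market crash news"]
print(sorted(sorted(g) for g in cluster(docs)))  # [[0, 1], [2, 3]]
```

Real systems would replace raw term frequencies with TF-IDF weighting and use a scalable algorithm such as those surveyed in the chapters that follow; this sketch only shows the shape of the task.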
Table of Contents
Front Matter....Pages i-viii
Clustering in Metric Spaces with Applications to Information Retrieval....Pages 1-33
Techniques for Clustering Massive Data Sets....Pages 35-82
Finding Topics in Collections of Documents: A Shared Nearest Neighbor Approach....Pages 83-103
On Quantitative Evaluation of Clustering Systems....Pages 105-133
Techniques for Textual Document Indexing and Retrieval via Knowledge Sources and Data Mining....Pages 135-159
Document Clustering, Visualization, and Retrieval via Link Mining....Pages 161-193
Query Clustering in the Web Context....Pages 195-225
Clustering Techniques for Large Database Cleansing....Pages 227-259
A Science Data System Architecture for Information Retrieval....Pages 261-298
Granular Computing for the Design of Information Retrieval Support Systems....Pages 299-329
Subjects
Data Structures, Cryptology and Information Theory; Information Storage and Retrieval; Information Systems Applications (incl. Internet); Artificial Intelligence (incl. Robotics)
Similar Volumes
Information Retrieval (IR) has concentrated on the development of information management systems to support user retrieval from large collections of homogeneous textual material. A variety of approaches have been tried and tested with varying degrees of success over many decades of research.
I: CLUSTERING & CLASSIFICATION:
- Cluster-preserving dimension reduction methods for efficient classification of text data
- Automatic discovery of similar words
- Simultaneous clustering and dynamic keyword weighting for text documents
- Feature selection and document clustering
II: INFORMATION E
As the volume of digitized textual information continues to grow, so does the critical need for designing robust and scalable indexing and search strategies/software to meet a variety of user needs. Knowledge extraction or creation from text requires systematic, yet reliable processing.
Extracting content from text continues to be an important research problem for information processing and management. Approaches to capture the semantics of text-based document collections may be based on Bayesian models, probability theory, vector space models, statistical models, or even graph theory.