✦ LIBER ✦

Efficient streaming text clustering

✍ Scribed by Shi Zhong

Publisher: Elsevier Science
Year: 2005
Tongue: English
Weight: 574 KB
Volume: 18
Category: Article
ISSN: 0893-6080
DOI: 10.1016/j.neunet.2005.06.008

✦ Synopsis

Clustering data streams is a relatively new research topic that has emerged from many real-world data mining applications and has attracted considerable research attention. However, there is little work on clustering high-dimensional streaming text data. This paper combines an efficient online spherical k-means (OSKM) algorithm with an existing scalable clustering strategy to achieve fast and adaptive clustering of text streams. The OSKM algorithm modifies the spherical k-means (SPKM) algorithm by updating cluster centroids online, based on the well-known Winner-Take-All competitive learning rule. It has been shown to be as efficient as SPKM but to produce much better clustering quality. The scalable clustering strategy was previously developed for very large databases that cannot fit into limited memory and are too expensive to read or scan multiple times. Under this strategy, one keeps only sufficient statistics for historical data, which retains (part of) their contribution while fitting within limited memory. To make the proposed clustering algorithm adaptive to data streams, we introduce a forgetting factor that applies exponential decay to the importance of historical data: the older a set of text documents, the less weight it carries. Our experimental results demonstrate the efficiency of the proposed algorithm and reveal an intuitive and interesting fact about clustering text streams: one needs to forget to be adaptive.
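
To make the two mechanisms in the synopsis concrete, here is a minimal sketch, assuming unit-normalized document vectors (e.g. L2-normalized tf-idf rows) arriving in batches. It is a hypothetical illustration, not the paper's implementation: the function names `cluster_stream` and `oskm_batch`, the fixed learning rate `eta`, the random seeding of the first centroids, the choice to restart each batch's centroids from decayed per-cluster sum vectors, and the default parameter values are all assumptions. It shows the Winner-Take-All online update (only the most cosine-similar centroid moves toward each document and is re-normalized) and a forgetting factor `gamma` under which a batch seen t batches ago carries weight gamma**t in the kept sufficient statistics.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def oskm_batch(X, centroids, eta):
    """Hypothetical helper: one online Winner-Take-All pass over unit-norm rows of X."""
    labels = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        w = int(np.argmax(centroids @ x))            # winner: highest cosine similarity
        centroids[w] = unit(centroids[w] + eta * x)   # move only the winner toward x
        labels[i] = w
    return labels

def cluster_stream(batches, k, eta=0.2, gamma=0.5, seed=0):
    """Hypothetical sketch (not the paper's code): stream clustering with a
    forgetting factor gamma in (0, 1] applied to per-cluster sum vectors."""
    rng = np.random.default_rng(seed)
    stats = None          # decayed per-cluster sums of assigned documents
    centroids = None
    all_labels = []
    for X in batches:
        X = np.array([unit(np.asarray(x, dtype=float)) for x in X])
        if centroids is None:
            # Assumed initialization: k distinct documents from the first batch.
            centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
            stats = np.zeros_like(centroids)
        stats *= gamma    # a batch seen t batches ago now weighs gamma**t
        for j in range(k):
            if stats[j].any():
                centroids[j] = unit(stats[j])   # restart from decayed history
        labels = oskm_batch(X, centroids, eta)
        for x, j in zip(X, labels):
            stats[j] += x  # keep only sufficient statistics, not the documents
        all_labels.append(labels)
    return centroids, all_labels
```

With `gamma=1.0` every past batch keeps full weight; smaller values make the centroids track recent documents more closely, which is the "forget to be adaptive" trade-off the synopsis describes.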

📜 SIMILAR VOLUMES

Clustering Text Data Streams

✍ Yu-Bao Liu; Jia-Rong Cai; Jian Yin; Ada Wai-Chee Fu 📂 Article 📅 2008 🏛 Springer 🌐 English ⚖ 717 KB

On clustering massive text and categorical data streams

✍ Charu C. Aggarwal; Philip S. Yu 📂 Article 📅 2009 🏛 Springer-Verlag 🌐 English ⚖ 785 KB

Text clustering using frequent itemsets

✍ Wen Zhang; Taketoshi Yoshida; Xijin Tang; Qing Wang 📂 Article 📅 2010 🏛 Elsevier Science 🌐 English ⚖ 767 KB

Frequent itemsets originate from association rule mining. Recently, they have been applied to text mining tasks such as document categorization and clustering. In this paper, we conduct a study on text clustering using frequent itemsets. The main contribution of this paper is threefold. First, we pre

Dynamic data assigning assessment clustering of streaming data

✍ O. Georgieva; F. Klawonn 📂 Article 📅 2008 🏛 Elsevier Science 🌐 English ⚖ 728 KB

Inter-Object Layer Clustering for scalable video streaming

✍ Hyunjoo Kim; Heon Y. Yeom; Sooyong Kang; Youjip Won 📂 Article 📅 2009 🏛 Springer US 🌐 English ⚖ 843 KB

Factor matrix text filtering and clustering

✍ Ronald N. Kostoff; Joel A. Block 📂 Article 📅 2005 🏛 John Wiley and Sons 🌐 English ⚖ 228 KB

The presence of trivial words in text databases can adversely affect record or concept (words/phrases) clustering. Additionally, whether a word/phrase is trivial is context‐dependent. Our objective in the present article is to demonstrate a context‐dependent trivial