✦ LIBER ✦

Text clustering using frequent itemsets

✍ Scribed by Wen Zhang; Taketoshi Yoshida; Xijin Tang; Qing Wang

Publisher: Elsevier Science
Year: 2010
Tongue: English
Weight: 767 KB
Volume: 23
Category: Article
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2010.01.011

✦ Synopsis

Frequent itemsets originate from association rule mining. Recently, they have been applied to text mining tasks such as document categorization and clustering. In this paper, we study text clustering using frequent itemsets. The contribution of this paper is threefold. First, we review existing methods of document clustering using frequent patterns. Second, we propose a new method for document clustering called Maximum Capturing. Maximum Capturing comprises two procedures: constructing document clusters and assigning cluster topics. We develop three versions of Maximum Capturing based on three similarity measures. We also propose a normalization process based on frequency sensitive competitive learning that merges cluster candidates into a predefined number of clusters. Third, we carry out experiments to evaluate the proposed method against the CFWS, CMS, FTC and FIHC methods. The experimental results show that Maximum Capturing clusters better than these methods. In particular, Maximum Capturing with documents represented by individual words and asymmetrical binary similarity as the similarity measure achieves the best performance. Moreover, the topics produced by Maximum Capturing distinguish clusters from each other and can be used as labels for document clusters.
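
The synopsis does not define its similarity measures, so the following is only a minimal sketch of the general idea, not the authors' Maximum Capturing procedure. It assumes that "asymmetrical binary similarity" can be read as the Jaccard coefficient over the frequent words two documents share, and that a word is frequent when it occurs in at least a given fraction of documents. The function names (`frequent_words`, `jaccard`, `pairwise_similarity`) and the `min_support` parameter are hypothetical, chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_words(docs, min_support):
    """Return words occurring in at least `min_support` (a fraction) of the documents."""
    # Count each word once per document (document frequency, not term frequency).
    counts = Counter(word for doc in docs for word in set(doc))
    threshold = min_support * len(docs)
    return {word for word, count in counts.items() if count >= threshold}


def jaccard(a, b):
    """Jaccard coefficient: shared items over all items present in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(docs, min_support=0.5):
    """Represent each document by its frequent words, then compare every pair."""
    # Assumption: single frequent words stand in for frequent itemsets here.
    frequent = frequent_words(docs, min_support)
    reps = [set(doc) & frequent for doc in docs]
    return {
        (i, j): jaccard(reps[i], reps[j])
        for i, j in combinations(range(len(reps)), 2)
    }


# Toy corpus, invented for illustration only.
docs = [
    ["frequent", "itemset", "mining"],
    ["frequent", "itemset", "clustering"],
    ["text", "clustering", "topic"],
]
sims = pairwise_similarity(docs, min_support=0.5)
```

In a sketch like this, the pair of documents with the highest similarity would be a natural first candidate for a cluster, and the frequent words they share could serve as a candidate topic label.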

📜 SIMILAR VOLUMES

Text Classification Using Sentential Frequent Itemsets

✍ Shi-Zhu Liu; He-Ping Hu 📂 Article 📅 2007 🏛 Springer 🌐 English ⚖ 302 KB

FICW: Frequent itemset based text clustering with window constraint

✍ Zhou Chong; Lu Yansheng; Zou Lei; Hu Rong 📂 Article 📅 2006 🏛 Wuhan University 🌐 English ⚖ 594 KB

Evaluating Cluster Preservation in Frequent Itemset Integration for Distributed Databases

✍ Sumeet Dua; Michael P. Dessauer; Prerna Sethi 📂 Article 📅 2010 🏛 Springer US 🌐 English ⚖ 664 KB

Using the Structure of Prelarge Trees to Incrementally Mine Frequent Itemsets

✍ Chun-Wei Lin; Tzung-Pei Hong; Wen-Hsiang Lu 📂 Article 📅 2010 🏛 Springer 🌐 English ⚖ 654 KB

Efficient mining of maximal frequent itemsets from databases on a cluster of workstations

✍ Soon M. Chung; Congnan Luo 📂 Article 📅 2007 🏛 Springer-Verlag 🌐 English ⚖ 799 KB

A new concise representation of frequent itemsets using generators and a positive border

✍ Guimei Liu; Jinyan Li; Limsoon Wong 📂 Article 📅 2007 🏛 Springer-Verlag 🌐 English ⚖ 466 KB