Data mining for text categorization with semi-supervised agglomerative hierarchical clustering

✍ Scribed by Antonio Gómez Skarmeta; Amine Bensaid; Nadia Tazi


Publisher
John Wiley and Sons
Year
2000
Tongue
English
Weight
119 KB
Volume
15
Category
Article
ISSN
0884-8173

✦ Synopsis


In this paper we study the application of a semi-supervised agglomerative hierarchical clustering (ssAHC) algorithm to text categorization, which consists of assigning text documents to predefined categories. ssAHC is (i) a clustering algorithm that (ii) uses a finite design set of labeled data to (iii) help agglomerative hierarchical clustering (AHC) algorithms partition a finite set of unlabeled data and then (iv) terminates without the capability to label other objects. We first describe the text representation method we use in this work; we then present a feature selection method that is used to reduce the dimensionality of the feature space. Finally, we apply the ssAHC algorithm to the Reuters database of documents and show that its performance is superior to that of the Bayes classifier and of the Expectation-Maximization algorithm combined with the Bayes classifier. We also show that ssAHC helps AHC techniques improve their performance.
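
The general idea of letting a small labeled design set guide agglomerative clustering of unlabeled data can be illustrated with a short sketch. This is not the paper's ssAHC algorithm, whose details the synopsis does not give. It assumes single linkage, Euclidean distance, and a simple rule that two clusters carrying different known labels are never merged. The names `dist` and `constrained_single_link` and the toy data are hypothetical.

```python
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def constrained_single_link(points, labels, n_clusters):
    """Single-link agglomerative clustering guided by a few known labels.

    Assumption (not taken from the paper): clusters that carry different
    known labels are never merged. `labels` is parallel to `points`, with
    None marking unlabeled items. Returns one label per point, or None for
    points whose final cluster contains no labeled member.
    """
    # Each cluster is a pair: (set of member indices, known label or None).
    clusters = [({i}, labels[i]) for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for (i, (mi, li)), (j, (mj, lj)) in combinations(enumerate(clusters), 2):
            if li is not None and lj is not None and li != lj:
                continue  # the labeled design set forbids this merge
            # Single linkage: distance between the closest pair of members.
            d = min(dist(points[a], points[b]) for a in mi for b in mj)
            if best is None or d < best[0]:
                best = (d, i, j)
        if best is None:
            break  # every remaining pair is forbidden
        _, i, j = best
        (mi, li), (mj, lj) = clusters[i], clusters[j]
        merged = (mi | mj, li if li is not None else lj)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    # Propagate each cluster's label to all of its members.
    result = [None] * len(points)
    for members, label in clusters:
        for m in members:
            result[m] = label
    return result


# Toy data: two labeled documents (as 2-D feature vectors) and four unlabeled.
points = [(0, 0), (0.1, 0), (0.2, 0.1), (5, 5), (5.1, 5), (4.9, 5.2)]
labels = ["grain", None, None, "trade", None, None]
print(constrained_single_link(points, labels, n_clusters=2))
# prints ['grain', 'grain', 'grain', 'trade', 'trade', 'trade']
```

As in the synopsis, the procedure only partitions the data it is given and then stops; it yields no classifier that could label new documents.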


📜 SIMILAR VOLUMES


Integrating WordNet knowledge to supplem
✍ Mohammed Benkhalifa; Abdelhak Mouradi; Houssaine Bouyakhf 📂 Article 📅 2001 🏛 John Wiley and Sons 🌐 English ⚖ 168 KB

Text categorization (TC) is the automated assignment of text documents to predefined categories based on document contents. TC has been an application for many learning approaches, which have proved effective. Nevertheless, TC poses many challenges for machine learning. In this paper, we suggest, f