✦ LIBER ✦

Ranking and selecting terms for text categorization via SVM discriminate boundary

✍ Scribed by Tien-Fang Kuo; Yasutoshi Yajima

Book ID: 102280678
Publisher: John Wiley and Sons
Year: 2009
Tongue: English
Weight: 298 KB
Volume: 25
Category: Article
ISSN: 0884-8173
DOI: 10.1002/int.20392


✦ Synopsis

The problem of natural language document categorization consists of classifying documents into predetermined categories based on their contents. Each distinct term, or word, in the documents is a feature for representing a document. In general, the number of terms may be extremely large and many redundant terms may be included, which may reduce classification performance. In this paper, a support vector machine (SVM)-based feature ranking and selection method for text categorization is proposed. The contribution of each term to classification is calculated based on the nonlinear discriminant boundary generated by the SVM. The results of experiments on several real-world data sets show that the proposed method extracts a smaller number of important terms and achieves higher classification performance than existing feature selection methods based on latent semantic indexing and χ² statistic values.
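For orientation, the χ² baseline named in the synopsis can be sketched as follows. This is a minimal illustration of the standard χ² term-scoring formula for a single binary category, not the SVM-based method the paper proposes. The function names `chi_square` and `rank_terms` and the toy documents are assumptions made for this example.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a term/category 2x2 contingency table.

    a: in-category documents containing the term
    b: out-of-category documents containing the term
    c: in-category documents lacking the term
    d: out-of-category documents lacking the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. term in every document): no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_terms(docs, labels):
    """Rank terms by descending chi-square score (ties broken alphabetically).

    docs: list of sets of terms (hypothetical bag-of-words representation)
    labels: list of bools, True if the document belongs to the category
    """
    vocab = set().union(*docs)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    scores = {}
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if y and term in doc)
        b = sum(1 for doc, y in zip(docs, labels) if not y and term in doc)
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    return sorted(scores, key=lambda t: (-scores[t], t))


# Toy data, invented for illustration only.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "team"}, {"vote", "law"}]
labels = [True, True, False, False]
ranked = rank_terms(docs, labels)
```

Feature selection under this baseline keeps only the top-ranked terms. A term that appears equally often inside and outside the category, such as `team` in the toy data, scores zero and is ranked last.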

📜 SIMILAR VOLUMES

Term-frequency Based Feature Selection Methods for Text Categorization (2010 Fourth International Conference on Genetic and Evolutionary Computing, ICGEC 2010, Shenzhen, 2010.12.13-2010.12.15)

✍ Yan Xu; Lin Chen 📂 Article 📅 2010 🏛 IEEE ⚖ 142 KB