
Ranking and selecting terms for text categorization via SVM discriminate boundary

Scribed by Tien-Fang Kuo; Yasutoshi Yajima


Book ID
102280678
Publisher
John Wiley and Sons
Year
2009
Tongue
English
Weight
298 KB
Volume
25
Category
Article
ISSN
0884-8173


Synopsis


The problem of natural language document categorization consists of classifying documents into predetermined categories based on their contents. Each distinct term, or word, in the documents is a feature for representing a document. In general, the number of terms may be extremely large, and dozens of redundant terms may be included, which may reduce classification performance. In this paper, a support vector machine (SVM)-based feature ranking and selection method for text categorization is proposed. The contribution of each term to classification is calculated from the nonlinear discriminant boundary generated by the SVM. Experiments on several real-world data sets show that the proposed method extracts a smaller number of important terms and achieves higher classification performance than existing feature selection methods based on latent semantic indexing and χ² statistics.
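For orientation, the χ² baseline mentioned in the synopsis can be sketched as follows. This is not the SVM-based method the paper proposes, whose details are not given here; it is only a minimal illustration of term ranking using the standard 2×2 term/category χ² statistic. The function names `chi2_term_scores` and `top_terms` and the toy corpus are assumptions made for this example.

```python
from collections import Counter


def chi2_term_scores(docs, labels, category):
    """Score each term by the chi-squared statistic of its 2x2 term/category table.

    docs: list of token lists; labels: one category label per document.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == category)
    n_neg = n - n_pos
    in_cat = Counter()   # documents in the category that contain the term
    out_cat = Counter()  # documents outside the category that contain the term
    for tokens, y in zip(docs, labels):
        target = in_cat if y == category else out_cat
        for term in set(tokens):  # count document presence, not term frequency
            target[term] += 1
    scores = {}
    for term in set(in_cat) | set(out_cat):
        a = in_cat[term]   # in category, term present
        b = out_cat[term]  # outside category, term present
        c = n_pos - a      # in category, term absent
        d = n_neg - b      # outside category, term absent
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def top_terms(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["svm", "kernel", "margin"],
    ["svm", "margin", "the"],
    ["goal", "match", "the"],
    ["goal", "team", "the"],
]
labels = ["ml", "ml", "sport", "sport"]

scores = chi2_term_scores(docs, labels, "ml")
selected = top_terms(scores, 3)
```

Note that χ² is symmetric: a term that appears only outside the category (such as "goal" in the toy corpus) scores as highly as one that appears only inside it, because both separate the two groups of documents equally well.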