Ranking and selecting terms for text categorization via SVM discriminate boundary
Scribed by Tien-Fang Kuo; Yasutoshi Yajima
- Book ID
- 102280678
- Publisher
- John Wiley and Sons
- Year
- 2009
- Language
- English
- File size
- 298 KB
- Volume
- 25
- Category
- Article
- ISSN
- 0884-8173
Synopsis
Natural language document categorization is the problem of classifying documents into predetermined categories based on their contents. Each distinct term, or word, in the documents serves as a feature for representing a document. In general, the number of terms may be extremely large, and dozens of redundant terms may be included, which can reduce classification performance. This paper proposes a support vector machine (SVM)-based feature ranking and selection method for text categorization. The contribution of each term to classification is calculated from the nonlinear discriminant boundary generated by the SVM. Experiments on several real-world data sets show that the proposed method extracts a smaller number of important terms and achieves higher classification performance than existing feature selection methods based on latent semantic indexing and χ² statistics.
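For orientation, the χ² baseline named in the synopsis can be sketched in a few lines. This is not the SVM-based method the paper proposes, whose details are not given here; it only illustrates the standard per-term χ² statistic computed from a 2×2 term/category contingency table. The function names `chi2_term_scores` and `top_terms`, the token-list input format, and the toy corpus are all hypothetical choices made for this sketch.

```python
from collections import Counter


def chi2_term_scores(docs, labels, target):
    """Score each term by its chi-squared statistic against one category.

    docs   : list of token lists (hypothetical toy input format)
    labels : list of category names, one per document
    target : the category to score terms against
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos

    # Document frequency of each term inside and outside the target category.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        counter = df_pos if y == target else df_neg
        counter.update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # in category, contains term
        b = df_neg[term]   # outside category, contains term
        c = n_pos - a      # in category, lacks term
        d = n_neg - b      # outside category, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A term present in every document (or none) carries no signal.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def top_terms(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented purely for illustration.
docs = [
    ["stock", "market", "the"],
    ["market", "trade", "the"],
    ["match", "goal", "the"],
    ["goal", "team", "the"],
]
labels = ["finance", "finance", "sport", "sport"]
scores = chi2_term_scores(docs, labels, "finance")
selected = top_terms(scores, 2)
```

On this toy corpus, a term such as "the" that occurs in every document scores zero, while terms confined to one category score highest. Note that χ² rewards association in either direction, so a term that reliably signals the other category ranks as high as one that signals the target.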