𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Feature selection and effective classifiers

✍ Scribed by Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri


Publisher
John Wiley and Sons
Year
1998
Tongue
English
Weight
133 KB
Volume
49
Category
Article
ISSN
0002-8231


✦ Synopsis


In this article, we develop and analyze four algorithms for feature selection in the context of rough set methodology. The initial state and the feasibility criterion of all these algorithms are the same. That is, they start with a given feature set and progressively remove features, while controlling the amount of degradation in classification quality. These algorithms, however, differ in the heuristics used for pruning the search space of features. Our experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. Our experiments demonstrate that a u-reduct of a given feature set can be found efficiently. Although we have adopted upper classifiers in our investigations, the algorithms presented can, however, be used with any method of deriving a classifier, where the quality of classification is a monotonically decreasing function of the size of the feature set. We compare the performance of upper classifiers with those of lower classifiers. We find that upper classifiers perform better than lower classifiers for a duodenal ulcer data set. This should be generally true when there is a small number of elements in the boundary region. An upper classifier has some important features that make it suitable for data mining applications. In particular, we have shown that the upper classifiers can be summarized at a desired level of abstraction by using extended decision tables. We also point out that an upper classifier results in an inconsistent decision algorithm, which can be interpreted deterministically or non-deterministically to obtain a consistent decision algorithm.

1. Introduction

A data-mining process involves extracting valid, previously unknown, potentially useful, and comprehensible patterns from large databases. As described in Fayyad (1996) and Simoudis (1996), this process is typically made up of selection and sampling, preprocessing and cleaning, transformation and reduction, data mining, and evaluation steps. The first step in the data-mining process is to select a target data set from a database (or a data warehouse) and to possibly sample the target data. The preprocessing and data cleaning step handles noise and unknown values, as well as accounting for missing data fields, time sequence information, and so forth. The data reduction and transformation step involves finding relevant features depending on the goal of the task and certain transformations on the data such as converting one type of data to another (e.g., changing nominal values into numeric ones, discretizing continuous values), and/or defining new attributes. In the mining step, the user may apply one or more knowledge discovery techniques on the transformed data to extract valuable patterns. Finally, the evaluation step involves interpreting the result (or discovered pattern) with respect to the goal/task at hand.

Note that the data-mining process is not linear and involves a variety of feedback loops, because any one step can result in changes in preceding or succeeding steps. Furthermore, the nature of a large, real-world data set, which may contain noisy, incomplete, dynamic, redundant, sparse, and missing values, certainly requires that existing techniques and approaches be extended to cope with such problems (Deogun, Raghavan, Sarkar, & Sever, 1997; Matheus, Chan, & Piatetsky-Shapiro, 1993). This article uses the rough set model to address issues related to some aspects of real-world data and investigates the interactions between feature selection algorithms and
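The four feature-selection algorithms share a common skeleton: start from the full feature set and greedily drop features as long as classification quality does not degrade beyond a tolerance. A minimal sketch of that skeleton follows; the quality function and tolerance here are illustrative placeholders, and the article's specific pruning heuristics are not reproduced:

```python
def backward_eliminate(features, quality, tolerance=0.0):
    """Greedy backward elimination over a feature set.

    quality: maps a frozenset of features to a score in [0, 1];
    assumed (as in the article) to be monotonically non-increasing
    as features are removed. tolerance bounds the allowed drop
    from the full-set quality.
    """
    baseline = quality(frozenset(features))
    current = set(features)
    removed_one = True
    while removed_one:
        removed_one = False
        for f in sorted(current):  # deterministic scan order
            if baseline - quality(frozenset(current - {f})) <= tolerance:
                current.discard(f)  # removal stays within tolerance
                removed_one = True
                break
    return current

# Toy quality function (hypothetical): only "a" and "c" matter.
def toy_quality(s):
    return (("a" in s) + ("c" in s)) / 2

selected = backward_eliminate({"a", "b", "c", "d"}, toy_quality)
# selected == {"a", "c"}
```

The differing heuristics the article studies would replace the exhaustive `sorted(current)` scan with cheaper orderings of candidate features, trading classification accuracy for running time.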

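The upper and lower classifiers mentioned above rest on the rough-set notions of lower and upper approximation: given an indiscernibility (equivalence) relation induced by a feature set, a target concept is approximated from below by the equivalence classes wholly contained in it, and from above by those that merely intersect it; their difference is the boundary region. A small illustrative sketch (the data and names are invented, not taken from the article):

```python
from collections import defaultdict

def approximations(objects, features, target):
    """Rough-set lower/upper approximations of `target`.

    objects: dict of object id -> dict of feature values
    features: feature names defining indiscernibility
    target: set of object ids (the concept to approximate)
    """
    # Group objects into equivalence classes by feature signature.
    classes = defaultdict(set)
    for oid, vals in objects.items():
        classes[tuple(vals[f] for f in features)].add(oid)

    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= target:   # wholly inside the concept
            lower |= cls
        if cls & target:    # intersects the concept
            upper |= cls
    return lower, upper, upper - lower  # boundary region last

# Toy data: objects 2 and 3 are indiscernible but disagree on
# membership in the concept {1, 2}, so they form the boundary.
objs = {
    1: {"a": 0, "b": 0},
    2: {"a": 0, "b": 1},
    3: {"a": 0, "b": 1},
    4: {"a": 1, "b": 0},
}
lo, up, bd = approximations(objs, ["a", "b"], {1, 2})
# lo == {1}, up == {1, 2, 3}, bd == {2, 3}
```

A lower classifier decides membership only from the lower approximation, while an upper classifier also covers the boundary region; when the boundary is small, the two nearly coincide, which is consistent with the article's observation about the duodenal ulcer data set.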

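The discretization of continuous values named in the transformation step can be as simple as equal-width binning; the following toy sketch is one common scheme, not a method from the article:

```python
def equal_width_bins(values, k):
    """Discretize continuous values into k equal-width intervals,
    returning a bin index in 0..k-1 for each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0   # guard against a constant column
    # Clamp so the maximum value falls in the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

bins = equal_width_bins([0.0, 0.2, 0.5, 0.9, 1.0], 3)
# bins == [0, 0, 1, 2, 2]
```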
πŸ“œ SIMILAR VOLUMES


Comparison of algorithms that select features for pattern classifiers
✍ Mineichi Kudo; Jack Sklansky πŸ“‚ Article πŸ“… 2000 πŸ› Elsevier Science 🌐 English βš– 506 KB

A comparative study of algorithms for large-scale feature selection (where the number of features is over 50) is carried out. In the study, the goodness of a feature subset is measured by leave-one-out correct-classification rate of a nearest-neighbor (1-NN) classifier and many practical problems are u