A numeric comparison of variable selection algorithms for supervised learning
- Authors: G. Palombo; I. Narsky
- Publisher: Elsevier Science
- Year: 2009
- Language: English
- File size: 411 KB
- Volume: 612
- Category: Article
- ISSN: 0168-9002
Synopsis
Datasets in modern High Energy Physics (HEP) experiments are often described by dozens or even hundreds of input variables. Reducing a full variable set to a subset that most completely represents the information in the data is therefore an important task in the analysis of HEP data. We compare various variable selection algorithms for supervised learning using several datasets, for instance imaging gamma-ray Cherenkov telescope (MAGIC) data from the UCI repository. We use classifiers and variable selection methods implemented in the statistical package StatPatternRecognition (SPR), a free open-source C++ package developed in the HEP community (http://sourceforge.net/projects/statpatrec/). For each dataset, we select a powerful classifier and estimate its learning accuracy on variable subsets obtained by various selection algorithms. When possible, we also estimate the CPU time needed for the variable subset selection. The results of this analysis are compared with those published previously for these datasets using other statistical packages such as R and Weka. We show that the most accurate, yet slowest, method is a wrapper algorithm known as generalized sequential forward selection ("Add N Remove R") implemented in SPR.
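The "Add N Remove R" wrapper named in the synopsis alternates a greedy forward phase (add the N variables that most improve a figure of merit, one at a time) with a greedy backward phase (drop the R variables whose removal costs least), stopping when a full pass no longer improves the best score. SPR implements this in C++; the sketch below is a hypothetical Python rendition under stated assumptions — the function names `add_n_remove_r` and `toy_score` are my own, not SPR's API, and the scoring callable stands in for whatever cross-validated classifier accuracy a real wrapper would use.

```python
def add_n_remove_r(variables, score, n=2, r=1):
    """Greedy 'Add N Remove R' variable selection (illustrative sketch).

    variables: list of candidate variable names
    score: callable mapping a frozenset of selected variables to a
           figure of merit (higher is better), e.g. CV accuracy
    Returns the best-scoring subset found and its score.
    """
    selected = set()
    best_subset, best_score = set(), score(frozenset())
    while True:
        # Forward phase: greedily add the variable with the largest gain, N times.
        for _ in range(n):
            candidates = [v for v in variables if v not in selected]
            if not candidates:
                break
            selected.add(max(candidates,
                             key=lambda v: score(frozenset(selected | {v}))))
        # Backward phase: greedily drop the variable whose removal costs least, R times.
        for _ in range(r):
            if not selected:
                break
            selected.remove(max(selected,
                                key=lambda v: score(frozenset(selected - {v}))))
        current = score(frozenset(selected))
        if current > best_score:
            best_score, best_subset = current, set(selected)
        else:
            break  # a full add/remove pass gave no improvement: stop
    return best_subset, best_score


# Toy figure of merit: reward informative variables, penalize noise ones.
informative = {"x1", "x2"}

def toy_score(subset):
    return len(subset & informative) - 0.1 * len(subset - informative)
```

Because best_score must strictly increase for the loop to continue and there are finitely many subsets, the procedure terminates; with n > r it net-grows the subset each pass, which is what distinguishes it from plain forward selection (n = 1, r = 0).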
SIMILAR VOLUMES
First, the cerebellar model articulation controller (CMAC), invented in the early 1970s by Albus, and the associative memory system (AMS), developed for learning control systems by H. Tolle et al. in the early 1980s, are briefly described. The underlying mathematics of the AMS learning or training a
The range of Fourier methods can be significantly increased by extending a nonperiodic function f (x) to a periodic function f on a larger interval. When f (x) is analytically known on the extended interval, the extension is straightforward. When f (x) is unknown outside the physical interval, there
In this article we compare two contrasting methods, the active set method (ASM) and genetic algorithms, for learning the weights in aggregation operators, such as the weighted mean (WM), ordered weighted average (OWA), and weighted ordered weighted average (WOWA). We give the formal definitions for e