Ensemble feature selection with the simple Bayesian classification
Scribed by Alexey Tsymbal; Seppo Puuronen; David W. Patterson
- Publisher
- Elsevier Science
- Year
- 2003
- Tongue
- English
- Weight
- 376 KB
- Volume
- 4
- Category
- Article
- ISSN
- 1566-2535
Synopsis
A popular method for creating an accurate classifier from a set of training data is to build several classifiers, and then to combine their predictions. The ensembles of simple Bayesian classifiers have traditionally not been a focus of research. One way to generate an ensemble of accurate and diverse simple Bayesian classifiers is to use different feature subsets generated with the random subspace method. In this case, the ensemble consists of multiple classifiers constructed by randomly selecting feature subsets, that is, classifiers constructed in randomly chosen subspaces. In this paper, we present an algorithm for building ensembles of simple Bayesian classifiers in random subspaces. The EFS_SBC algorithm includes a hill-climbing-based refinement cycle, which tries to improve the accuracy and diversity of the base classifiers built on random feature subsets. We conduct a number of experiments on a collection of 21 real-world and synthetic data sets, comparing the EFS_SBC ensembles with the single simple Bayes, and with the boosted simple Bayes. In many cases the EFS_SBC ensembles have higher accuracy than the single simple Bayesian classifier, and than the boosted Bayesian ensemble. We find that the ensembles produced focusing on diversity have lower generalization error, and that the degree of importance of diversity in building the ensembles is different for different data sets. We propose several methods for the integration of simple Bayesian classifiers in the ensembles. In a number of cases the techniques for dynamic integration of classifiers have significantly better classification accuracy than their simple static analogues. We suggest that a reason for that is that the dynamic integration better utilizes the ensemble coverage than the static integration.
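The random subspace idea described in the synopsis can be illustrated with a minimal sketch. This is not the EFS_SBC algorithm itself: it omits the hill-climbing refinement cycle and the dynamic integration methods, whose details are not given here. It assumes categorical features, Laplace smoothing, and plain majority voting as the static integration rule. The names `SimpleBayes`, `random_subspace_ensemble`, and `vote`, and the toy data, are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter, defaultdict


class SimpleBayes:
    """Naive (simple) Bayes over categorical features, restricted to a feature subset."""

    def __init__(self, features, alpha=1.0):
        self.features = list(features)  # indices of the features this model may use
        self.alpha = alpha              # Laplace smoothing constant (an assumption)

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (feature, class) -> value -> count
        self.values = defaultdict(set)      # feature -> set of observed values
        for row, label in zip(X, y):
            for f in self.features:
                self.counts[(f, label)][row[f]] += 1
                self.values[f].add(row[f])
        return self

    def predict(self, row):
        best_class, best_score = None, float("-inf")
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / self.n)
            for f in self.features:
                num = self.counts[(f, c)][row[f]] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.values[f])
                score += math.log(num / den)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def random_subspace_ensemble(X, y, n_models=10, subset_size=2, seed=0):
    """Train each base classifier on a randomly drawn feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    return [
        SimpleBayes(rng.sample(range(n_features), subset_size)).fit(X, y)
        for _ in range(n_models)
    ]


def vote(ensemble, row):
    """Static integration by majority vote over the base classifiers."""
    votes = Counter(model.predict(row) for model in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): every feature equals the class label.
X_toy = [(0, 0, 0, 0), (1, 1, 1, 1)] * 5
y_toy = [0, 1] * 5
ensemble = random_subspace_ensemble(X_toy, y_toy, n_models=5, subset_size=2)
prediction = vote(ensemble, (1, 1, 1, 1))
```

Majority voting is only the simplest static combiner. The dynamic integration the synopsis refers to would instead weight or select base classifiers per test instance, which this sketch does not attempt.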
SIMILAR VOLUMES
We investigate using the frequency of simple features to provide image signatures for input to a classifier. In an approach inspired by the n-gram technique for text classification, a binary image is scanned with a small window, e.g. 3 × 3 matrix and the occurrences of all possible features patterns wi
## Abstract Within the last decades, the detailed knowledge on the impact of membrane bound drug efflux transporters of the ATP binding cassette (ABC) protein family on the pharmacological profile of drugs has enormously increased. Especially, ABCB1 (P-glycoprotein, P-gp, MDR1) has attracted partic