Evolutionary variable selection in regression and PLS analyses
Scribed by Hugo Kubinyi
- Publisher
- John Wiley and Sons
- Year
- 1996
- Language
- English
- Weight
- 929 KB
- Volume
- 10
- Category
- Article
- ISSN
- 0886-9383
Synopsis
Evolutionary and genetic algorithms are powerful tools for searching global optima of complex functions. An evolutionary approach, the MUSEUM (mutation and selection uncover models) programme, is applied to various QSAR data sets to prove the general applicability of this approach for variable selection in regression and PLS analyses. 'Best' regression models are found within seconds or a few minutes of calculation time, even for data sets including large numbers of variables. The MUSEUM algorithm starts from an arbitrary model and adds or eliminates variables to or from this model in a random manner. Any 'better' model defined by a certain fitness criterion is taken as a new breeding organism which is mutated by further variable additions, eliminations or exchanges. In this manner the models improve gradually until a global optimum or at least a good local optimum results. In most cases several different models are obtained from different runs. A systematic search for the best models indicates that in all cases the global optima and good local optima result from the evolutionary search. Most often the fit and cross-validation results of these regression models are better than the fit and cross-validation results of a PLS analysis which includes all variables of the data set. The variables contained in the best regression models are suitable as subsets for PLS analyses and some of these PLS results are even better than the best regression results.
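The mutate-and-select loop described in the synopsis can be illustrated with a minimal Python sketch. This is not the MUSEUM programme itself. The function names (`loo_q2`, `mutate`, `evolve`), the choice of leave-one-out Q² as the fitness criterion, the cap on model size, and the synthetic data are all assumptions made for illustration only; the synopsis specifies just "a certain fitness criterion" and random additions, eliminations or exchanges of variables.

```python
import numpy as np


def loo_q2(X, y, subset):
    """Leave-one-out cross-validated Q^2 of an OLS model on the chosen columns.

    Assumed fitness criterion; uses the closed-form LOO residual e_i / (1 - h_ii).
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X[:, subset]])  # intercept + selected variables
    Q, _ = np.linalg.qr(A)                           # reduced QR of the design matrix
    h = np.sum(Q ** 2, axis=1)                       # leverages (hat-matrix diagonal)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    press = np.sum((resid / (1.0 - h)) ** 2)         # predictive residual sum of squares
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def mutate(subset, n_vars, rng, max_size):
    """Randomly add, eliminate, or exchange one variable (hypothetical mutation rule)."""
    s = set(subset)
    outside = [j for j in range(n_vars) if j not in s]
    move = rng.choice(["add", "drop", "swap"])
    if move == "add" and outside and len(s) < max_size:
        s.add(int(rng.choice(outside)))
    elif move == "drop" and len(s) > 1:
        s.remove(int(rng.choice(sorted(s))))
    elif move == "swap" and outside and s:
        s.remove(int(rng.choice(sorted(s))))
        s.add(int(rng.choice(outside)))
    return frozenset(s)


def evolve(X, y, n_iter=2000, max_size=5, seed=0):
    """Start from an arbitrary one-variable model; keep any mutant with better fitness."""
    rng = np.random.default_rng(seed)
    n_vars = X.shape[1]
    best = frozenset([int(rng.integers(n_vars))])    # arbitrary starting model
    best_fit = loo_q2(X, y, sorted(best))
    for _ in range(n_iter):
        cand = mutate(best, n_vars, rng, max_size)
        if cand == best:                             # mutation was a no-op
            continue
        fit = loo_q2(X, y, sorted(cand))
        if fit > best_fit:                           # the "better" model becomes the parent
            best, best_fit = cand, fit
    return sorted(best), best_fit


# Synthetic example (made-up data): only columns 3 and 7 carry signal.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(60, 20))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * data_rng.normal(size=60)

selected, q2 = evolve(X, y)
print("selected variables:", selected, "Q2 = %.3f" % q2)
```

Because each run depends on the random seed and the starting model, different runs may end in different subsets, which is consistent with the synopsis noting that several different models are typically obtained from different runs. Repeating `evolve` with several seeds and comparing the resulting subsets is one simple way to explore this.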
SIMILAR VOLUMES
Variable selection in regression with very big numbers of variables is challenging both in terms of model specification and computation. We focus on genetic studies in the field of survival, and we present a Bayesian-inspired penalized maximum likelihood approach appropriate for high-di
Three alternative approaches are discussed for finding the final calibration model (regression coefficients) in PLS regression of k-way Y on N-way X. The simplest approach is to skip the deflation of the X-data. From the observation that the specific deflation used in multiway PLS is inconsequential
A modified PLS algorithm is introduced with the goal of achieving improved prediction ability. The method, denoted IVS-PLS, is based on dimension-wise selective reweighting of single elements in the PLS weight vector w. Cross-validation, a criterion for the estimation of predictive quality, is used