Selection in discriminant analysis with continuous and discrete variables
Scribed by J.J. Daudin; A. Bar-Hen
- Publisher
- Elsevier Science
- Year
- 1999
- Tongue
- English
- Weight
- 105 KB
- Volume
- 32
- Category
- Article
- ISSN
- 0167-9473
Synopsis
A problem frequently encountered by the practitioner in Discriminant Analysis is how to select the best variables. In mixed discriminant analysis (MDA), i.e., discriminant analysis with both continuous and discrete variables, the problem is more difficult because of the different nature of the variables. Various methods have been proposed in recent years for selecting variables in MDA. In this paper we use two versions of a generalized Mahalanobis distance between populations based on the Kullback-Leibler divergence for the first and on the Hellinger-Matusita distance for the second. Stopping rules are established from distributional results. A simulation experiment is used to compare the two proposed selection methods and a third based on a modified version of the Akaike Information Criterion (AIC). Since the simulations focus on situations with just one continuous and one binary variable, they can only give indications concerning a few variables and caution is recommended if extended to more usual situations.
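To make the two distances in the synopsis concrete, the following is a minimal sketch, not the authors' implementation. It assumes a location model with one discrete variable and one continuous variable that is normal within each discrete cell with a shared variance. The function names, the argument layout, and the choice of the symmetrized Kullback-Leibler form are illustrative assumptions, and the paper's exact definitions and normalizations may differ.

```python
import math

# Assumed setup (not taken from the paper): for each cell x of the discrete
# variable, population i has cell probability p_i[x] > 0 and the continuous
# variable is N(mu_i[x], sigma2) with a variance shared by all cells and
# both populations.

def kl_divergence(p1, p2, mu1, mu2, sigma2):
    """KL(f1 || f2) between two populations under the assumed location model."""
    total = 0.0
    for a, b, m1, m2 in zip(p1, p2, mu1, mu2):
        # Discrete part plus the equal-variance normal part for this cell.
        total += a * (math.log(a / b) + (m1 - m2) ** 2 / (2.0 * sigma2))
    return total

def j_divergence(p1, p2, mu1, mu2, sigma2):
    """Symmetrized Kullback-Leibler divergence (sum of both directions)."""
    return (kl_divergence(p1, p2, mu1, mu2, sigma2)
            + kl_divergence(p2, p1, mu2, mu1, sigma2))

def hellinger_affinity(p1, p2, mu1, mu2, sigma2):
    """Integral of sqrt(f1 * f2), also called the Bhattacharyya coefficient."""
    return sum(
        math.sqrt(a * b) * math.exp(-((m1 - m2) ** 2) / (8.0 * sigma2))
        for a, b, m1, m2 in zip(p1, p2, mu1, mu2)
    )

def matusita_squared(p1, p2, mu1, mu2, sigma2):
    """Squared Matusita distance, i.e. the integral of (sqrt(f1) - sqrt(f2))^2."""
    return 2.0 * (1.0 - hellinger_affinity(p1, p2, mu1, mu2, sigma2))

# Hypothetical example: one binary variable (two cells), one continuous variable.
p_a, p_b = [0.3, 0.7], [0.6, 0.4]
mu_a, mu_b = [0.0, 1.0], [0.5, 2.0]
print(j_divergence(p_a, p_b, mu_a, mu_b, 1.0))
print(matusita_squared(p_a, p_b, mu_a, mu_b, 1.0))
```

With a single cell, the symmetrized divergence reduces to the squared Mahalanobis distance (mu1 - mu2)^2 / sigma2, and -8 times the log of the affinity gives the same quantity, which is one way to read both measures as generalizations of the Mahalanobis distance. The sketch requires strictly positive cell probabilities; empty cells would need separate handling.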
SIMILAR VOLUMES
Sufficient conditions are given to ensure a better performance of the plug-in version of the covariates adjusted location linear discriminant function in an asymptotic comparison of the overall expected error rate. Our findings generalize several earlier results on discriminant function with covaria
A robust method of selecting variables with the greatest discriminatory power is presented in the paper. It is based on the robustified Wilks Λ statistic and can be applied in a multi-group discrimination problem. An application to some respiratory disease data together with a comparison of the clas
The likelihood ratio classification rule based on the location model is estimated given: (1) data consist of both binary and continuous variables; (2) some states have either zero frequency or too few observations, the case that usually happens in practice. An iterative proportional fitting of a con