One of the disadvantages of SIMCA pattern recognition is its inability to produce probabilistic classifications. Attempts to correct this involve distributional assumptions. It appears that SIMCA can handle the residual error terms efficiently, but that inside the class model subspace a crude trunca
The improvement of SIMCA classification by using kernel density estimation: Part 2. Practical evaluation of SIMCA, ALLOC and CLASSY on three data sets
Scribed by Hilko van der Voet; Durk A. Doornbos
- Publisher
- Elsevier Science
- Year
- 1984
- Tongue
- English
- Weight
- 664 KB
- Volume
- 161
- Category
- Article
- ISSN
- 0003-2670
Synopsis
The performance of the new probabilistic classification method CLASSY is evaluated on three different data sets, together with its predecessors SIMCA and ALLOC. The improvement made over ALLOC is only marginal, whereas CLASSY shows better predictive ability and greater reliability than SIMCA in most cases.
The evaluation of pattern recognition techniques was considered theoretically in Part 1 of this series [1]. The present paper is concerned with what information is provided by the selected measures for predictive ability [the number of errors, NE, and the quadratic score (Q1), sharpness (Q2)] and reliability (Q5) about the SIMCA method (made probabilistic as described in Part 1) and the ALLOC and CLASSY methods. Because the optimal dimensionality for the principal component (PC) class models in SIMCA and CLASSY was unknown, and because the same optimum was not expected for both methods, all possible values of class dimensionality A were examined systematically.
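As an illustration of the two simplest measures named above, the following minimal sketch computes the number of errors and a quadratic score from class probabilities. The exact definitions used in this series are given in Part 1 and are not reproduced here, so the Brier-type form below (mean squared distance between the probability vector and the 0/1 indicator of the true class) is an assumption, and the paper's own score may be scaled or signed differently. The function names and the example probabilities are hypothetical.

```python
from typing import List, Sequence


def number_of_errors(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> int:
    """Count objects whose most probable class differs from the true class."""
    errors = 0
    for p, true_class in zip(probs, labels):
        predicted = max(range(len(p)), key=lambda k: p[k])
        if predicted != true_class:
            errors += 1
    return errors


def quadratic_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Brier-type quadratic score (assumed form, not taken from the paper).

    For each object, sum the squared differences between the class
    probabilities and the 0/1 indicator of the true class, then average
    over objects. Zero is a perfect score; larger values are worse.
    """
    total = 0.0
    for p, true_class in zip(probs, labels):
        total += sum(
            (pk - (1.0 if k == true_class else 0.0)) ** 2
            for k, pk in enumerate(p)
        )
    return total / len(labels)


# Hypothetical class probabilities for four test objects and two classes.
example_probs: List[List[float]] = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.8, 0.2],
]
example_labels = [0, 0, 1, 1]

ne = number_of_errors(example_probs, example_labels)
q = quadratic_score(example_probs, example_labels)
print(f"NE = {ne}, quadratic score = {q:.3f}")
```

The last object shows why a quadratic score carries more information than NE alone: a confident wrong assignment is penalised far more heavily than the hesitant correct assignment of the second object, although NE registers only a single error.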
DATA AND COMPUTER PROGRAMS
The pattern recognition methods SIMCA, ALLOC and CLASSY were evaluated on three data sets.
Data sets
Iris data. The well-known iris data from Fisher have been analysed by several authors [2-4].
The data set consists of measurements made on flowers from three species of iris: Iris setosa, Iris versicolor and Iris virginica. Iris setosa was very easily distinguished from the other two by all methods, so only the latter two species were used here. There are four variables: sepal length, sepal width, petal length and petal width. Each class contains 50 individuals, which were divided randomly into two groups, a training and a test