On the Probability of Correct Selection for Large k Populations, with Application to Microarray Data

Authors: Xinping Cui; Jason Wilson


Publisher: John Wiley and Sons
Year: 2008
Language: English
File size: 168 KB
Volume: 50
Category: Article
ISSN: 0323-3847


Abstract

One frontier of modern statistical research is the problems arising from data sets with extremely large k (>1000) populations, e.g. microarray and neuroimaging data. For many such problems the focus shifts from testing for significance to selecting, filtering, or screening. Classical Ranking and Selection Methodology (RSM) studied the probability of correct selection (PCS). PCS is the probability that the “best” (t = 1) of k populations is truly selected, according to some specified criteria of best. This paper extends and adapts two selection goals from the RSM literature that are suitable for large k problems (d-best and G-best selection). It is then shown how estimation of PCS for selecting multiple (t > 1) populations with d-best and G-best selection can be implemented to provide a useful measure of the quality of a given selection. A simulation study and the application of the proposed method to a benchmark microarray data set show it is an effective and versatile tool for assessing the probability that a particular gene selection or gene filtering step truly obtains the best genes. Moreover, the proposed method is fully general and may be applied to any such extremely large k problem. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
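
To make the notion of PCS concrete, the following is a minimal Monte Carlo sketch, not the authors' estimator. It assumes k independent normal populations with known true means and a common known standard deviation, takes "best" to mean "largest mean", and selects the t populations with the largest sample means. The function name `estimate_pcs` and all of its parameters are hypothetical. The d-best and G-best goals are not implemented here, because the abstract does not define them.

```python
import numpy as np


def estimate_pcs(true_means, t, n, sigma=1.0, n_sim=2000, seed=0):
    """Monte Carlo estimate of the probability of correct selection (PCS).

    Assumptions (illustrative only, not taken from the paper):
      * population i is N(true_means[i], sigma^2), independent of the others;
      * "best" means largest true mean, and the true means are distinct;
      * a selection is correct when the t populations with the largest
        sample means are exactly the t populations with the largest true means.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)

    # Indices of the truly best t populations.
    true_best = set(np.argsort(true_means)[-t:].tolist())

    hits = 0
    for _ in range(n_sim):
        # The mean of n draws from N(mu, sigma^2) is N(mu, sigma^2 / n),
        # so the sample means can be simulated directly.
        sample_means = rng.normal(true_means, sigma / np.sqrt(n))
        selected = set(np.argsort(sample_means)[-t:].tolist())
        hits += int(selected == true_best)

    return hits / n_sim


if __name__ == "__main__":
    # Hypothetical large-k setting: 1000 populations, 10 of them shifted upward.
    k, t = 1000, 10
    means = np.zeros(k)
    means[:t] = 3.0
    # Tiny offsets keep the true means distinct, as assumed above.
    means += np.linspace(0.0, 1e-6, k)
    print(estimate_pcs(means, t=t, n=5))
```

In this sketch the exact-match criterion is strict: with many near-ties among the true means, the estimated PCS falls quickly as k grows, which is the kind of situation that motivates relaxed selection goals for large k problems.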