On the Probability of Correct Selection for Large k Populations, with Application to Microarray Data

✍ Scribed by Xinping Cui; Jason Wilson

Publisher: John Wiley and Sons
Year: 2008
Tongue: English
Weight: 168 KB
Volume: 50
Category: Article
ISSN: 0323-3847
DOI: 10.1002/bimj.200710457

✦ Synopsis

Abstract

One frontier of modern statistical research is the problems arising from data sets with extremely large k (>1000) populations, e.g. microarray and neuroimaging data. For many such problems the focus shifts from testing for significance to selecting, filtering, or screening. Classical Ranking and Selection Methodology (RSM) studied the probability of correct selection (PCS). PCS is the probability that the “best” (t = 1) of k populations is truly selected, according to some specified criteria of best. This paper extends and adapts two selection goals from the RSM literature that are suitable for large k problems (d-best and G-best selection). It is then shown how estimation of PCS for selecting multiple (t > 1) populations with d-best and G-best selection can be implemented to provide a useful measure of the quality of a given selection. A simulation study and the application of the proposed method to a benchmark microarray data set show it is an effective and versatile tool for assessing the probability that a particular gene selection or gene filtering step truly obtains the best genes. Moreover, the proposed method is fully general and may be applied to any such extremely large k problem. (© 2008 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
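
To make the classical PCS definition in the abstract concrete (the t = 1 case only), here is a minimal Monte Carlo sketch. It is not the estimator proposed in the paper, and it does not cover d-best or G-best selection. It assumes normal populations with known means and a common standard deviation, and it assumes the selection rule is "pick the population with the largest sample mean". The function name `estimate_pcs` and all of its parameters are hypothetical, introduced only for illustration.

```python
import random


def estimate_pcs(true_means, n_per_pop=10, sigma=1.0, n_trials=2000, seed=0):
    """Monte Carlo estimate of PCS for selecting the single best population.

    Assumptions (illustrative, not from the paper): populations are normal
    with the given true means and common standard deviation `sigma`, and
    the population with the largest sample mean is the one selected.
    """
    rng = random.Random(seed)
    # Index of the truly best population (largest true mean).
    best = max(range(len(true_means)), key=lambda i: true_means[i])
    # Standard error of a sample mean based on n_per_pop observations.
    se = sigma / n_per_pop ** 0.5
    hits = 0
    for _ in range(n_trials):
        # Draw each sample mean directly from its sampling distribution.
        sample_means = [rng.gauss(mu, se) for mu in true_means]
        selected = max(range(len(sample_means)), key=lambda i: sample_means[i])
        # Count the trial as correct if the truly best population was picked.
        hits += selected == best
    return hits / n_trials


if __name__ == "__main__":
    # One population with mean 1.0 among k = 1000, the rest with mean 0.0.
    means = [1.0] + [0.0] * 999
    print(estimate_pcs(means))
```

In this toy setup the estimate should fall as k grows with the gap between the best and the rest held fixed, which is one way to see why large k problems call for selection goals other than picking the single best population.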