Controlling the number of false discoveries: application to high-dimensional genomic data
✍ Scribed by Edward L Korn; James F Troendle; Lisa M McShane; Richard Simon
- Book ID
- 104339943
- Publisher
- Elsevier Science
- Year
- 2004
- Language
- English
- File size
- 855 KB
- Volume
- 124
- Category
- Article
- ISSN
- 0378-3758
✦ Synopsis
Researchers conducting gene expression microarray experiments often are interested in identifying genes that are differentially expressed between two groups of specimens. A straightforward approach to the identification of such "differentially expressed" genes is to perform a univariate analysis of group mean differences for each gene, and then identify those genes that are most statistically significant. However, with the large number of genes typically represented on a microarray, using nominal significance levels (unadjusted for the multiple comparisons) will lead to the identification of many genes that truly are not differentially expressed, "false discoveries." A reasonable strategy in many situations is to allow a small number of false discoveries, or a small proportion of the identified genes to be false discoveries. Although previous work has considered control for the expected proportion of false discoveries (commonly known as the false discovery rate), we show that these methods may be inadequate. We propose two stepwise permutation-based procedures to control with specified confidence the actual number of false discoveries and approximately the actual proportion of false discoveries. Limited simulation studies demonstrate substantial gain in sensitivity to detect truly differentially expressed genes even when allowing as few as one or two false discoveries. We apply these new methods to analyze a microarray data set consisting of measurements on approximately 9000 genes in paired tumor specimens, collected both before and after chemotherapy on 20 breast cancer patients. The methods described are broadly applicable to the problem of identifying which variables of any large set of measured variables differ between pre-specified groups.
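The synopsis does not spell out the two stepwise procedures, so the following is only a rough illustration of the general idea of controlling the number of false discoveries by permutation. It is a simplified single-step analogue, not the authors' stepwise method. It assumes paired data (one after-minus-before difference per gene and patient), uses random sign flips as the permutation scheme, and takes as its threshold the upper `alpha` quantile of the (k+1)-th largest permuted statistic. The function names, the simulated data, and all parameter values are hypothetical.

```python
import math
import random


def paired_t(diffs):
    """One-sample t statistic for a list of paired differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        # Degenerate case: no spread at all in the differences.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def k_fd_single_step(diff_matrix, k=1, alpha=0.05, n_perm=200, seed=0):
    """Indices of genes declared differentially expressed.

    Simplified single-step sketch (an assumption, not the published
    procedure): a gene is selected when its |t| exceeds the empirical
    (1 - alpha) quantile of the (k+1)-th largest |t| over random sign
    flips, aiming to allow more than k false discoveries with
    probability at most alpha.
    diff_matrix[g][i] is the paired difference for gene g, patient i.
    """
    n_genes = len(diff_matrix)
    n_pat = len(diff_matrix[0])
    if not 0 <= k < n_genes:
        raise ValueError("k must satisfy 0 <= k < number of genes")

    rng = random.Random(seed)
    observed = [abs(paired_t(row)) for row in diff_matrix]

    null_stats = []
    for _ in range(n_perm):
        # Flip each patient's sign at random, the same way for all genes,
        # which preserves the correlation between genes.
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n_pat)]
        permuted = sorted(
            (abs(paired_t([s * d for s, d in zip(signs, row)]))
             for row in diff_matrix),
            reverse=True,
        )
        null_stats.append(permuted[k])  # (k+1)-th largest statistic

    null_stats.sort()
    # Conservative empirical (1 - alpha) quantile of the null statistics.
    idx = min(n_perm - 1, max(0, math.ceil((1 - alpha) * n_perm) - 1))
    critical = null_stats[idx]
    return [g for g, t in enumerate(observed) if t > critical]


# Hypothetical simulated data: 100 genes, 12 patients, and the first
# 5 genes carrying a large true shift.
n_genes, n_pat, n_true = 100, 12, 5
data_rng = random.Random(42)
diffs = [
    [data_rng.gauss(5.0 if g < n_true else 0.0, 1.0) for _ in range(n_pat)]
    for g in range(n_genes)
]

selected_k1 = k_fd_single_step(diffs, k=1, alpha=0.05, n_perm=200, seed=1)
print("genes selected when allowing 1 false discovery:", selected_k1)
```

Raising `k` lowers the threshold, because the (k+1)-th largest permuted statistic can only shrink as `k` grows. This is the mechanism behind the sensitivity gain the synopsis reports when one or two false discoveries are allowed. A stepwise version would recompute the threshold after each rejection, a step this sketch omits.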