Data mining, neural nets, trees — Problems 2 and 3 of Genetic Analysis Workshop 15

✍ Scribed by Andreas Ziegler; Anita L. DeStefano; Inke R. König

Publisher: John Wiley and Sons
Year: 2007
Tongue: English
Weight: 153 KB
Volume: 31
Category: Article
ISSN: 0741-0395
DOI: 10.1002/gepi.20280

✦ Synopsis

Genome-wide association studies using thousands to hundreds of thousands of single nucleotide polymorphism (SNP) markers and region-wide association studies using a dense panel of SNPs are already in use to identify disease susceptibility genes and to predict disease risk in individuals. Because these tasks are becoming increasingly important, three different data sets were provided for the Genetic Analysis Workshop 15, allowing examination of various novel and existing data mining methods for both classification and identification of disease susceptibility genes, gene-by-gene interactions, or gene-by-environment interactions. The approach most often applied in this presentation group was random forests, chosen for its simplicity, elegance, and robustness. It was used for prediction and, as a first step, for screening for interesting SNPs. The logistic tree with unbiased selection approach appeared to be an interesting alternative for efficiently selecting interesting SNPs. Machine learning methods, specifically ensemble methods, might be useful as pre-screening tools for large-scale association studies because, compared with standard statistical approaches, they can be less prone to overfitting, can require less computer processor time, can easily include pair-wise and higher-order interactions, and can also have a high capability for classification. However, improved implementations that are able to deal with hundreds of thousands of SNPs at a time are required.

📜 SIMILAR VOLUMES

Data mining of RNA expression and DNA genotype data: Presentation Group 5 contributions to Genetic Analysis Workshop 15

✍ Catherine T. Falk; Stephen J. Finch; Wonkuk Kim; Nitai D. Mukhopadhyay 📂 Article 📅 2007 🏛 John Wiley and Sons 🌐 English ⚖ 137 KB

The complexity of data available in human genetics continues to grow at an explosive rate. With that growth, the challenges to understanding the meaning of the underlying information also grow. A currently popular approach to dissecting such information falls under the broad category of data mining.

Comparison of single-nucleotide polymorphisms and microsatellite markers for linkage analysis in the COGA and simulated data sets for Genetic Analysis Workshop 14: Presentation Groups 1, 2, and 3

✍ Marsha A. Wilcox; Elizabeth W. Pugh; Heping Zhang; Xiaoyun Zhong; Douglas F. Lev 📂 Article 📅 2005 🏛 John Wiley and Sons 🌐 English ⚖ 523 KB

The papers in presentation groups 1-3 of Genetic Analysis Workshop 14 (GAW14) compared microsatellite (MS) markers and single-nucleotide polymorphism (SNP) markers for a variety of factors, using multiple methods in both data sets provided to GAW participants. Group 1 focused on data provided from t