✦ LIBER ✦

Direct analysis of unphased SNP genotype data in population-based association studies via Bayesian partition modelling of haplotypes

✍ Scribed by Andrew P. Morris

Publisher: John Wiley and Sons
Year: 2005
Tongue: English
Weight: 212 KB
Volume: 29
Category: Article
ISSN: 0741-0395
DOI: 10.1002/gepi.20080


✦ Synopsis

We describe a novel method for assessing the strength of disease association with single nucleotide polymorphisms (SNPs) in a candidate gene or small candidate region, and for estimating the corresponding haplotype relative risks of disease, using unphased genotype data directly. We begin by estimating the relative frequencies of haplotypes consistent with observed SNP genotypes. Under the Bayesian partition model, we specify cluster centres from this set of consistent SNP haplotypes. The remaining haplotypes are then assigned to the cluster with the "nearest" centre, where distance is defined in terms of SNP allele matches. Within a logistic regression modelling framework, each haplotype within a cluster is assigned the same disease risk, reducing the number of parameters required. Uncertainty in phase assignment is addressed by considering all possible haplotype configurations consistent with each unphased genotype, weighted in the logistic regression likelihood by their probabilities, calculated according to the estimated relative haplotype frequencies. We develop a Markov chain Monte Carlo algorithm to sample over the space of haplotype clusters and corresponding disease risks, allowing for covariates that might include environmental risk factors or polygenic effects. Application of the algorithm to SNP genotype data in an 890-kb region flanking the CYP2D6 gene illustrates that we can identify clusters of haplotypes with similar risk of poor drug metaboliser (PDM) phenotype, and can distinguish PDM cases carrying different high-risk variants. Further, the results of a detailed simulation study suggest that we can identify positive evidence of association for moderate relative disease risks with a sample of 1,000 cases and 1,000 controls.
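The two mechanical steps described in the synopsis, nearest-centre assignment by allele matches and weighting of phase configurations by haplotype frequencies, might be sketched as follows. This is an illustrative sketch only, not the author's implementation: the function names, the 0/1 string coding of haplotypes, the 0/1/2 genotype coding, the tie-breaking rule, and the use of Hardy-Weinberg proportions to turn haplotype frequencies into configuration probabilities are all assumptions made here for the example.

```python
from itertools import product


def allele_matches(h1, h2):
    """Count the SNP positions at which two haplotypes carry the same allele."""
    return sum(a == b for a, b in zip(h1, h2))


def assign_to_clusters(haplotypes, centres):
    """Map each haplotype to the index of the centre it matches at most SNPs.

    Ties go to the lowest-index centre (an assumption; the synopsis does not
    say how ties are resolved).
    """
    return {
        h: max(range(len(centres)), key=lambda c: allele_matches(h, centres[c]))
        for h in haplotypes
    }


def phase_configurations(genotype, freqs):
    """Return normalised weights for haplotype pairs consistent with a genotype.

    genotype: tuple of 0/1/2 counts of allele "1" at each SNP (hypothetical coding).
    freqs: dict mapping haplotype strings such as "010" to estimated frequencies.
    Weights assume Hardy-Weinberg proportions (an assumption of this sketch).
    """
    # Per SNP, list the ordered allele pairs compatible with the genotype.
    options = [
        [(0, 0)] if g == 0 else [(1, 1)] if g == 2 else [(0, 1), (1, 0)]
        for g in genotype
    ]
    weights = {}
    for combo in product(*options):
        h1 = "".join(str(a) for a, _ in combo)
        h2 = "".join(str(b) for _, b in combo)
        pair = tuple(sorted((h1, h2)))
        # Both orderings of a heterozygous pair are visited, which yields
        # the usual factor of two for unordered pairs.
        weights[pair] = weights.get(pair, 0.0) + freqs.get(h1, 0.0) * freqs.get(h2, 0.0)
    total = sum(weights.values())
    if total == 0:
        return {}
    return {pair: w / total for pair, w in weights.items() if w > 0}


if __name__ == "__main__":
    print(assign_to_clusters(["001", "110", "000"], ["000", "111"]))
    print(phase_configurations((1, 1), {"00": 0.4, "11": 0.4, "01": 0.1, "10": 0.1}))
```

In a full analysis these weights would multiply the per-configuration terms of the logistic regression likelihood, with every haplotype in a cluster sharing one risk parameter; that step, and the Markov chain Monte Carlo sampling over clusters, are not shown here.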

📜 SIMILAR VOLUMES

Streamlined analysis of pooled genotype data in SNP-based association studies

✍ Valentina Moskvina; Nadine Norton; Nigel Williams; Peter Holmans; Michael Owen 📂 Article 📅 2005 🏛 John Wiley and Sons 🌐 English ⚖ 130 KB

Abstract: Several groups have developed methods for estimating allele frequencies in DNA pools as a fast and cheap way for detecting allelic association between genetic markers and disease. To obtain accurate estimates of allele frequencies, a correction factor k for the degree to which measu