This volume expands on statistical analysis of genomic data by discussing cross-cutting groundwork material, public data repositories, common applications, and representative tools for operating on genomic data. Statistical Genomics: Methods and Protocols is divide
Statistical Population Genomics (Methods in Molecular Biology, 2090)
Scribed by Julien Y. Dutheil (editor)
- Publisher: Springer
- Year: 2020
- Tongue: English
- Leaves: 467
- Category: Library
Synopsis
This open access volume presents state-of-the-art inference methods in population genomics, focusing on data analysis based on rigorous statistical techniques. After introducing general concepts related to the biology of genomes and their evolution, the book covers state-of-the-art methods for the analysis of genomes in populations, including demography inference, population structure analysis and detection of selection, using both model-based inference and simulation procedures. Last but not least, it offers an overview of the current knowledge acquired by applying such methods to a large variety of eukaryotic organisms. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, pointers to the relevant literature, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls.
Authoritative and cutting-edge, Statistical Population Genomics aims to promote and ensure successful applications of population genomic methods to an increasing number of model systems and biological questions.
Table of Contents
Preface
Acknowledgments
Contents
Contributors
Part I: Essential Concepts
Chapter 1: A Population Genomics Lexicon
1 Genomic Variation
1.1 Loci, Alleles, and Polymorphism
1.2 Mutations
1.3 The Wright-Fisher Model
1.4 The Backward Wright-Fisher Model: The Standard Coalescent
2 Beyond the Wright-Fisher Model
2.1 Demography
2.2 Population Structure
3 Statistics on Nucleotide Diversity
4 Selective Processes
4.1 Protein-Coding Genes
4.2 Fitness Effect
4.3 Types of Selection
4.4 Inference of Selection in Protein-Coding Sequences
5 Linkage and Recombination
5.1 The Coalescent with Recombination
5.2 Impact of Linkage on Selection
6 Notes
References
Part II: Statistical Methods for Analyzing Genomes in Populations
Chapter 2: Processing and Analyzing Multiple Genomes Alignments with MafFilter
1 Introduction: Multiple Genome Alignments
2 General Principles on MafFilter Usage
2.1 Serial Processing of Alignment Blocks: Filters
2.2 Option Files and Command Line Arguments
3 MafFilter as a Data Processor
3.1 Extracting Data of Interest
3.2 Statistics with MafFilter
3.3 Pre-Processing the Data for Quality Insurance
3.4 Conversion to Other Formats
4 Examples of Advanced Analyses
4.1 Example Analysis 1: Computing Nucleotide Diversity Along the Genome
4.2 Example Analysis 2: Inferring Phylogenetic Relationships
4.3 Example Analysis 3: Running External Software
4.4 Example Analysis 4: Coordinates Translation from One Species to Another
5 Other Useful Tools
6 Conclusion
7 Note
References
Chapter 3: Data Management and Summary Statistics with PLINK
1 Introduction
2 Materials
3 Methods
3.1 Getting Started: Importing and Merging Data
3.1.1 Variant Call Format
3.1.2 PLINK text ({.ped, .map})
3.1.3 Other Formats
3.1.4 Alternate Chromosome/Contig Sets
3.1.5 Missing Variant IDs
3.1.6 Merging
3.1.7 Filling in Missing Pedigree Information
3.2 Missingness Filters
3.3 Selecting a Sample Subset Without Very Close Relatives
3.4 Minor Allele Frequency Reporting, Filtering
3.5 Hardy-Weinberg Equilibrium Statistics
3.6 Selecting a SNP Subset in Approximate Linkage Equilibrium
3.7 Principal Component Analysis
3.8 Sex Validation and Imputation
3.8.1 Subpopulation Allele Frequencies, and --read-freq
3.9 Reporting Linkage Disequilibrium Statistics
3.10 Data Export
4 Notes
References
Chapter 4: Exploring Population Structure with Admixture Models and Principal Component Analysis
1 Introduction
2 Materials
3 Methods
3.1 Subsetting Data
3.2 Filter Out SNPs to Remove Linkage Disequilibrium (LD)
3.3 Running ADMIXTURE
3.3.1 An Example Run with Visualization
3.3.2 Considering Different Values of K
3.3.3 Some Advanced Options
3.4 PCA with SMARTPCA
3.4.1 Running PCA
3.4.2 Plotting PCA Results with PCAviz
4 Discussion
References
Chapter 5: Detecting Positive Selection in Populations Using Genetic Data
1 The Selective Sweep Theory
2 Methods to Detect Selective Sweeps in Genome-Wide Data
2.1 Detecting Sweeps Based on Diversity Reduction
2.2 The SFS Signature of a Selective Sweep
2.3 The LD Signature of a Selective Sweep
2.4 Detecting Sweeps Using Machine Learning Methods
3 The Problem of Demography
4 A Guideline on Selection Detection Tools
4.1 Summary Statistics
4.2 Detecting Sweeps in Whole Genomes
4.2.1 SweepFinder
4.2.2 SweeD
4.2.3 SweepFinder2
4.2.4 OmegaPlus
4.2.5 MFDM Test
4.3 RAiSD
5 Evaluation
5.1 Detection Accuracy
5.2 Execution Time
6 Machine Learning for Population Genetics
6.1 Machine Learning Background
6.2 Categories of Machine Learning
6.3 Algorithms in Machine Learning
7 Methods
7.1 Data Generation
7.2 Computing Summary Statistics
7.3 Application of Classification Algorithms
7.4 Dataset Manipulation
7.5 Feature Selection
8 Results
8.1 Reducing the Feature Space
8.2 Evaluation
8.2.1 Logistic Regression
8.2.2 Random Forests
8.2.3 K Nearest Neighbors
8.2.4 Support Vector Machines
9 Discussion
References
Chapter 6: polyDFE: Inferring the Distribution of Fitness Effects and Properties of Beneficial Mutations from Polymorphism Data
1 Introduction
1.1 Modelling the Properties of Mutations on Fitness
1.2 Calculating the Rate of Adaptive Evolution, α
2 Pre-processing of the Data
2.1 The Type of Information Required by polyDFE
2.2 Example of a polyDFE Input File
2.3 Note on SFS Data
3 Model Fitting with polyDFE
3.1 Specifying a DFE Model to Fit Using polyDFE
3.2 Note on Likelihood Maximization
4 Post-Processing of the polyDFE Output
4.1 Example of a polyDFE Output File
4.2 Merging and Parsing Output Files
4.3 Summarizing the DFE Estimated by polyDFE
4.4 Estimating α
5 Hypothesis Testing and Model Averaging
5.1 Bootstrap-Based Confidence Intervals
5.2 Hypothesis Testing
5.3 Model Averaging with polyDFE
5.4 Note on Divergence Data
6 Conclusion
References
Chapter 7: MSMC and MSMC2: The Multiple Sequentially Markovian Coalescent
1 Introduction
1.1 MSMC
1.2 MSMC2
2 Software Overview
2.1 MSMC
2.2 MSMC2
2.3 MSMC-Tools
2.4 Data Requirements
2.4.1 Diploid Data
2.4.2 Phasing
2.4.3 Complete Genomes
2.4.4 High Coverage Data
3 Input Data Format
3.1 Generating VCF and Mask Files from Individual BAM Files
3.2 Phasing
3.3 Combining Multiple Individuals into One Input File
4 Running MSMC and MSMC2
4.1 Resource Requirements
4.2 Test Data
4.3 Running MSMC
4.4 Running MSMC2
4.5 Plotting Results
5 Tips and Tricks
5.1 Bootstrapping
5.2 Controlling Time Patterning
References
Chapter 8: Ancestral Population Genomics with Jocx, a Coalescent Hidden Markov Model
1 Introduction
2 Software
2.1 Preparing Data
2.2 Inferring Parameters
2.2.1 NM
2.2.2 GA
2.2.3 PSO
3 Simulation, Execution, and Result Summarization
4 Conclusions
References
Chapter 9: Coalescent Simulation with msprime
1 Introduction
2 Running Simulations
2.1 Trees and Replication
2.2 Population Models
2.2.1 Exponentially Growing/Shrinking Populations
2.3 Mutations
2.4 Population Structure
2.5 Demographic Events
2.5.1 Migration Rate Change
2.5.2 Mass Migration
2.5.3 Population Parameter Change
2.6 Ancient Samples
2.7 Recombination
3 Processing Results
3.1 Computing MRCAs
3.2 Sample Counts
3.3 Obtaining Subsets
3.4 Processing Variants
3.5 Incremental Calculations
3.6 Exporting Variant Data
4 Validating Analytic Predictions
4.1 Total Branch Length and Segregating Sites
4.2 Recombination
5 Example Inference Scheme
6 Discussion
References
Chapter 10: Inference of Ancestral Recombination Graphs Using ARGweaver
1 Overview
1.1 What Is an ARG?
1.2 Why Would You Want to Estimate an ARG?
1.3 Practical Considerations
1.4 ARGweaver Algorithm Overview
1.4.1 ARGweaver Model and Assumptions
2 Ancient Hominins Analysis
2.1 Pre-requisites
2.2 Obtaining and Installing ARGweaver
2.3 Sequence File Format
2.4 SITES Format
2.4.1 Phasing Options
2.5 Masked Regions
2.5.1 Genomic vs Variant VCFs
2.5.2 Genotype Probabilities
3 Choosing Model Parameters
3.1 Mutation Rates
3.2 Recombination Rates
3.3 Population Size
3.4 Time Discretization
4 Other Options
4.1 Sampling Frequency
4.2 Ancient Samples
4.3 Site Compression
5 Running ARGweaver
5.1 Time/Memory Requirements
5.2 Monitoring Convergence
5.2.1 Resuming a Run
6 Interpreting Results
6.1 Leaf Trace Plots
6.2 Computing Basic ARG Statistics
6.2.1 Examining Local Trees
6.2.2 Allele Age
6.2.3 Neandertal Introgression
7 Discussion
References
Part III: Advances in Population Genomics
Chapter 11: Population Genomics of Transitions to Selfing in Brassicaceae Model Systems
1 Introduction
2 The Molecular Basis of the Loss of SI and Evolution of Self-Fertilization in Brassicaceae
3 Population Genetics Consequences of Selfing
3.1 Theoretical Expectations
3.2 Empirical Results
4 Discovering the Geographic Origin and the Timing of the Mating System Shift
5 Some Caveats
6 Future Directions
References
Chapter 12: Genomics of Long- and Short-Term Adaptation in Maize and Teosintes
1 Introduction
2 How to Explore Adaptation?
3 What Constrains Adaptation?
4 Mechanisms of Genetic Adaptation in Maize and Teosintes
5 Local Adaptation in Maize and Teosintes
6 How Convergent Is Adaptation?
7 What Is the Role of Phenotypic Plasticity?
8 Conclusion
References
Chapter 13: Neurospora from Natural Populations: Population Genomics Insights into the Life History of a Model Microbial Eukar...
1 Introduction: Fungi and Population Genomics
2 The Rise of Neurospora as a Model for Evolutionary and Ecological Genetics
3 Neurospora Population Genomics Has Revealed Cryptic Species with Large Variation in the Extent of Their Geographical Distrib...
3.1 Nothing Is Generally Everywhere
3.2 Geographic Endemicity Within Globally Distributed Neurospora Morphospecies
3.3 On the Difficulty of Species Diagnosis in Neurospora and Fungi
3.4 Population Structure Within Neurospora Phylogenetic Species
3.5 Comparative Population Genomics of Selfing and Outcrossing Neurospora Species
4 Neurospora Population Genomics Has Refined Our Views on the Permeability of Barriers to Gene Flow
5 Studies of Neurospora Provide Insights into the Genetic Basis of (Potentially Adaptive) Phenotypes in Wild Microbial Eukaryotes
6 Conclusion
7 Notes
References
Chapter 14: Population Genomics of Fungal Plant Pathogens and the Analyses of Rapidly Evolving Genome Compartments
1 Introduction
2 Key Discoveries from Population Genomics in Plant Fungal Pathogens
2.1 High Recombination Rates and Population Admixture Contribute to Rapid Adaptation of Fungal Plant Pathogen Genomes
3 Fungal Plant Pathogen Genomes Are Often Compartmentalized, A Trait Driven by Transposable Elements
4 Interspecific Hybridization Contributes to Genome Evolution of Fungal Plant Pathogens
5 Discovering Variation in Population Genomic Data
5.1 Variant Calling Through Short-Read Mapping: Methods and Limits
6 De Novo Assembly and the Rise of Long-Read Sequencing
7 Detection of Structural Variation in Genomes
8 Conclusion
References
Chapter 15: Population Genomics on the Fly: Recent Advances in Drosophila
1 Introduction
2 Data Sources
2.1 Data Acquisition Techniques
2.1.1 Isofemale Inbred Lines
2.1.2 Haploid Embryo Sequencing
2.1.3 Genomic Sequencing and Phasing of Hemiclones
2.1.4 Pooled Sequencing (Pool-Seq)
2.2 Consortia and Available Datasets
2.2.1 Drosophila Genetic Reference Panel (DGRP) and Drosophila Population Genomics Project (DPGP)
2.2.2 Drosophila Population Genomics Projects
2.2.3 The Drosophila Genome Nexus
2.2.4 Dros-RTEC and DrosEU
2.2.5 Other Data
3 Neutral Evolution
3.1 Demographic Analyses
3.1.1 Out of Africa
3.2 Recombination
3.3 Biased Gene Conversion
3.4 Population Genetics of Chromosomal Inversions
3.5 Population Genomics of Transposable Elements
4 Selection
4.1 Hitchhiking Effects
4.2 Recurrent Hitchhiking and Background Selection
4.3 Selection on Noncoding DNA
4.4 Selection on Synonymous Codon Usage
4.5 Adaptive Chromosomal Inversions
4.6 Adaptive Insertions of Transposons
4.7 Faster-X Evolution
5 Perspectives: Temporal and Geographical Clines
6 Notes
References
Chapter 16: Genomic Access to the Diversity of Fishes
1 Diversity of Fishes
2 The Genomic Makeup of Fishes
3 Genomics in Studies on the Biology of Fishes
References
Chapter 17: Avian Population Genomics Taking Off: Latest Findings and Future Prospects
1 Introduction
2 Latest Findings
2.1 Relevance of Genomic Insight for Evolution
2.2 Relevance of Genomic Insight for Conservation
2.3 Locus-Level Work to Examine the Genetic Basis of Phenotypic Traits
2.4 Locus-Level Work to Understand the Genetics of Adaptation and Speciation
3 Roadblock: Genome Assemblies, Novel Genes and Structural Variants
4 Prospects
4.1 Continued Application of Population Genomics for Conservation
4.2 Control for Alternative Processes in Genome Scans and Expand Studies to Focus on the Process of Speciation
4.3 Expand Beyond Studies of Genetic Variation Alone
4.4 Integrate Population Genomics with Additional Fields
5 Conclusion
Glossary
References
Further Reading
Chapter 18: Population Genomics of the House Mouse and the Brown Rat
1 Introduction
1.1 History of the House Mouse
1.2 Brown Rat History
2 Population Genomics
2.1 House Mouse Genetic Variation
2.2 Brown Rat Genetic Variation
3 Examples of Genes Under Positive Selection
3.1 Rodent Resistance to Anticoagulants: Vkorc1
3.2 Pathogen Related Resistance: Xpr1
3.3 Segmental Duplications and Selective Sweeps: R2D2
3.4 The t-Haplotype as Meiotic Drive Element
4 Conclusion
5 Note
References
Chapter 19: Population Genomics in the Great Apes
1 Species Trees and Incomplete Lineage Sorting
2 Gene Flow and Demography
3 Selection
4 Recombination
5 The X Chromosomes of Great Apes
6 Conclusion
References
Correction to: Statistical Population Genomics
Index
SIMILAR VOLUMES
Since the first edition, published in 2001, genomics research has taken great strides. In this updated second edition, a team of expert researchers share the most current information in a field that has recently switched emphasis from gene identification to functional genomics and the characterizati
While there is a wide selection of 'by experts, for experts' books in statistics and molecular biology, there is a distinct need for a book that presents the basic principles of proper statistical analyses and progresses to more advanced statistical methods in response to rapidly developing te
This detailed volume provides an overview of recent advances in the application of genomic technologies in several domains of marine biology, raising awareness of various DNA- and RNA-based technologies. Genomic methods are essential in identifying previously undetected taxonomic (e.g. DNA bar