Comparative Genomics: Methods and Protocols (Methods in Molecular Biology, 2802)
Scribed by João Carlos Setubal (editor), Peter F. Stadler (editor), Jens Stoye (editor)
- Publisher: Humana
- Year: 2024
- Tongue: English
- Leaves: 622
- Edition: 2
- Category: Library
Synopsis
This second edition provides new and updated chapters covering computational and mathematical techniques and concepts related to the field of comparative genomics. The topics covered in the chapters range from those that address general techniques and concepts that apply to all organisms to others that are specialized and apply to specific biological systems such as viruses, bacteria, nematodes, and insects. Well-known comparative genomics web-based platforms are also covered in specific chapters. Written in the highly successful Methods in Molecular Biology series format, many chapters include introductions to their respective topics and step-by-step comparison procedures, demonstrated on actual sets of genome sequences.
Authoritative and cutting-edge, Comparative Genomics: Methods and Protocols, Second Edition aims to ensure successful results in the further study of this vital field.
Table of Contents
Preface
Contents
Contributors
Chapter 1: The Theory of Gene Family Histories
1 Introduction
2 Scenarios
2.1 Notation
2.2 Reconciliation
2.3 Relaxed Scenarios
2.4 Event-Labels
3 Best Match Graphs and Orthology
3.1 Definition and Characterization
3.2 Orthology in the Absence of HGT
3.3 Clusters of Orthologous Genes
4 Fitch Graphs and Horizontal Gene Transfer
4.1 Definition and Characterization
4.2 LDT Graphs
4.3 Orthology in the Presence of HGT
5 Discussion and Open Problems
References
Chapter 2: Protein-Coding Gene Families in Prokaryote Genome Comparisons
1 Introduction
2 Requirements and Assumptions
3 Datasets
4 Software
5 Generating Orthologs by Various Methods
5.1 OrthoMCL
5.2 Cluster of Orthologous Groups
5.3 OrthoFinder2
5.4 OMA
5.5 Performance of Orthology Inference Methods
6 Obtaining Phylogenetic Trees from Orthogroups
6.1 OrthoFinder2
6.2 OMA
7 eggNOG-Mapper
8 Obtaining Orthologs from Ortholog Databases
8.1 OrthoDB
8.2 eggNOG
9 Conclusion
References
Chapter 3: Family-Free Genome Comparison
1 Introduction
2 Overview and Gene Similarity Graphs
3 Gene Cluster Analysis
4 Genomic Similarities, Ancestral Reconstruction, and Genomic Distances Leading to Gene Family Inference
4.1 Similarity Measures and Median Based on Conserved Adjacencies
4.1.1 Induced Conserved Adjacencies
4.1.2 Median of 3 Based on Restricted Conserved Adjacencies
4.2 Distance Measures Based on Genome Rearrangements
4.2.1 DCJ Distance Induced by an Ortholog-Set
4.2.2 DCJ-Indel Distance Induced by an Ortholog-Set
5 Technical Details and Comparative Discussion of the Methods
5.1 Determining Chromosomal Gene Orders
5.2 Local Sequence Alignment Scores and Gene Relationship Graphs
5.3 Comparing the Methods and Their Implementations
6 Conclusion
References
Chapter 4: Methods for Pangenomic Core Detection
1 Introduction
2 Data Structures and Algorithms
2.1 Review of k-mer-Based Counting Methods
2.2 Review of k-mer-Based Indexing Methods
2.3 Review of de Bruijn Graph-Based Methods
3 Solutions for Sequence-Based Core Detection
3.1 k-mer-Based Core Detection Using Pangrowth
3.2 Graph-Based Core Detection Using Corer
4 Conclusion
References
Chapter 5: Step-by-Step Bacterial Genome Comparison
1 Introduction
2 Requirements and Assumptions
3 Datasets
4 Software
5 Genome Annotation
6 Pangenome Reconstruction and Visualization
6.1 Ortholog Gene Computation and Clustering
6.2 Open and Closed Pangenomes
6.3 Comparison of Gene Content
6.4 Pangenome-Wide Association Studies
7 Phylogenetic Tree Based on Core Genome Alignment
8 Pangenome Graphs
9 Identification of Sequences of Interest in Genomic Data
9.1 Prediction of Antimicrobial-Resistance Genes and Virulence Factors
9.2 Phage Sequence Prediction
10 Conclusion
References
Chapter 6: How to Obtain and Compare Metagenome-Assembled Genomes
1 Introduction
2 How MAGs Can Be Obtained
3 Quality Checking
3.1 Completeness and Contamination
3.2 MAG Quality Standards
4 MAG Annotation
5 MAG Comparative Analysis
5.1 MAG Pairwise Comparison
5.2 MAG Databases
5.3 Taxonomic Classification
5.4 MAGset and MAGcheck
6 Practical Example
6.1 Assumptions
6.2 Sample Description
6.3 Downloading the Data
6.4 Preprocessing
6.5 MAG Assembly Pipeline
6.6 Taxonomic Classification
6.7 Searching MAGs Against the GEM Database
6.8 Merging GTDB-Tk and GEM Comparison Results
6.9 Annotating MAGs with PGAP
6.10 Comparing and Analyzing MAGs with MAGset
6.11 Running MAGcheck
7 Conclusion
References
Chapter 7: Comparative Genome Annotation
1 Introduction
2 Noncomparative Approaches
3 Protein Homology
4 Annotating a Single Target Genome Using Genome Alignments
4.1 Evolutionary Search for Exons
4.2 Comparative Gene Prediction
4.3 Annotation Mapping
5 Multi-genome Annotation
5.1 Simultaneous Gene Prediction in Multiple Genomes
5.2 Cross-Species Consistency of Gene Sets
6 Parameter Training
7 Related Tasks: Visualization, Quality Control, Non-coding Genes, Functional Annotation, and Submission
7.1 Visualization
7.2 Quality Control
7.3 Functional Annotation for a GenBank Submission
8 Discussion
References
Chapter 8: Annotation and Comparative Genomics of Prokaryotic Transposable Elements
1 The Importance of Prokaryotic Transposable Elements
2 Common Structural Features of Prokaryotic Transposable Elements
3 Insertion Sequence Landmarks and Diversity
4 Transposons
4.1 Compound Transposons Landmarks
4.2 Tn3 Family Landmarks
4.3 Tn7 Family Landmarks
4.4 Tn402 Family Landmarks
4.5 Tn554 Family Landmarks
5 A Comparative Genomics Point-of-View on the Genomic Evolution, Impact, and Emergence of Antimicrobial Resistance and Pathoge...
5.1 IS General Considerations
5.2 An Example of Capture and Evolution of Antibiotic Resistance in Tn3 Family
5.3 What We Have Learned About Plant-Pathogen and Their Associated Transposons?
6 Bioinformatics Guidelines for the Manual Curation of Prokaryotic Transposable Elements
7 Concluding Remarks
Bibliography
Chapter 9: Genome Rearrangement Analysis
1 Introduction
2 Preliminaries
2.1 Gene Orders
2.2 Rearrangement Model
2.3 Genome Rearrangement Analysis
3 Cut and Join Genome Rearrangement Models
3.1 Genome Rearrangement Graphs
3.2 Single-Cut or Join Model
3.3 Single-Cut and Join Model
3.4 Double-Cut and Join Model
3.5 Multi-Cut and Join Model
4 Rearrangement Models with Intergenic Regions
5 Preserving Genome Rearrangement Models
5.1 Strong Interval Tree
5.2 Algorithms for the Preserving Inversion Model
5.3 Related Preserving Problems
6 Conclusion
References
Chapter 10: AGO, a Framework for the Reconstruction of Ancestral Syntenies and Gene Orders
1 Introduction
2 Methods
2.1 Input Data
2.1.1 Species Tree
2.1.2 Genomes and Genes
2.1.3 Homologous Gene Families
2.1.4 Gene Families Data
2.1.5 Data Formats
2.2 Pipeline Tools
2.2.1 MSA: MACSE
2.2.2 Gene Trees: IQ-TREE
2.2.3 Reconciled Gene Trees: GeneRax, ALE, and ecceTERA
2.2.4 Candidate Ancestral Gene Adjacencies: DeCoSTAR
2.2.5 Ancestral Gene Orders: SPP_DCJ
2.2.6 Full Pipelines
2.3 Implementation
3 Use Case
3.1 Data
3.2 Pipeline Parameters
3.3 Results
4 Discussion
References
Chapter 11: A Guide to Phylogenomic Inference
1 Introduction
2 Which Data to Use?
2.1 Sequence Data
2.1.1 Whole Genomes
2.1.2 Local Collinear Blocks
2.1.3 Pan-Genome
2.1.4 Core Genome
Core-coding Genes
Exons
Introns
Other Regions
Caution When Using SNPs
2.1.5 Taking Heterozygosity into Account
2.2 Other Types of Data
2.2.1 Genome Rearrangements
2.2.2 Presence/Absence Data
Indels
Gene Content
Presence/Absence of Other Molecular Markers
Mobile Elements
2.2.3 Repetitive Loci
2.2.4 Morphology
2.3 Analyzing It All Together
3 Estimating Gene Trees
3.1 Multiple Alignment
3.2 Phylogenetic Methods
3.2.1 Maximum Parsimony
3.2.2 Distance Methods
3.2.3 Maximum Likelihood
3.2.4 Bayesian Inference
3.2.5 What Is the Best Phylogenetic Method?
3.3 Tree Search
3.4 Models of Evolution
3.4.1 DNA Models
3.4.2 Spatial Heterogeneity of Rates
3.4.3 Temporal Heterogeneity of Rates
3.4.4 RNA Models
3.4.5 Protein Models
3.4.6 Codon Models
3.4.7 General Discrete-State Model
3.5 Model Choice
3.6 Partitioning
3.7 Controlling for Biases When Inferring Gene Trees
3.7.1 Possible Sources of Bias Prior to a Multiple Alignment
3.7.2 Sampling Errors
3.7.3 Systematic Errors
3.8 Inferring Clade Support
3.8.1 Bootstrap
3.8.2 Ultrafast Bootstrap (UFBoot)
3.8.3 Jackknife
3.8.4 Posterior Probability
3.8.5 Local Support Measures
3.8.6 Decay and Double-Decay Indices
3.8.7 Transfer Bootstrap Expectation
3.8.8 Rootstrap
3.8.9 Comparison of Support Indices
3.9 Tree Distances
3.10 Assessing Significance of Non-optimal Trees
4 Estimating the Species Tree
4.1 Supermatrix Analysis
4.2 Supertree Methods
4.3 Species Tree Methods
4.4 What Is the Best Phylogenomic Method?
4.5 Exploring and Filtering Data
4.5.1 Taxon Subsampling
4.5.2 Data Reduction
4.5.3 Binning Data Under Supertree or Species Tree Analyses
4.5.4 Testing Subsamples of Data
4.5.5 Testing Only Relationships of Interest
4.6 Rooting Trees
5 Reticulated Trees and Phylogenetic Networks
6 Additional Comments
7 A Practical Example of Phylogenomic Analysis
7.1 Gathering the Data Matrices
7.1.1 Retrieving Homologous Families
7.1.2 Filtering Orthologous Families
7.1.3 Multiple Alignment of Orthologous Families
7.1.4 Building the Supermatrix
7.1.5 Generating the Matrix of Gene Presence/Absence
7.1.6 Generating the Matrix of Indels
7.2 Phylogenomic Inference
References
Chapter 12: Comparative RNA Genomics
1 Introduction
2 Hallmarks of Conserved RNA
2.1 Structured RNAs
2.2 Non-Structured RNAs
2.3 Unspliced and Unstructured lncRNAs
3 Homology Search
3.1 Sequence-Based Methods
3.2 Secondary Structure Descriptors
3.3 Covariance Models
3.4 Clustering of RNA Secondary Structures
4 Methods for Comparative RNA Structured Discovery
4.1 Sequence Alignment-Based Methods
4.1.1 Energy Directed Folding
4.1.2 Approaches Based on Covariance Models
4.2 Structural Alignments
4.2.1 Computational Methods
4.2.2 Computational Methods Applied for Genome-Wide Discovery
4.3 Structured RNAs in Coding Regions
4.4 Evaluating Large Scale Screens
4.5 Alignment Shuffling Techniques
4.6 A Puzzling Lack of Consistency
4.7 RNA Genes from Conserved Splice Sites
4.8 RNA Gene Finding in Procaryotes
5 Target Prediction
5.1 General Principles
5.2 Restricted RNA-RNA Interaction Models
5.3 RIP Model
5.4 Conserved Interaction Sites
5.5 Fast Genome-Wide Approaches
6 RNA Structure-Seq Methods
7 Protein-RNA Interactions
8 Toward RNA-DNA-Protein Interactions: An Example in CRISPR gRNA Design
9 Concluding Remarks
References
Chapter 13: Bioinformatic Approaches for Comparative Analysis of Viruses
1 Introduction
2 Materials
2.1 Desktop or Laptop Computer
2.2 Google Colab
2.3 Key Online Platforms and Tools
3 Methods
3.1 First Things First: Grouping Genomes
3.1.1 Step 1: Select a Genome of Interest
3.1.2 Step 2: BLAST Query Genome
3.1.3 Step 3: Select Similar Genomes to Compose the Group
3.2 Small RNA Genome Multiple Sequence Alignments
3.2.1 Step 1: Select Similar Sequences to Be Aligned
3.2.2 Step 2: Open MAFFT Online Tool
3.2.3 Step 3: Perform Multiple Sequence Alignment
3.2.4 Step 4: Review Multiple Sequence Alignment
3.2.5 Step 5: Identify Indels and Misalignments
3.3 Comparative Analysis Using Proteins
3.3.1 Step 1: Understand Genome Structure and Marker Genes
3.3.2 Step 2: Recover Glycoprotein Precursor Sequence from the Genome of Interest
3.3.3 Step 3: Select Orthologous Proteins from Close Species
3.3.4 Step 4: Perform Protein Multiple Sequence Alignment
3.3.5 Step 5: Alignment Visualization and Conserved Domains
3.3.6 Step 6: Phylogenetic Analysis
3.4 Variant Calling and Annotation
3.4.1 Step 1: Configure Your Computer
3.4.2 Step 2: Download Sample Data and Reference Genome
3.4.3 Step 3: Quality Check and Data Cleaning
3.4.4 Step 4: Mapping and Coverage Metrics
3.4.5 Step 5: Variant Calling and Annotation
3.5 Virome Analyses and Pathogen Identification
3.5.1 Step 1: Configure Your Computer
3.5.2 Step 2: Download and Organize Data
3.5.3 Step 3: Human Contamination Removal
3.5.4 Step 4: Contig Assembly
3.5.5 Step 5: Taxonomic Identification
3.5.6 Step 6: Genome Recovery
3.5.7 Step 7: (Optional) Online Platforms for Metagenomics Analyses
4 Notes
References
Chapter 14: Comparative Analyses of Bacteriophage Genomes
1 Introduction
2 Materials
3 Methods
3.1 Preparing Your Dataset: Obtaining Phage Genomes
3.2 CDS Prediction and Functional Annotation
3.3 ANI and AAI
3.4 Clustering Strategies
3.4.1 Nucleotide Clustering
3.4.2 Predicted Proteome Clustering
3.5 Shared Proteins and Phylogeny
3.6 Taxonomy
3.7 Other Visualization Methods
4 Conclusion
References
Chapter 15: Comparative Genomics of Sex, Chromosomes, and Sex Chromosomes in Caenorhabditis elegans and Other Nematodes
1 Introduction
2 Comparative Genomics of Sex
2.1 Comparative Genomics Reveals Hybrid Origin of Asexuality
2.2 The Evolution of Hermaphroditism Is Accompanied by Loss of Male-Specific Genes
2.3 Genetic Determinants of Sex Were Lost in the Bursaphelenchus Lineage
2.4 Multiple Meiosis-Related Genes Are Taxonomically Restricted Orphan Genes
3 Comparative Genomics of Chromosomes
3.1 Chromosomes Arms and Centers Exhibit Distinct Genomic Signatures
3.2 Modular Rearrangements of Nigon Elements Shapes Chromosome Evolution
4 Comparative Genomics of Sex Chromosomes
4.1 Fusions of Nigon Elements Frequently Impact the Evolution of Sex Chromosomes
4.2 Sex Chromosomal Gene Dosage Is Differentially Regulated Across Nematodes
5 Conclusions
References
Chapter 16: Comparative Evolutionary Genomics in Insects
1 Introduction
2 Sequencing Genomes
2.1 DNA Extraction
2.2 Short-Read Sequencing
2.3 Long-Read Sequencing
2.4 Scaffolding Technologies
3 Assembling Genomes
3.1 Assembly Using Short-Reads
3.2 Assembly Using Long-Reads
3.3 Merging Assemblies and Gap-Filling
3.4 Post Assembly Polishing and QC
4 Repetitive Elements
4.1 Background
4.2 Annotating REs
4.3 RE Analyses in Insects
5 Protein-Coding Annotation
6 Evolutionary Analyses
6.1 Coding Sequence Extraction
6.2 Orthology Detection
6.3 Gene Family Expansions and Contractions
6.4 Detecting Domain Absences and Rearrangements
6.5 Detecting Signals of Selection
6.6 Transcription Factor Identification and DNA Binding Motif Analysis
6.7 Functional Enrichment Analyses
7 Discussion
References
Chapter 17: Comparative Methods for Demystifying Spatial Transcriptomics
1 Introduction
2 Materials
2.1 Visium ST Experiments
2.2 Software
2.2.1 The Space Ranger Pipeline
2.2.2 The Loupe Browser
2.2.3 System Tools and Third-Party Applications
2.3 Genome and Transcriptome References
3 Methods
3.1 Preparing an SR Run
3.1.1 Preparing the Fiducial Images
3.1.2 Preparing the Reads
3.1.3 Preparing the Genome and the Transcriptome References
3.2 The Core SR Quantification Pipeline
3.2.1 Image Analysis
3.2.2 Read Mapping
3.2.3 Molecule and Gene Expression Quantitation
3.3 Secondary SR Analyses
3.3.1 Spatial Clustering
3.3.2 Comparing Gene-Expression Clusters
3.3.3 Aggregating SR Count Runs
4 Discussion
5 Concluding Remarks
References
Chapter 18: Comparative Genomic Analysis of Bacterial Data in BV-BRC: An Example Exploring Antimicrobial Resistance
1 Introduction
2 Material
3 Methods
4 Conclusion
References
Chapter 19: VEuPathDB Resources: A Platform for Free Online Data Exploration, Integration, and Analysis
1 Introduction
2 Site Search in VEuPathDB
3 Specialized Searches
4 The Search Strategy System
5 The Genome Browser
5.1 Example #1
5.2 Example #2
6 Getting Additional Help
References
Chapter 20: A Practical Approach to Using the Genomic Standards Consortium MIxS Reporting Standard for Comparative Genomics an...
1 Introduction
2 Overview of the Structure and Terminology of MIxS
3 Checklists Describe Sampling and Sequencing Methods
4 Extensions Describe Sample and Sampling Contexts
5 Use of Ontologies and Value Sets
6 MIxS Versions
7 Methods
7.1 How to Access the MIxS Standard
7.2 How to Use the MIxS Standard for Data Use, Reuse, and Analysis
7.3 How to Use the MIxS Standard for Data Submission
7.4 How to Specify Sample Environments Using the EnvO Ecosystem Classification
8 A Primer on Using MIxS: The MIMS Checklist and Soil Extension
9 Discussion
9.1 How to Contribute to Future Development of the MIxS Standard
9.2 Beyond Sequence Data Standards: Metabolomics and Proteomics
9.3 Partnerships and Alignment with Other Standards
10 Conclusion
References
Index
SIMILAR VOLUMES
Since the first edition, published in 2001, genomics research has taken great strides. In this updated second edition, a team of expert researchers share the most current information in a field that has recently switched emphasis from gene identification to functional genomics and the characterizati
This detailed volume provides an overview of recent advances in the application of genomic technologies in several domains of marine biology, raising awareness of various DNA- and RNA-based technologies. Genomic methods are essential in identifying previously undetected taxonomic (e.g. DNA bar
This volume presents the latest protocols for both laboratory and bioinformatics based analyses in the field of marine genomics. The chapters presented in the book cover a wide range of topics, including the sampling and genomics of bacterial communities, DNA extraction in marine organisms,
Genomic imprinting is the process by which gene activity is regulated according to parent of origin. Usually, this means that either the maternally inherited or the paternally inherited allele of a gene is expressed while the opposite allele is repressed. The phenomenon is largely restricted t