Deep Sequencing Data Analysis (Methods in Molecular Biology, 2243)
Scribed by Noam Shomron (editor)
- Publisher: Humana
- Year: 2021
- Tongue: English
- Leaves: 376
- Category: Library
Synopsis
This second edition provides new and updated chapters from expert researchers, detailing methods used to study the multifaceted field of deep sequencing data. Chapters guide readers through techniques for processing RNA-seq data, microbiome analysis, deep learning methodologies, and various approaches for the identification of sequence variants. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls.
Authoritative and cutting-edge, Deep Sequencing Data Analysis: Methods and Protocols, Second Edition aims to ensure successful results in the further study of this vital field.
Table of Contents
Preface
Contents
Contributors
Chapter 1: Detecting Causal Variants in Mendelian Disorders Using Whole-Genome Sequencing
1 Introduction
2 Variant Detection Workflow
3 Variation Types
3.1 SNVs and Small Indels
3.2 Structural/Copy Number Variation
3.3 Repeat Variation
4 Variant Annotation Algorithms
4.1 Variant Impact on Proteins
4.2 Missense Mutation Prediction Tools
5 Specialized Strategies
5.1 Sample Selection Strategies
5.2 Sequencing Strategies
5.3 Software Strategies
6 Clinical Reporting
6.1 External Data Sources
6.2 Variant Classification
6.3 Secondary Findings
6.4 Strategy of Variant Prioritization
References
Chapter 2: Statistical Considerations on NGS Data for Inferring Copy Number Variations
1 Introduction and Background
2 NGS Technology for CNV Detection: A Brief Summary
3 Computational Approaches
3.1 The SegSeq Software Tool
3.2 The rSW-seq Algorithm
4 Statistical Model-Based Approaches
4.1 Hidden Markov Models
4.2 Shifting Level Model
4.3 Change Point Model
4.3.1 Frequentist Approaches
4.3.2 Bayesian Approaches
Modeling the Log Ratio of Read Counts NGS Data
Modeling the NGS Read Count Data Using Data Transformation
Modeling the NGS Read Count Data Using an On-Line Change Point Model
4.4 Penalized Regression Approach
4.4.1 For Single Subject Profile
4.4.2 For Multiple Subject Profiles
5 Conclusions
References
Chapter 3: Applications of Community Detection Algorithms to Large Biological Datasets
1 Introduction
1.1 Clustering Analysis
1.2 Networks and Community Detection
2 Materials
2.1 Single-Cell and "Bulk" RNA Sequencing Datasets
2.2 KNN Network Construction and Visualization
2.3 Community Detection Algorithms
2.4 Statistical Measures for Comparing NBC and Other Common Clustering Algorithms
2.5 Tissue-Specific Reference Genes Lists from the Human Protein Atlas
3 Methods
3.1 A Workflow for Networks Based Clustering (NBC)
3.2 NBC Accurately Resolves Seven Cell Types from a Glioblastoma Single-Cell RNA-seq Dataset
3.3 Comparing NBC with Other Common Clustering Methods
3.4 NBC Can Be Used to Resolve Tissue-Specific Genes
4 Notes
References
Chapter 4: Processing and Analysis of RNA-seq Data from Public Resources
1 Introduction
1.1 The Rise of RNA-seq
1.2 Data Availability and Impact
2 Public Resources Overview
2.1 The Cancer Genome Atlas: TCGA
2.1.1 Introduction
2.1.2 Data Access
2.2 Genotype-Tissue Expression: GTEx
2.2.1 Introduction
2.2.2 Data Access
2.3 Cancer Cell Line Encyclopedia: CCLE
2.3.1 Introduction
2.3.2 Data Access
2.4 Human Induced Pluripotent Stem Cell Initiative: HipSci
2.4.1 Introduction
2.4.2 Data Access
2.5 Expression Atlas
2.5.1 Introduction
2.5.2 Data Access
3 Analysis
3.1 General Pipeline
3.2 Combining RNA-seq Data from Different Sources
3.2.1 General Approach
3.2.2 Batch Correction
4 Notes
References
Chapter 5: Improved Analysis of High-Throughput Sequencing Data Using Small Universal k-Mer Hitting Sets
1 Introduction
2 Materials
3 Methods
3.1 Definitions
3.2 DOCKS
3.3 DOCKSany and DOCKSanyX
3.4 Results
3.5 Potential Applications
4 Notes
References
Chapter 6: An Introduction to Whole-Metagenome Shotgun Sequencing Studies
1 Introduction
2 Metagenomic Assembly
3 Community Profiling
4 Functional Profiling
5 Future Perspectives
6 Conclusion
References
Chapter 7: Microbiome Analysis Using 16S Amplicon Sequencing: From Samples to ASVs
1 Introduction
2 Materials
3 Methods
3.1 Experimental Processing
3.1.1 Experiment Design
3.1.2 Sample Processing
3.1.3 DNA Extraction
3.1.4 PCR Amplification
3.1.5 Sequencing
3.2 Bioinformatic Analysis
3.2.1 Demultiplexing
3.2.2 Denoising
3.3 Select Topics in Downstream Processing
3.3.1 Controlling for Contamination
3.3.2 Phylogenetic Tree Generation
3.3.3 Rarefaction
3.4 Full Bioinformatic Processing Pipeline Using Qiime2
3.4.1 Import the Deep-Sequencing Output Reads
3.4.2 Demultiplexing
3.4.3 Quality Filtering
3.4.4 Denoising Using deblur
3.4.5 Creating a Phylogenetic Tree
3.5 Conclusions
References
Chapter 8: RNA-Seq in Nonmodel Organisms
1 Introduction
2 Materials
3 Methods
3.1 Strategic Decision: Which Sequencing Technology to Use
3.2 Strategic Decision: Which Reference to Use
3.3 Quality Assessment of the Raw Sequence Reads
3.4 Reads Separation Between Organisms
3.5 Transcriptome De Novo Assembly
3.6 Transcriptome Filtering
3.7 Transcriptome QA
3.8 Transcriptome Functional Annotation
3.9 Read Alignment and Quantitation, and Their QA
3.10 Differential Expression and Clustering Analysis
3.11 Pathway/Function Enrichment Analysis
3.11.1 BiNGO
3.11.2 Blast2GO
3.11.3 clusterProfiler
3.11.4 KEGG
3.11.5 Tools Supporting Enrichment Analysis with User-Provided Functional Categories
3.12 A Ready-to-Use Pipeline
4 Notes
References
Chapter 9: Deep Learning Applied on Next Generation Sequencing Data Analysis
1 Introduction
1.1 Deep Learning
1.2 Deep Learning in Medicine
1.3 Deep Learning in Genomic Research
1.4 Deep Learning for Cancer Diagnosis
2 Methods
2.1 Description of Data
2.2 Data Manipulation and Processing
2.3 Description of Model Structure
3 Results and Discussion
3.1 Data Loading and Manipulation Impact on Training Efficiency and Speed
3.2 Results of Model Performance on Microbiome Dataset Sample Classification
3.3 Deteriorating Results for Population from Different Countries
3.4 The Advantages and Limitations of the DL Models
3.5 Future Directions and Exploring Explainability
3.6 Code Availability
References
Chapter 10: Interrogating the Accessible Chromatin Landscape of Eukaryote Genomes Using ATAC-seq
1 Introduction
2 Materials
2.1 Genomic Sequence and Annotation Files
2.2 Software Packages
3 Methods
3.1 Preparation of Genomic Files
3.2 Read Mapping
3.3 Filtering and Deduplicating Alignments
3.4 Genome Browser Track Generation
3.5 Mapping Statistics and ATAC-seq Quality Assessment
3.6 Peak Calling and Identification of Reproducible Peaks
3.6.1 IDR Analysis Pipeline
3.6.2 Removing Known Artifacts
3.7 Merging Peaks and Creating Multisample Data Matrices
3.8 Identifying Differentially Accessible Regions
3.9 Visualizing Signal Around Genomic Features Using Heatmaps
3.10 Data Exploration
3.11 Clustering of Accessible Regions Across Conditions/Cell Types
3.12 Analyzing Variable Motif Accessibility Using chromVAR
3.13 Analyzing Transcription Factor Footprints
3.14 V-plots
References
Chapter 11: Genome-Wide Noninvasive Prenatal Diagnosis of SNPs and Indels
1 Introduction
1.1 Noninvasive Prenatal Diagnosis
1.2 Genome-Wide Analysis of Rare Diseases
1.3 Genome-Wide NIPD of Monogenic Disorders
1.4 Chapter Outline
2 Materials
2.1 Preanalysis Materials
2.2 Computational Tools
3 Methods
3.1 Sample Selection
3.1.1 Family Trios
3.1.2 True Fetal Sample
3.2 Biological and Technical Considerations
3.2.1 Fetal Fraction
3.2.2 Fragment Length Distribution (Fig. 2)
3.2.3 Depth of Coverage
3.2.4 Exome vs Genome Sequencing
3.3 Computational Pipeline for Noninvasive Fetal Variant Calling
3.3.1 Alignment, Deduplication, Indel Realignment
3.3.2 Step 1: Parental Variant Calling
3.3.3 Step 2: Plasma DNA Preprocessing
3.3.4 Step 2, Part A: Calculating the Fetal Fraction and Length Distribution
3.3.5 Step 2, Part B: Identifying Potential Mutation Loci in the cfDNA
3.3.6 Step 3: Bayesian Algorithm for Variant Calling
3.3.7 Machine Learning-Based Variant Recalibration
3.3.8 Machine Learning Model Training
4 Notes
References
Chapter 12: Genome-Wide Noninvasive Prenatal Diagnosis of De Novo Mutations
1 Introduction
1.1 Noninvasive Prenatal Diagnosis
1.2 NIPD of De Novo Mutations
1.3 NIPD of De Novo Mutations Using Machine Learning
1.4 Summary
2 Materials
2.1 Pre-analysis Materials
2.2 Computational Tools
3 Methods
3.1 Sample Selection
3.1.1 Family Trios
3.1.2 True Fetal Sample
3.2 Biological and Technical Considerations
3.2.1 Fetal Fraction
3.2.2 Fragment Length Distribution
3.2.3 De Novo Mutations Rate
3.2.4 Depth of Coverage
3.2.5 Genome Wide Sequencing
3.3 Computational Pipeline for Noninvasive Fetal Variant Calling
3.3.1 Alignment, Deduplication, Indel Realignment
3.3.2 Plasma Variant Calling
3.3.3 Parental Variant Calling
3.3.4 Plasma DNA Preprocessing
3.3.5 Bayesian Algorithm for Variant Calling and Filtering of De Novo Candidates
Variant Processing and Accuracy Assessment
3.3.6 Machine Learning-Based Variant Recalibration
Model Training
4 Notes
References
Chapter 13: Accurate Imputation of Untyped Variants from Deep Sequencing Data
1 Motivation
2 Characteristics of a Reference Panel
2.1 Extensiveness
2.2 Population Structure
2.3 Structural Variants
3 Methods for Imputing Untyped Variants
4 Estimating the Accuracy of Imputation of Untyped Variants
4.1 Statistical Probabilities
4.2 Mask, Reimpute, and Compare
4.3 Genotyping the Individual from Which the Reference Genome Was Obtained
4.4 Leave-One-Out
4.5 Whole-Genome Sequencing
5 Conclusion
References
Chapter 14: Multiregion Sequence Analysis to Predict Intratumor Heterogeneity and Clonal Evolution
1 Introduction
2 Materials
2.1 Study Design
2.1.1 Tissue Preparation
2.1.2 Sequencing Strategy
2.2 Data
3 Methods
3.1 Overview of Tumor Phylogenetic Tree Construction
3.2 Evolution Model
3.3 Sample Tree and Clone Tree
3.4 Constructing a Tumor Evolutionary Tree
3.5 Software
4 Notes
References
Chapter 15: Overcoming Interpretability in Deep Learning Cancer Classification
1 Introduction
2 Materials
3 Methods
4 Results
5 Conclusion
References
Chapter 16: Single-Cell Transcriptome Profiling
1 Introduction
2 Materials
2.1 Software and Hardware
2.2 Data
3 Methods
3.1 Raw Sequencing Data Processing
3.2 Count Data Quality Control
3.3 Dimensionality Reduction
3.4 Clustering
3.5 Differential Expression
4 Notes
References
Chapter 17: Biological Perspectives of RNA-Sequencing Experimental Design
1 Introduction
2 Materials and Methods: Experimental Design
2.1 Biological Sample Variability
2.2 Sample Size
2.3 Replicates and Pooling
2.4 Randomization
2.5 Batches
2.6 Multiplexing and Barcodes
2.7 Sequence Depth, Coverage and Quality Score
2.8 NGS Platforms
2.9 Validations
2.10 Result Reporting and Presentation
3 Conclusions
References
Chapter 18: Analysis of microRNA Regulation in Single Cells
1 Introduction
2 Materials
2.1 Software
2.2 Data Files
3 Methods
3.1 Import Data
3.2 Estimate Expression Levels and Noise of mRNAs
3.3 Measure miRNA Regulation
4 Using DCA to Denoise Gene Expression Matrix
4.1 Run DCA
4.2 Process DCA Output
4.3 Perform Analysis with DCA Output
5 Notes
6 Conclusions
References
Chapter 19: DNA Data Collection and Analysis in the Forensic Arena
1 Advancements in DNA Reading Ability
2 Human DNA Analysis
3 Early Forensic Science
4 STRs and the Combined DNA Index System (CODIS)
5 NGS Used for Forensics
6 National DNA Databases
7 Direct-to-Consumer (DTC) DNA Access
8 Privacy DNA Rights in the USA
9 Familial DNA Searching
10 Private DNA-Testing Companies
11 Public DNA Databases
12 Should Law Enforcement Have Access to all Genetic Data?
References
Index
SIMILAR VOLUMES
Institute of Food Research, Norwich, U.K. Methods in Molecular Biology Series, Volume 25. Second volume completing a practical aid for nucleic acid sequence researchers who use computers to acquire, store, or analyze their data. Plastic comb binding. 15 contributors, 5 U.S.
This thorough book collects methods and strategies to analyze proteomics data. It is intended to describe how data obtained by gel-based or gel-free proteomics approaches can be inspected, organized, and interpreted to extrapolate biological information. Organized into four sections, the volum
The new genetic revolution is fuelled by Deep Sequencing (or Next Generation Sequencing) apparatuses which, in essence, read billions of nucleotides per reaction. Effectively, when carefully planned, any experimental question which can be translated into reading nucleic acids can be applied. In
In this new volume, renowned authors contribute fascinating, cutting-edge insights into microarray data analysis. Information on an array of topics is included in this innovative book including in-depth insights into presentations of genomic signal processing. Also detailed is the use of tiling arra
This volume details a comprehensive set of methods and tools for Hi-C data processing, analysis, and interpretation. Chapters cover applications of Hi-C to address a variety of biological problems, with a specific focus on state-of-the-art computational procedures adopted for the data analy