Gene Expression Data Analysis: A Statistical and Machine Learning Perspective
Written by Pankaj Barah, Dhruba Kumar Bhattacharyya, Jugal Kumar Kalita
- Publisher: CRC Press
- Year: 2021
- Language: English
- Pages: 379
- Edition: 1
- Category: Library
Synopsis
The development of high-throughput technologies in molecular biology during the last two decades has contributed to the production of tremendous amounts of data. Microarrays and RNA sequencing (RNA-seq) are two widely used high-throughput technologies for simultaneously monitoring the expression patterns of thousands of genes. The data produced by such experiments are voluminous (in both dimensionality and number of instances) and evolving in nature. Analyzing such large volumes of data to identify interesting patterns relevant to a given biological question requires high-performance computational infrastructure as well as efficient machine learning algorithms. Cross-communication of ideas between biologists and computer scientists remains a major challenge.
Gene Expression Data Analysis: A Statistical and Machine Learning Perspective was written with a multidisciplinary audience in mind. The book discusses gene expression data analysis from molecular biology, machine learning, and statistical perspectives. Readers will acquire both theoretical and practical knowledge of methods for identifying novel patterns of high biological significance. To measure the effectiveness of such algorithms, we discuss statistical and biological performance metrics that can be used in real-life or simulated environments. The book covers a large number of benchmark algorithms, tools, systems, and repositories commonly used to analyze gene expression data and validate results. It will benefit students, researchers, and practitioners in biology, medicine, and computer science by enabling them to acquire in-depth knowledge of statistical and machine-learning-based methods for analyzing gene expression data.
Key features:
- An introduction to the Central Dogma of molecular biology and information flow in biological systems.
- A systematic overview of the methods for generating gene expression data.
- Background knowledge on statistical modeling and machine learning techniques.
- Detailed methodology for analyzing gene expression data, with an example case study.
- Clustering methods for finding co-expression patterns in microarray, bulk RNA-seq, and scRNA-seq data.
- A large number of practical tools, systems and repositories that are useful for computational biologists to create, analyze and validate biologically relevant gene expression patterns.
- Suitable for multi-disciplinary researchers and practitioners in computer science and biological sciences.
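As a rough illustration of the co-expression idea in the key features (this is not code from the book), the sketch below scores every gene pair by the Pearson correlation of their expression profiles and keeps pairs whose absolute correlation meets a cutoff. The gene names, toy expression values, the 0.9 threshold, and the choice to treat strongly anti-correlated pairs as co-expressed are all illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A constant profile has undefined correlation; treat it as no co-expression.
        return 0.0
    return cov / (sx * sy)


def coexpressed_pairs(expr, threshold=0.9):
    """Return (gene1, gene2, r) for pairs with |r| >= threshold.

    `threshold` is a hypothetical cutoff; using |r| means anti-correlated
    pairs are also reported (an assumption, not a fixed rule).
    """
    pairs = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            pairs.append((g1, g2, round(r, 3)))
    return pairs


# Hypothetical toy data: rows are genes, columns are samples/conditions.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
result = coexpressed_pairs(expr)
print(result)
# [('geneA', 'geneB', 0.999), ('geneA', 'geneC', -1.0), ('geneB', 'geneC', -0.999)]
```

The clustering and network-based methods listed in Chapter 5 go well beyond this; a pairwise similarity matrix like the one computed here is only a typical input to such methods.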
Table of Contents
Cover
Half Title
Title Page
Copyright Page
Dedication
Contents
Acknowledgements
Authors
Preface
1. Introduction
1.1. Introduction
1.2. Central Dogma
1.3. Measuring Gene Expression
1.4. Representation of Gene Expression Data
1.5. Gene Expression Data Analysis: Applications
1.6. Machine Learning
1.7. Statistical and Biological Evaluation
1.8. Gene Expression Analysis Approaches
1.8.1. Preprocessing in Microarray and RNAseq Data
1.8.2. Co-Expressed Pattern-Finding Using Machine Learning
1.8.3. Co-Expressed Pattern-Finding Using Network-Based Approaches
1.9. Differential Co-Expression Analysis
1.10. Differential Expression Analysis
1.11. Tools and Systems for Gene Expression Data Analysis
1.11.1. (Diff) Co-Expression Analysis Tools and Systems
1.11.2. Differential Expression Analysis Tools and Systems
1.12. Contribution of This Book
1.13. Organization of This Book
2. Information Flow in Biological Systems
2.1. Concept of Systems Theory
2.1.1. A Brief History of Systems Thinking
2.1.2. Areas of Application of Systems Theory in Biology
2.2. Complexity in Biological Systems
2.2.1. Hierarchical Organization of Biological Systems from Macroscopic Levels to Microscopic Levels
2.2.2. Information Flow in Biological Systems
2.2.3. Top-Down and Bottom-Up Flow
2.3. Central Dogma of Molecular Biology
2.3.1. DNA Replication
2.3.2. Transcription
2.3.3. Translation
2.4. Ambiguity in Central Dogma
2.4.1. Reverse Transcription
2.4.2. RNA Replication
2.5. Discussion
2.5.1. Biological Information Flow from a Computer Science Perspective
2.5.2. Future Perspective
3. Gene Expression Data Generation
3.1. History of Gene Expression Data Generation
3.2. Low-Throughput Methods
3.2.1. Northern Blotting
3.2.2. Ribonuclease Protection Assay
3.2.3. qRT-PCR
3.2.4. SAGE
3.3. High-Throughput Methods
3.3.1. Microarray
3.3.2. RNA-Seq
3.3.3. Types of RNA-Seq
3.3.4. Gene Expression Data Repositories
3.3.5. Standards in Gene Expression Data
3.4. Chapter Summary
4. Statistical Foundations and Machine Learning
4.1. Introduction
4.2. Statistical Background
4.2.1. Statistical Modeling
4.2.2. Probability Distributions
4.2.3. Hypothesis Testing
4.2.4. Exact Tests
4.2.5. Common Data Distributions
4.2.6. Multiple Testing
4.2.7. False Discovery Rate
4.2.8. Maximum Likelihood Estimation
4.3. Machine Learning Background
4.3.1. Significance of Machine Learning
4.3.2. Machine Learning and Its Types
4.3.3. Supervised Learning Methods
4.3.4. Unsupervised Learning Methods
4.3.5. Outlier Mining
4.3.6. Association Rule Mining
4.4. Chapter Summary
4.4.1. Statistical Modeling
4.4.2. Supervised Learning: Classification and Regression Analysis
4.4.3. Proximity Measures
4.4.4. Unsupervised Learning: Clustering
4.4.5. Unsupervised Learning: Biclustering
4.4.6. Unsupervised Learning: Triclustering
4.4.7. Outlier Mining
4.4.8. Unsupervised Learning: Association Mining
5. Co-Expression Analysis
5.1. Introduction
5.2. Gene Co-Expression Analysis
5.2.1. Types of Gene Co-Expression
5.2.2. An Example
5.3. Measures to Identify Co-Expressed Patterns
5.4. Co-Expression Analysis Using Clustering
5.4.1. CEA Using Clustering: A Generic Architecture
5.4.2. Co-Expressed Pattern Finding Using 1-Way Clustering
5.4.3. Subspace or 2-way Clustering in Co-Expression Mining
5.4.4. Co-Expressed Pattern-Finding Using 3-Way Clustering
5.5. Network Analysis for Co-Expressed Pattern-Finding
5.5.1. Definition of CEN
5.5.2. Analyzing CENs: A Generic Architecture
5.6. Chapter Summary and Recommendations
6. Differential Expression Analysis
6.1. Introduction
6.1.1. Importance of DE Analysis
6.2. Differential Expression (DE) of a Gene
6.2.1. Differential Expression of a Gene: An Example
6.3. Differential Expression Analysis (DEA)
6.3.1. A Generic Framework
6.3.2. Preprocessing
6.3.3. DE Genes Identification
6.3.4. DE Gene Analysis
6.3.5. Statistical Validation
6.3.6. Discussion
6.4. Biomarker Identification Using DEA: A Case Study
6.4.1. Problem Definition
6.4.2. Dataset Used
6.4.3. Preprocessing
6.4.4. Framework of Analysis Used
6.4.5. Results
6.4.6. Discussion
6.5. Summary and Recommendations
7. Tools and Systems
7.1. Introduction
7.1.1. Generic Characteristics of a Systems Biology Tool
7.1.2. Target Systems Biology Activities
7.2. Systems Biology Tools
7.2.1. A Taxonomy
7.2.2. Pre-Processing Tools
7.3. Gene Expression Data Analysis Tools
7.3.1. Co-Expression Analysis
7.3.2. Differential Co-Expression Analysis
7.3.3. Differential Expression Analysis
7.4. Visualization
7.5. Validation
7.5.1. Statistical Validation
7.6. Biological Validation
7.7. Chapter Summary and Concluding Remarks
8. Concluding Remarks and Research Challenges
8.1. Concluding Remarks
8.2. Some Issues and Research Challenges
Bibliography
Glossary
Index
Similar Volumes
Statistical Analysis of Gene Expression Microarray Data promises to become the definitive basic reference in the field. Under the editorship of Terry Speed, some of the world's most pre-eminent authorities have joined forces to present the tools, features, and problems associated with the analysis o…
Carry out a variety of advanced statistical analyses including generalized additive models, mixed effects models, multiple imputation, machine learning, and missing data techniques using R. Each chapter starts with conceptual background information about the techniques, includes multiple examples…
Data analysis and machine learning are research areas at the intersection of computer science, artificial intelligence, mathematics and statistics. They cover general methods and techniques that can be applied to a vast set of applications such as web and text mining, marketing, medical science, bio…
Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data (Princeton Series in Modern Observational Astronomy)
As telescopes, detectors, and computers grow ever more powerful, the volume of data at the disposal of astronomers and astr…