An Introduction to Bioinformatics with R: A Practical Guide for Biologists (Chapman & Hall/CRC Computational Biology Series)
Written by Edward Curry
- Publisher
- Chapman and Hall/CRC
- Year
- 2020
- Language
- English
- Pages
- 311
- Edition
- 1
- Category
- Library
Synopsis
In biological research, the amount of data available to researchers has increased so much over recent years that it is becoming increasingly difficult to understand the current state of the art without some experience and understanding of data analytics and bioinformatics. An Introduction to Bioinformatics with R: A Practical Guide for Biologists leads the reader through the basics of computational analysis of data encountered in modern biological research. With no previous experience of statistics or programming required, readers will develop the ability to plan suitable analyses of biological datasets and to use the R programming environment to perform these analyses. This is achieved through a series of case studies that use R to answer research questions with molecular biology datasets. Broadly applicable statistical methods are explained, including linear and rank-based correlation, distance metrics and hierarchical clustering, hypothesis testing using linear regression, proportional hazards regression for survival data, and principal component analysis. These methods are then applied as appropriate throughout the case studies, illustrating how they can be used to answer research questions.
Key Features:
· Provides a practical course in computational data analysis suitable for students or researchers with no previous exposure to computer programming.
· Describes in detail the theoretical basis for statistical analysis techniques used throughout the textbook, from basic principles.
· Presents walk-throughs of data analysis tasks using R and example datasets. All R commands are presented and explained in order to enable the reader to carry out these tasks themselves.
· Uses outputs from a large range of molecular biology platforms including DNA methylation and genotyping microarrays; RNA-seq, genome sequencing, ChIP-seq and bisulphite sequencing; and high-throughput phenotypic screens.
· Gives worked-out examples geared towards problems encountered in cancer research, which can also be applied across many areas of molecular biology and medical research.
This book has been developed over years of training biological scientists and clinicians to analyse the large datasets available in their cancer research projects. It is appropriate for use as a textbook or as a practical book for biological scientists looking to gain bioinformatics skills.
Table of Contents
Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
Acknowledgements
1. Introduction
1.1 Why informatics is important for biologists
1.2 How to use this book
2. Introduction to R
2.1 Obtaining R
2.1.1 Downloading R
2.1.2 Installing R
2.2 R console
2.2.1 Starting the R console
2.3 The R workspace
2.3.1 Creating/deleting objects
2.3.2 The working directory
2.4 Data handling
2.4.1 Basic data types
2.4.2 Vectors
2.4.3 Arrays
2.4.4 Lists
2.4.5 Data frames
2.4.6 Data input/output
2.5 More advanced concepts: Scripts and functions
2.5.1 Simple scripts
2.5.2 Functions
2.5.3 Using 'apply'
2.5.3.1 apply
2.5.3.2 sapply
2.5.3.3 lapply
2.5.3.4 mapply
2.6 Plots
2.6.1 Simple scatterplot
2.6.2 Arguments of plot()
2.6.3 Multiple plots on one graph
2.6.4 Scatterplots of multiple variables
2.6.5 Box plots
2.6.6 Saving images to file
2.7 More advanced graphics with ggplot2
2.8 Using R help
3. An Introduction to LINUX for Biological Research
3.1 UNIX
3.2 Linux survival guide
3.3 Useful dependencies and programs
4. Statistical Methods for Data Analysis
4.1 What are statistical methods, and why do we use them in biological research?
4.1.1 A worked example
4.1.2 A brief summary
4.2 What do I need to understand statistics?
4.2.1 Probability
4.2.1.1 Random variables
4.2.1.2 Probability distributions
4.2.1.3 Hypothesis testing
4.2.2 Linear algebra
4.2.3 Summary
4.3 Normalization: Removing technical variation
4.3.1 Centering and scaling
4.3.2 An illustrative example
4.3.3 Quantile normalization
4.3.4 Batch effects
4.4 Correlation
4.4.1 Pearson correlation coefficient
4.4.2 Spearman's rank correlation
4.4.3 Examples
4.5 Clustering
4.5.1 Clustering illustration using R
4.6 Linear regression models
4.6.1 Limma
4.6.1.1 Installing limma
4.6.1.2 Categorical explanatory variables
4.6.1.3 Continuous explanatory variables
4.7 Multiple hypothesis testing
4.8 Survival analysis
4.8.1 Kaplan-Meier plots
4.8.2 Cox proportional hazards regression models
4.9 Projection methods
4.9.1 PCA
4.9.2 PLS
4.10 Resampling: Permutation tests and the bootstrap
4.11 Stability and robustness
4.12 Summary
5. Analyzing Generic Tabular Numeric Datasets in R
5.1 Introduction
5.2 Loading data into R
5.3 Data visualisation
5.3.1 Scatter plots
5.3.2 Box plots
5.3.3 Bar charts
5.4 Correlation and clustering
5.4.1 Correlation
5.4.2 Clustering
5.4.3 Heatmaps
5.5 Statistical analysis using linear models
5.5.1 Comparison of two groups
5.5.2 Alternative models
5.6 Summary
6. Functional Enrichment Analysis
6.1 Introduction
6.2 Loading gene sets into R
6.3 Over-representation
6.3.1 Online tools
6.3.2 Testing gene sets in R
6.4 Systematic enrichment
6.4.1 Online tools
6.4.2 Testing gene sets in R
6.5 Summary
7. Integrating Multiple Datasets in R
7.1 Introduction
7.2 Data import
7.3 Exploratory data analysis
7.4 Integrating multiple datasets
7.4.1 Survival analysis
7.5 Multiple molecular endpoints
7.6 Summary
8. Analyzing Microarray Data in R
8.1 Bioconductor
8.2 Accessing microarray data from GEO
8.3 Single-channel array analysis
8.4 Loading data
8.5 Data visualisation
8.5.1 Image plots
8.5.2 MA plots
8.5.3 Scatterplots
8.5.4 Box plots
8.6 Normalizing data
8.7 Differential expression (linear models)
8.7.1 Design matrix
8.7.2 Fitting linear models
8.7.3 Making use of the results
8.7.4 Postscript: Assumptions
8.8 Clustering and correlation
8.8.1 Expression profiles
8.8.2 Correlation
8.9 Clustering
8.9.1 Filtering
8.10 Survival analysis
8.10.1 Kaplan-Meier plots
8.10.2 Cox proportional hazards regression
8.11 Footnote: Correlation to explore associated functions
9. Analyzing DNA Methylation Microarray Data in R
9.1 Introduction
9.2 Importing raw data
9.3 Quality control
9.4 Normalization and estimating methylation level
9.5 Analyzing beta values
9.6 Using previously preprocessed data
9.7 Further analyses using minfi
10. DNA Analysis with Microarrays
10.1 Introduction
10.2 Genotyping
10.2.1 Normalization
10.2.2 Genotype calling
10.2.3 Downstream analysis: Genome-wide association tests
10.3 Copy number analysis
10.3.1 Normalization
10.3.2 Copy number estimation
10.3.3 Segmentation
10.3.3.1 Hidden Markov model
10.3.3.2 Circular binary segmentation
10.3.4 Downstream analysis
10.3.4.1 Mapping CNA data to genes
10.3.4.2 Finding frequently-mutated genes
10.4 Summary
11. Working with Sequencing Data
11.1 Introduction
11.2 Sequence data analysis tasks
11.3 Quality control
11.3.1 Base call quality filtering
11.3.2 Adapter trimming
11.4 Alignment
11.4.1 Bowtie
11.4.2 BWA
11.4.3 Post-alignment filtering
11.4.4 Removing duplicate reads
11.5 Obtaining sequencing data from the SRA
12. Genomic Sequence Profiling
12.1 Introduction
12.2 SNV: Single nucleotide variants
12.3 Variant filtering and annotation
12.4 Indels: Short insertions and deletions
12.5 SV: Structural variants
12.6 Making use of variant calls
12.7 Summary
13. ChIP-seq
13.1 Introduction
13.2 Cross-correlation
13.3 Filtering blacklisted reads
13.4 Peak calling
13.5 Peak annotation
13.6 Quantitative comparisons of ChIP-seq libraries
13.7 Summary
14. RNA-seq
14.1 Introduction
14.2 Obtaining RNA-seq data from GEO
14.3 Transcript quantification via pseudoalignment
14.3.1 Building a transcript index
14.3.2 Quantifying transcripts using reads
14.3.3 Downstream analysis
14.4 Analysis with transcriptome assembly
14.4.1 Building the transcriptome directly
14.4.2 Transcript quantification
14.4.3 Downstream analysis
14.5 Summary
15. Bisulphite Sequencing
15.1 Introduction
15.2 Alignment and methylation calls
15.3 Downstream analysis
15.4 Summary
16. Final Notes
Index
SIMILAR VOLUMES
Programming knowledge is often necessary for finding a solution to a biological problem. Based on the author's experience working for an agricultural biotechnology company, Python for Bioinformatics helps scientists solve their biological problems by helping them understand the basics of programming
Metabolomics is the scientific study of the chemical processes in a living system, environment and nutrition. It is a relatively new omics science, but the potential applications are wide, including medicine, personalized medicine and intervention studies, food and nutrition, plants, agricu
The computational methods of bioinformatics are being used more and more to process the large volume of current biological data. Promoting an understanding of the underlying biology that produces this data, Pattern Discovery in Bioinformatics: Theory and Algorithms provides the tools to study regula