From Sequences to Graphs: Discrete Methods and Structures for Bioinformatics

✍ Scribed by Annie Chateau, Mikael Salson

Publisher: Wiley-ISTE
Year: 2022
Tongue: English
Leaves: 268
Series: Computer Science: Bioinformatics
Category: Library

No coin nor oath required. For personal study only.

✦ Synopsis

In order to study living organisms, scientists not only study them at an overall macroscopic scale but also on a more detailed microscopic scale. This observation, pushed to its limits, consists of investigating the very center of each cell, where we find the molecules that determine the way it functions: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid).

In an organism, DNA carries the genetic information, which is called the genome. It is represented as four-letter sequences using the letters A, C, G and T; based on these sequences, computer methods described in this book can answer fundamental questions in bioinformatics.

This book explores how to quickly find sequences of a few hundred nucleotides within a genome that may be made up of several billion, how to compare those sequences and how to reconstruct the complete sequence of a genome. It also discusses the problems of identifying bacteria in a given environment and predicting the structure of RNA based on its sequence.

✦ Table of Contents

Cover
Half-Title Page
Title Page
Copyright Page
Contents
Preface
Author Biographies
Chapter 1. Methodological Concepts: Algorithmic Solutions of Bioinformatics Problems
1.1. Data, models, problem formalism in bioinformatics
1.1.1. Data
1.1.2. Genome modeling
1.1.3. Problems in bioinformatics
1.2. Mathematical preliminaries
1.2.1. Propositional logic preliminaries
1.2.2. Preliminaries on sets
1.3. Vocabulary in text algorithmics
1.4. Graph theory
1.4.1. Subgraphs
1.4.2. Path in a graph
1.4.3. Matching
1.4.4. Planarity
1.4.5. Tree decomposition
1.5. Algorithmic problems
1.5.1. Definition
1.5.2. Graph problem
1.5.3. Satisfiability problems
1.6. Problem solutions
1.6.1. Algorithm
1.6.2. Complexity
1.6.3. Runtime
1.7. Complexity classes
1.7.1. Generality
1.7.2. Exact algorithms
1.7.3. Approximation algorithms
1.7.4. Solvers
1.8. Some algorithmic techniques
1.8.1. Dynamic programming
1.8.2. Tree traversal
1.9. Validation
1.9.1. The different types of errors
1.9.2. Quality measures
1.9.3. And in the non-binary case?
1.10. Conclusion
1.11. References
Chapter 2. Sequence Indexing
2.1. Introduction
2.1.1. What is indexing?
2.1.2. When to index?
2.1.3. What to index?
2.1.4. Indexing structures and queries considered
2.1.5. Basic notions and vocabulary
2.2. Word indexing
2.2.1. Bloom filters
2.2.2. Inverted list
2.2.3. De Bruijn graphs
2.2.4. Efficient structures for targeted queries
2.3. Full-text indexing
2.3.1. Suffix tree
2.3.2. (Extended) suffix array
2.3.3. Burrows–Wheeler transform
2.4. Indexing choice criteria
2.4.1. Based on the type of the necessary query
2.4.2. Based on the space-time and data quantity trade-off
2.4.3. Based on the need to add or modify indexed data
2.4.4. Indexing choices according to applications
2.5. Conclusion and perspectives
2.5.1. Efficient methods for indexing a few genomes or sequencing sets
2.5.2. Methods that struggle to take advantage of data redundancy
2.6. References
3. Sequence Alignment
3.1. Introduction
3.1.1. What is pairwise alignment?
3.1.2. How to evaluate an alignment?
3.2. Exact alignment
3.2.1. Representation in edit graph form
3.2.2. Global alignment and Needleman–Wunsch algorithm
3.2.3. Local alignment and Smith–Waterman algorithm
3.2.4. Alignment with affine indel function and the Gotoh
3.3. Heuristic alignment
3.3.1. Seeds
3.3.2. Min-hash and global sampling
3.3.3. Minimizing and local sampling
3.4. References
Chapter 4. Genome Assembly
4.1. Introduction
4.2. Sequencing technologies
4.2.1. Short reads
4.2.2. Long reads
4.2.3. Linked reads
4.2.4. Hi-C reads
4.2.5. Optical mapping
4.3. Assembly strategies
4.3.1. The main steps
4.3.2. Cleaning and correction of reads
4.3.3. Scaffold construction
4.3.4. Scaffold ordering
4.4. Scaffold construction methods
4.4.1. Greedy assembly
4.4.2. OLC assembly
4.4.3. DBG assembly
4.4.4. Constrained assembly
4.5. Scaffold-ordering methods
4.5.1. Hi-C data-based methods
4.5.2. Optical mapping-based methods
4.6. Assembly validation
4.6.1. Metrics
4.6.2. Read realignment
4.6.3. Gene prediction
4.6.4. Competitions
4.7. Conclusion
4.8. References
Chapter 5. Metagenomics and Metatranscriptomics
5.1. What is metagenomics?
5.1.1. Motivations and historical context
5.1.2. The metagenomic data
5.1.3. Bioinformatics challenges for metagenomics
5.2. “Who are they”: taxonomic characterization of microbial communities
5.2.1. Methods for targeted metagenomics
5.2.2. Whole-genome methods with reference
5.2.3. Reference-free methods
5.3. “What are they able to do?”: functional metagenomics
5.3.1. Gene prediction and annotation
5.3.2. Metatranscriptomics
5.3.3. Reconstruction of metabolic networks
5.4. Comparative metagenomics
5.4.1. Comparative metagenomics with diversity estimation
5.4.2. De novo comparative metagenomics
5.5. Conclusion
5.6. References
Chapter 6. RNA Folding
6.1. Introduction
6.1.1. RNA folding
6.1.2. Secondary structure
6.2. Optimization for structure prediction
6.2.1. Computing the minimum free-energy (MFE) structure
6.2.2. Listing (sub)optimal structures
6.2.3. Comparative prediction: simultaneous alignment/folding of RNAs
6.2.4. Joint alignment/folding model
6.3. Analyzing the Boltzmann ensemble
6.3.1. Computing the partition function
6.3.2. Statistical sampling
6.3.3. Boltzmann probability of structural patterns
6.4. Studying RNA structure in practice
6.4.1. The Turner model
6.4.2. Tools
6.5. References
Conclusion
List of Authors
Index
EULA

📜 SIMILAR VOLUMES

Bioinformatics Methods: From Omics to Ne

📁 Bioinformatics Methods: From Omics to Next Generation Sequencing

✍ Shili Lin, Denise Scholtens, Sujay Datta 📂 Library 📅 2022 🏛 CRC Press/Chapman & Hall 🌐 English

The past three decades have witnessed an explosion of what is now referred to as high-dimensional `omics' data. Bioinformatics Methods: From Omics to Next Generation Sequencing describes the statistical methods and analytic frameworks that are best equipped to interpret these complex data and how th

RNA Sequence, Structure, and Function: C

📁 RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods

✍ Jan Gorodkin, Walter L. Ruzzo (eds.) 📂 Library 📅 2014 🏛 Humana Press 🌐 English

<p><p>The existence of genes for RNA molecules not coding for proteins (ncRNAs) has been recognized since the 1950's, but until recently, aside from the critically important ribosomal and transfer RNA genes, most focus has been on protein coding genes. However, a long series of striking discoveries,

RNA Sequence, Structure, and Function: C

📁 RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods

✍ Jan Gorodkin, Walter L. Ruzzo (eds.) 📂 Library 📅 2014 🏛 Humana Press 🌐 English

DNA Sequencing: From Experimental Method

📁 DNA Sequencing: From Experimental Methods to Bioinformatics (Introduction to Biotechniques Series)

✍ Luke Alphey 📂 Library 📅 1997 🌐 English

Univ. of Manchester, U.K. Practical, strategic guide to the process of DNA sequencing. Covers planning the approach, data acquisition, and extracting findings from the data. Softcover.

Bioinformatics: Sequence, structure, and

📁 Bioinformatics: Sequence, structure, and databanks

✍ Higgins, Des; Taylor, Willie (eds.) 📂 Library 📅 2001 🏛 Oxford University Press 🌐 English

This title covers important topics in the analysis of protein sequences and structures. It includes comparing amino acid sequences to structures comparing structures to each other, searching information on entire protein families as well as searching with single sequences, how to use the Internet an

Bioinformatics: Sequence, Structure and

📁 Bioinformatics: Sequence, Structure and Databanks

✍ D. Higgins, Willie Taylor 📂 Library 📅 2000 🏛 Oxford University Press, USA 🌐 English

Bioinformatics covers practical important topics in the analysis of protein sequences and structures. It includes comparing amino acid sequences to structures comparing structures to each other, searching information on entire protein families as well as searching with single sequences, how to use t