𝔖 Scriptorium
✦   LIBER   ✦

πŸ“

From Sequences to Graphs: Discrete Methods and Structures for Bioinformatics

✍ Scribed by Annie Chateau, Mikael Salson


Publisher
Wiley-ISTE
Year
2022
Tongue
English
Leaves
268
Series
Computer Science: Bioinformatics
Category
Library

⬇  Acquire This Volume

No coin nor oath required. For personal study only.

✦ Synopsis


In order to study living organisms, scientists not only study them at an overall macroscopic scale but also on a more detailed microscopic scale. This observation, pushed to its limits, consists of investigating the very center of each cell, where we find the molecules that determine the way it functions: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid).

In an organism, DNA carries the genetic information, which is called the genome. It is represented as four-letter sequences using the letters A, C, G and T; based on these sequences, computer methods described in this book can answer fundamental questions in bioinformatics.

This book explores how to quickly find sequences of a few hundred nucleotides within a genome that may be made up of several billion, how to compare those sequences and how to reconstruct the complete sequence of a genome. It also discusses the problems of identifying bacteria in a given environment and predicting the structure of RNA based on its sequence.

✦ Table of Contents


Cover
Half-Title Page
Title Page
Copyright Page
Contents
Preface
Author Biographies
Chapter 1. Methodological Concepts: Algorithmic Solutions of Bioinformatics Problems
1.1. Data, models, problem formalism in bioinformatics
1.1.1. Data
1.1.2. Genome modeling
1.1.3. Problems in bioinformatics
1.2. Mathematical preliminaries
1.2.1. Propositional logic preliminaries
1.2.2. Preliminaries on sets
1.3. Vocabulary in text algorithmics
1.4. Graph theory
1.4.1. Subgraphs
1.4.2. Path in a graph
1.4.3. Matching
1.4.4. Planarity
1.4.5. Tree decomposition
1.5. Algorithmic problems
1.5.1. Definition
1.5.2. Graph problem
1.5.3. Satisfiability problems
1.6. Problem solutions
1.6.1. Algorithm
1.6.2. Complexity
1.6.3. Runtime
1.7. Complexity classes
1.7.1. Generality
1.7.2. Exact algorithms
1.7.3. Approximation algorithms
1.7.4. Solvers
1.8. Some algorithmic techniques
1.8.1. Dynamic programming
1.8.2. Tree traversal
1.9. Validation
1.9.1. The different types of errors
1.9.2. Quality measures
1.9.3. And in the non-binary case?
1.10. Conclusion
1.11. References
Chapter 2. Sequence Indexing
2.1. Introduction
2.1.1. What is indexing?
2.1.2. When to index?
2.1.3. What to index?
2.1.4. Indexing structures and queries considered
2.1.5. Basic notions and vocabulary
2.2. Word indexing
2.2.1. Bloom filters
2.2.2. Inverted list
2.2.3. De Bruijn graphs
2.2.4. Efficient structures for targeted queries
2.3. Full-text indexing
2.3.1. Suffix tree
2.3.2. (Extended) suffix array
2.3.3. Burrows–Wheeler transform
2.4. Indexing choice criteria
2.4.1. Based on the type of the necessary query
2.4.2. Based on the space-time and data quantity trade-off
2.4.3. Based on the need to add or modify indexed data
2.4.4. Indexing choices according to applications
2.5. Conclusion and perspectives
2.5.1. Efficient methods for indexing a few genomes or sequencing sets
2.5.2. Methods that struggle to take advantage of data redundancy
2.6. References
3. Sequence Alignment
3.1. Introduction
3.1.1. What is pairwise alignment?
3.1.2. How to evaluate an alignment?
3.2. Exact alignment
3.2.1. Representation in edit graph form
3.2.2. Global alignment and Needleman–Wunsch algorithm
3.2.3. Local alignment and Smith–Waterman algorithm
3.2.4. Alignment with affine indel function and the Gotoh
3.3. Heuristic alignment
3.3.1. Seeds
3.3.2. Min-hash and global sampling
3.3.3. Minimizing and local sampling
3.4. References
Chapter 4. Genome Assembly
4.1. Introduction
4.2. Sequencing technologies
4.2.1. Short reads
4.2.2. Long reads
4.2.3. Linked reads
4.2.4. Hi-C reads
4.2.5. Optical mapping
4.3. Assembly strategies
4.3.1. The main steps
4.3.2. Cleaning and correction of reads
4.3.3. Scaffold construction
4.3.4. Scaffold ordering
4.4. Scaffold construction methods
4.4.1. Greedy assembly
4.4.2. OLC assembly
4.4.3. DBG assembly
4.4.4. Constrained assembly
4.5. Scaffold-ordering methods
4.5.1. Hi-C data-based methods
4.5.2. Optical mapping-based methods
4.6. Assembly validation
4.6.1. Metrics
4.6.2. Read realignment
4.6.3. Gene prediction
4.6.4. Competitions
4.7. Conclusion
4.8. References
Chapter 5. Metagenomics and Metatranscriptomics
5.1. What is metagenomics?
5.1.1. Motivations and historical context
5.1.2. The metagenomic data
5.1.3. Bioinformatics challenges for metagenomics
5.2. β€œWho are they”: taxonomic characterization of microbial communities
5.2.1. Methods for targeted metagenomics
5.2.2. Whole-genome methods with reference
5.2.3. Reference-free methods
5.3. β€œWhat are they able to do?”: functional metagenomics
5.3.1. Gene prediction and annotation
5.3.2. Metatranscriptomics
5.3.3. Reconstruction of metabolic networks
5.4. Comparative metagenomics
5.4.1. Comparative metagenomics with diversity estimation
5.4.2. De novo comparative metagenomics
5.5. Conclusion
5.6. References
Chapter 6. RNA Folding
6.1. Introduction
6.1.1. RNA folding
6.1.2. Secondary structure
6.2. Optimization for structure prediction
6.2.1. Computing the minimum free-energy (MFE) structure
6.2.2. Listing (sub)optimal structures
6.2.3. Comparative prediction: simultaneous alignment/folding of RNAs
6.2.4. Joint alignment/folding model
6.3. Analyzing the Boltzmann ensemble
6.3.1. Computing the partition function
6.3.2. Statistical sampling
6.3.3. Boltzmann probability of structural patterns
6.4. Studying RNA structure in practice
6.4.1. The Turner model
6.4.2. Tools
6.5. References
Conclusion
List of Authors
Index
EULA


πŸ“œ SIMILAR VOLUMES


Bioinformatics Methods: From Omics to Ne
✍ Shili Lin, Denise Scholtens, Sujay Datta πŸ“‚ Library πŸ“… 2022 πŸ› CRC Press/Chapman & Hall 🌐 English

The past three decades have witnessed an explosion of what is now referred to as high-dimensional `omics' data. Bioinformatics Methods: From Omics to Next Generation Sequencing describes the statistical methods and analytic frameworks that are best equipped to interpret these complex data and how th

RNA Sequence, Structure, and Function: C
✍ Jan Gorodkin, Walter L. Ruzzo (eds.) πŸ“‚ Library πŸ“… 2014 πŸ› Humana Press 🌐 English

<p><p>The existence of genes for RNA molecules not coding for proteins (ncRNAs) has been recognized since the 1950's, but until recently, aside from the critically important ribosomal and transfer RNA genes, most focus has been on protein coding genes. However, a long series of striking discoveries,

RNA Sequence, Structure, and Function: C
✍ Jan Gorodkin, Walter L. Ruzzo (eds.) πŸ“‚ Library πŸ“… 2014 πŸ› Humana Press 🌐 English

<p><p>The existence of genes for RNA molecules not coding for proteins (ncRNAs) has been recognized since the 1950's, but until recently, aside from the critically important ribosomal and transfer RNA genes, most focus has been on protein coding genes. However, a long series of striking discoveries,

Bioinformatics: Sequence, structure, and
✍ Higgins, Des; Taylor, Willie (eds.) πŸ“‚ Library πŸ“… 2001 πŸ› Oxford University Press 🌐 English

This title covers important topics in the analysis of protein sequences and structures. It includes comparing amino acid sequences to structures comparing structures to each other, searching information on entire protein families as well as searching with single sequences, how to use the Internet an

Bioinformatics: Sequence, Structure and
✍ D. Higgins, Willie Taylor πŸ“‚ Library πŸ“… 2000 πŸ› Oxford University Press, USA 🌐 English

Bioinformatics covers practical important topics in the analysis of protein sequences and structures. It includes comparing amino acid sequences to structures comparing structures to each other, searching information on entire protein families as well as searching with single sequences, how to use t