Bioinformatics Volume I: Data, Sequence Analysis, and Evolution

✍ Scribed by Keith, Jonathan M. (Editor)

Publisher: Springer New York (Imprint: Humana Press)
Year: 2016;2017
Tongue: English
Leaves: 490
Series: Methods in molecular biology 1525
Edition: 2nd ed. 2017
Category: Library

✦ Synopsis

This second edition provides updated and expanded chapters covering a broad sampling of useful and current methods in the rapidly developing field of bioinformatics. Bioinformatics, Volume I: Data, Sequence Analysis, and Evolution, Second Edition comprises three sections: Data and Databases, Sequence Analysis, and Phylogenetics and Evolution. The first section details bioinformatics methodologies for generating sequence and structural data, organizing those data into conceptual categories, and building databases that facilitate further analyses. The Sequence Analysis section describes the fundamental methodologies for processing the sequences of biological molecules: techniques used in almost every bioinformatics analysis pipeline, particularly in its preliminary stages. Finally, the Phylogenetics and Evolution section deals with methodologies that compare biological sequences in order to understand how they evolved. As a volume in the highly successful Methods in Molecular Biology series, its chapters feature the kind of detail and expert implementation advice intended to ensure positive results.
Comprehensive and practical, Bioinformatics, Volume I: Data, Sequence Analysis, and Evolution, Second Edition is an essential resource for graduate students, early career researchers, and others who are integrating new bioinformatics methods into their research.

Chapters: Genome Sequencing -- Sequence Assembly -- A Practical Approach to Protein Crystallography -- Managing Sequence Data -- Genome Annotation -- Working with Ontologies -- The Classification of Protein Domains -- Multiple Sequence Alignment -- Large-Scale Sequence Comparison -- Genomic Database Searching -- Finding Genes in Genome Sequence -- Sequence Segmentation with changeptGUI -- Measuring Natural Selection -- Inferring Trees -- Identifying Optimal Models of Evolution -- Scaling Up the Phylogenetic Detection of Lateral Gene Transfer Events -- Detecting and Analyzing Genetic Recombination Using RDP4 -- Species Tree Estimation from Genome-Wide Data with Guenomu.

✦ Table of Contents

Preface......Page 5
Contents......Page 7
Contributors......Page 9
Part I: Data and Databases......Page 11
1 Introduction......Page 12
1.1 General Procedure for Genome Sequencing......Page 13
1.2 Choosing a Sequencing Strategy......Page 14
2 Materials......Page 16
3 Methods......Page 20
3.1.1 Maxam-Gilbert Method......Page 21
3.1.2 Sanger Sequencing......Page 22
3.2 The Birth of Next-Generation Sequencing (NGS) Techniques......Page 23
3.3.1 454 Pyrosequencing......Page 25
3.3.2 The Illumina (Solexa) Genome Analyzer......Page 26
3.3.3 Applied Biosystems SOLiD Sequencer......Page 27
3.3.4 Polonator......Page 28
3.4.1 The Ion Torrent Sequencing Technology......Page 30
3.4.2 DNA Nanoball Sequencing......Page 31
3.4.3 Pacific Bioscience RS (SMRT Sequencing)......Page 33
3.4.4 Nanopore Sequencing......Page 34
4.3 Evolutionary Biology......Page 35
4.7 Cancer Research......Page 36
6 Notes......Page 37
References......Page 39
1 Introduction......Page 43
2.2 Software......Page 44
3.1 Algorithm......Page 45
3.2 Using PCAP.Solexa......Page 48
3.3 Troubleshooting......Page 51
References......Page 52
1.1 Protein Crystallization......Page 54
1.2.1 Protein Crystals......Page 55
1.2.4 X-ray Detector......Page 56
1.3 Structure Determination......Page 58
1.3.1 Molecular Replacement......Page 59
1.3.2 Rotation Function......Page 60
1.4 Structure Refinement......Page 61
2.3 Molecular Replacement......Page 63
3.1.1 Crystallization Procedure......Page 64
3.1.2 High-Throughput Crystallization......Page 65
3.2 Data Measurement......Page 66
3.3.1 HKL2000......Page 67
3.3.2 XDS......Page 71
3.4 Molecular Replacement......Page 72
3.5 Structure Refinement......Page 76
3.6 Model Building......Page 77
4 Notes......Page 79
References......Page 84
1.1 INSDC......Page 86
1.2 SRA/GEO......Page 87
1.4 BioSample......Page 88
1.5 GenBank......Page 89
1.6 Genomes......Page 90
1.7 Metagenomes and Environmental Sample Sequencing......Page 91
1.8.1 Definition, Accession and Organism......Page 92
1.8.2 Reference Section......Page 93
1.9 Updates and Maintenance of the Database......Page 94
1.10.1 Bad Annotation and Propagation......Page 95
1.11 RefSeq......Page 96
2.1 SRA Submissions......Page 97
2.2 GenBank Submissions......Page 98
3.3 Advanced Entrez Query......Page 100
3.5.1 Putting It All Together with Bioproject......Page 101
3.8 BioProject Linking......Page 102
4.1 The SRA Toolkit......Page 103
4.3 BioProject Download......Page 104
6 Notes......Page 105
References......Page 112
1 Introduction......Page 114
2.1.1 Transcription and Its Regulation......Page 115
2.1.2 Epigenetic Modifications and Chromatin Structures......Page 116
2.1.3 Evolutionary Conservation and Variation......Page 117
2.2.1 Data File Formats......Page 118
2.2.2 Genome Browsers......Page 119
2.2.3 Further Use of Genome Annotations......Page 120
3 Methods......Page 121
3.2 Scenario II: Make and Browse Your Own Annotations......Page 122
4 Notes......Page 125
References......Page 126
1 Introduction......Page 129
1.2 Applications......Page 130
2.2 Encoding Ontologies......Page 131
3.1 Gene Ontology......Page 132
3.1.1 Accessing the Gene Ontology......Page 133
3.2 Biological Pathway Exchange Ontology......Page 134
3.2.2 Accessing Pathway Knowledge via rBiopaxParser......Page 136
4 Notes......Page 138
References......Page 139
1 Introduction......Page 142
2 What is a Protein Domain?......Page 144
3.1 Automatic Domain Sequence Clustering......Page 145
3.3 Families Represented by Multiple Sequence Alignments......Page 147
3.3.2 Profiles......Page 148
3.4 Domain Sequence Classifications......Page 149
3.4.2 Pfam......Page 151
3.5.1 HAMAP......Page 152
4 Classification of Domains from Structure......Page 153
4.1 Identification of Domain Boundaries at the Structural Level......Page 154
4.2 Methods for Structural Comparison......Page 156
4.3 Domain Structure Classification Hierarchies......Page 157
4.4 Structural Domain Classifications......Page 160
4.6 Consistency of Structure Domain Databases......Page 162
5 Domain Family Annotation of Genomes......Page 163
6 Conclusions......Page 165
References......Page 166
Part II: Sequence Analysis......Page 170
1.1 Definition and Implementation of an MSA......Page 171
1.3 Dynamic Programming......Page 172
1.4 The Progressive Alignment Protocol......Page 173
2.1 Selection of Sequences......Page 174
2.2 Unequal Sequence Lengths: Global and Local Alignment......Page 175
3.1 PRALINE......Page 176
3.2 MUSCLE......Page 179
3.3 T-Coffee......Page 180
3.4 MAFFT......Page 182
3.5 ProbCons......Page 183
3.7 MSAProbs......Page 184
3.8 Clustal Omega......Page 185
4 Notes......Page 186
References......Page 191
1.1 Homology, Similarity, and Identity......Page 194
1.2 Substitutions and Indels......Page 195
2.2 Pairwise Alignment and Scoring Matrices......Page 196
2.2.1 PAM Matrices......Page 197
2.2.2 BLOSUM Matrices......Page 198
2.3 Global and Local Alignment......Page 200
2.3.1 Needleman and Wunsch Algorithm: Global Sequence Alignment......Page 201
2.3.2 Smith and Waterman Algorithm: Local Sequence Alignment......Page 204
3.1 BLAST......Page 205
3.1.1 Understanding the BLAST Parameters......Page 207
3.1.2 PSI-BLAST......Page 209
3.2 FASTA......Page 211
3.3.2 BLAT......Page 212
3.3.3 BLASTZ......Page 213
3.3.4 LAGAN......Page 214
3.3.6 AVID......Page 216
3.3.8 WABA......Page 217
4 Notes......Page 218
References......Page 226
Chapter 10: Genomic Database Searching......Page 228
1 Introduction......Page 229
2.1 Reference Genome Sequences......Page 231
3 Genomic Databases, and Approaches to Searching Them......Page 232
4.1 General Features of Genome Browsers......Page 233
4.2.1 Searching Ensembl......Page 235
4.2.2 Navigating and Customizing the Genome Browser......Page 236
4.2.3 Exporting Outputs from Ensembl......Page 237
4.4 University of California Santa Cruz (UCSC) Genome Browser......Page 238
4.4.2 Genome Browser Navigation and Customization......Page 239
4.4.3 Exporting Outputs from the UCSC Genome Browser......Page 240
4.5.1 Searching Map Viewer......Page 241
4.5.2 Genome Browser Navigation and Customization......Page 242
4.6 Viral Genomes......Page 243
4.7 Stand-Alone Genome Browsers......Page 244
5 Genomic Database Searching with RNA Identifiers......Page 245
5.3 Piwi-Interacting RNAs (piRNAs)......Page 246
5.4 Long Noncoding RNAs (lncRNAs)......Page 247
7 Genome Searching Using Chromosomal Coordinates......Page 248
8.1 Sequence-Based Searches......Page 249
8.2 Motif-Based Searches......Page 252
8.3 Matrix-Based Searches......Page 253
9.1 Creating a Hyperlinked ID Table......Page 254
9.2 Creating an Annotated ID Table......Page 255
9.3 Batch Retrieval of Sequences from Multiple Genomic Coordinates......Page 256
10 Genomic Database Searching Using NGS Data......Page 258
11.2 Galaxy......Page 259
12 Genome Searching Using Application Programming Interfaces......Page 260
12.2 Ensembl......Page 261
12.5 Bioconductor......Page 262
13 Conclusions and Perspectives......Page 263
14 Notes......Page 264
References......Page 267
1 Introduction......Page 273
2.1 Gene Finding in Bacteria and Archaea......Page 274
2.2 Gene Finding in Environmental Sequence Samples......Page 279
2.3 Gene Finding in Eukaryotes......Page 281
2.3.2 Gene Prediction and Annotation Pipelines......Page 283
2.3.4 Accuracy Assessment and Refining Annotation......Page 285
2.3.5 Performance Estimates on Nematode Genomes: The nGASP Competition......Page 286
4 Notes......Page 287
References......Page 290
1 Introduction......Page 294
2.1 Software for Sequence Segmentation: changept......Page 296
2.3 MCMC Simulation......Page 297
3.1 The Input Sequence(s)......Page 298
3.1.2 Alignment Encoding......Page 299
3.1.4 Example: Alignment Encoding......Page 300
3.1.5 Parallel Input Sequences......Page 301
3.2.3 Loading the Input Sequence......Page 302
3.2.6 Number of Samples......Page 303
3.2.8 Example: Running changept......Page 304
3.3.1 Information Criteria......Page 306
3.3.4 Example Continued: Model Selection and Segment Class Parameter Estimates......Page 307
3.4.1 Generating Map of Segment Positions......Page 309
3.4.2 The Segmentation Viewer......Page 310
4 Notes......Page 311
References......Page 312
Part III: Phylogenetics and Evolution......Page 314
1 Introduction......Page 315
2 Expected Signatures of Natural Selection in Coding Sequences......Page 316
3 Three Steps to dN/dS......Page 317
3.2 Step 2: Counting Synonymous (S) and Non-synonymous (N) Differences......Page 318
3.3 Step 3: Correcting for Multiple (Latent) Mutational Events to Obtain dN and dS......Page 319
4.1 The Markov Process......Page 321
4.2 The Importance of the Transition/Transversion Ratio......Page 324
4.3 Codon Frequency Model......Page 326
4.4 Amount of Divergence......Page 329
4.5 Estimating dN/dS......Page 332
4.5.1 Sites Models......Page 333
4.5.2 Branch Models......Page 336
4.5.3 Branch-Site Models......Page 339
5 Model Selection......Page 341
6 Assumptions to Keep in Mind......Page 343
7 Final Remarks......Page 344
8 Notes......Page 345
References......Page 346
1 Introduction......Page 348
Box 1: Key Words and Phrases When Inferring Trees......Page 349
2.1 Assumptions About the Sequence Data......Page 350
2.2 The Tree Assumption......Page 351
2.3 Model Assumptions......Page 352
3.1 Scores......Page 353
4 Inferring Trees Using Maximum Likelihood......Page 355
4.1 Proposing an Initial Tree......Page 356
4.2 Refining the Tree Estimate......Page 357
4.5 Other Approaches to Point Estimation......Page 358
4.6 Choosing a Model and Partitioning Data......Page 359
5.1 Measures of General Branch Support and Bootstrapping......Page 360
5.2 Confidence Sets of Trees: The SH and AU Test......Page 361
6 Bayesian Inference of Trees......Page 363
6.1 Bayesian Estimation of Trees......Page 364
6.2 Sampling the Posterior Using Markov Chain Monte Carlo (MCMC)......Page 365
6.3 How Long to Run a Markov Chain Monte Carlo......Page 366
6.4 The Specification of Priors......Page 367
7 Strengths and Weaknesses of Statistical Methods......Page 368
8 Notes......Page 370
References......Page 372
1 Introduction......Page 377
2 Underlying Principles......Page 379
2.1 The Phylogenetic Assumptions......Page 380
2.2 Modeling the Evolutionary Process at a Site in a Sequence......Page 382
2.3 Modeling the Evolutionary Processes at a Site in two Sequences......Page 383
3.2 Signal......Page 384
3.3 Testing the Stationary, Reversible, and Homogeneous Condition......Page 385
3.3.1 Matched-pairs Tests of Homogeneity......Page 386
3.3.2 Matched-pairs Tests of Homogeneity with Two Sequences......Page 387
3.3.3 Matched-pairs Tests of Homogeneity with More than Two Sequences......Page 389
3.3.4 Analysis of Species-rich Alignments......Page 391
3.3.5 Phylogenetic Analysis of Sequences That Have Evolved Under Complex Conditions......Page 396
3.4 Testing the Assumption of Independent and Identical Processes......Page 402
3.5 Choosing a Time-reversible Substitution Model......Page 406
3.6 General Approaches to Model Selection......Page 409
4 Discussion......Page 411
References......Page 412
1 Introduction......Page 419
2.2 Software Programs......Page 421
3.1 Clustering of Sequences into Sets of Putative Homologs and Orthologs......Page 422
3.3 Inference of Trees......Page 423
3.5 Inference of Lateral Transfer Events via Topological Comparisons of Trees......Page 426
4 Notes......Page 427
References......Page 428
1 Introduction......Page 431
2.1 Data Files......Page 432
2.2 Program Settings......Page 433
2.3 Producing a Preliminary Recombination Hypothesis......Page 437
2.4 Making a Recombination-Free Dataset......Page 438
2.5 Navigating Through the Analysis Results......Page 439
2.6 Checking the Accuracy of Breakpoint Identification......Page 440
2.7 Checking the Accuracy of Recombinant Sequence Identification......Page 442
2.8 Evaluating How Well Recombination Signals Have Been Grouped into Recombination Events......Page 444
2.9 Accepting a Verified Recombination Event......Page 445
2.10 Saving Analysis Results......Page 446
3.2 Navigating Through the Results......Page 447
3.3 Checking the Accuracy of Breakpoint Identification......Page 448
3.4 Checking the Accuracy of Recombinant Sequence Identification......Page 449
3.5 Evaluating RDP4's Grouping of Recombination Events......Page 450
3.6 Completing the Analysis......Page 452
3.7 Further Analyses......Page 453
4 Notes......Page 456
References......Page 458
1 Estimation of Species Trees......Page 459
2 Bayesian Inference of Species Trees......Page 460
3.1 Guenomu Programs......Page 462
3.1.4 Pairwise Distances Between Trees......Page 463
4 Input Files......Page 464
5 Running guenomu......Page 465
5.1 Optimization by Simulated Annealing......Page 467
5.2.1 Posterior Sample of Numeric Parameters......Page 468
5.2.2 Output Trees......Page 471
7 Notes......Page 474
References......Page 475
Erratum to: Sequence Segmentation with changeptGUI......Page 477
Index......Page 478

📜 SIMILAR VOLUMES

📁 Bioinformatics: Volume I: Data, Sequence Analysis, and Evolution

✍ Jonathan M. Keith (eds.) 📂 Library 📅 2017 🏛 Humana Press 🌐 English

This second edition provides updated and expanded chapters covering a broad sampling of useful and current methods in the rapidly developing and expanding field of bioinformatics. Bioinformatics, Volume I: Data, Sequence Analysis, and Evolution, Second Edition is comprised of three section

📁 Bioinformatics: Volume I: Data, Sequence Analysis, and Evolution

✍ Jonathan M. Keith (editor) 📂 Library 📅 2016 🏛 Humana 🌐 English

📁 Bioinformatics: Data, Sequence Analysis and Evolution

✍ Ilene Karsch Mizrachi (auth.), Jonathan M. Keith PhD (eds.) 📂 Library 📅 2008 🏛 Humana Press 🌐 English

Not only is the quantity of life science data expanding, but new types of biological data continue to be introduced as a result of technological development and a growing understanding of biological systems. Methods for analyzing these data are an increasingly important component of modern bio
