Data Mining Techniques for the Life Sciences (Methods in Molecular Biology, 1415)

✍ Scribed by Oliviero Carugo (editor), Frank Eisenhaber (editor)

Publisher: Humana
Year: 2016
Tongue: English
Leaves: 549
Category: Library

✦ Synopsis

This volume details several important databases and data mining tools. Data Mining Techniques for the Life Sciences, Second Edition guides readers through archives of macromolecular three-dimensional structures, databases of protein-protein interactions, thermodynamics information on protein and mutant stability, “Kbdock” protein domain structure database, PDB_REDO databank, erroneous sequences, substitution matrices, tools to align RNA sequences, interesting procedures for kinase family/subfamily classifications, new tools to predict protein crystallizability, metabolomics data, drug-target interaction predictions, and a recipe for protein-sequence-based function prediction and its implementation in the latest version of the ANNOTATOR software suite. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls.

Authoritative and cutting-edge, Data Mining Techniques for the Life Sciences, Second Edition aims to ensure successful results in the further study of this vital field.

✦ Table of Contents

Preface
Contents
Contributors
Part I: Databases
Chapter 1: Update on Genomic Databases and Resources at the National Center for Biotechnology Information
1 Introduction
2 Primary Data Submission and Storage
2.1 Primary Raw Sequence Data: Sequence Read Archive (SRA)
2.2 Primary Sequence Data—Genome and Transcriptome Assemblies Rapidly
2.3 Primary Metadata: BioProject, BioSample
2.4 Submission Portal
3 Text Search and Retrieval System
3.1 Basic Organizing Principles
3.1.1 Query Examples
3.1.2 Towards Discovery: Sensors and Adds, Facets (Filters) and Alerts
3.2 Tools for Advanced Users
4 Genomic Databases: Public Reports
4.1 Sequence Read Archive (SRA)
4.2 NCBI Taxonomy
4.3 GenBank
4.4 Whole Genome Shotgun (WGS)
4.5 Genome Collection Database (Assembly)
4.6 BioProject
4.7 Genome
4.7.1 Genome Browser
4.7.2 Entrez Text-
4.7.3 Organelle and Plasmid
4.7.4 Graphical View
4.7.5 Assembly and Annotation Report
4.7.6 Protein Details Report
5 Searching Data by Sequence Similarity (BLAST)
5.1 Exploring NGS Experiments with SRA-BLAST
5.2 BLAST with Assembled Eukaryotic Genomes
5.3 Microbial Genomic BLAST: Reference and Representatives
6 FTP Resources for Genome Data
7 Conclusion
References
Chapter 2: Protein Structure Databases
1 Introduction
2 Structures and Structural Data
2.1 Terminology
2.2 The Protein Data Bank (PDB) and the wwPDB
2.3 Structural Data and Analyses
3 Atlases
3.1 The RCSB PDB
3.1.1 Summary Page
3.1.2 Other Information
3.1.3 Molecule of the Month
3.2 The PDBe
3.2.1 PDBeFold
3.2.2 PDBeMotif
3.2.3 PDBePISA
3.3 JenaLib
3.4 OCA
3.5 PDBsum
3.5.1 Summary Page
3.5.2 Quality Assessment
3.5.3 Enzyme Reactions
3.5.4 Figures from Key References
3.5.5 Secondary Structure and Topology Diagrams
3.5.6 Intermolecular Interactions
4 Homology Models and Obsolete Entries
4.1 Homology Modeling Servers
4.2 Threading Servers
4.3 Obsolete Entries
5 Fold Databases
5.1 Classification Schemes
5.2 Fold Comparison
6 Miscellaneous Databases
6.1 Selection of Data Sets
6.2 Uppsala Electron Density Server (EDS) and PDB_REDO
6.3 Curiosities
7 Summary
References
Chapter 3: The MIntAct Project and Molecular Interaction Databases
1 Introduction
2 Molecular Interaction Databases
3 The Manual Curation Process
4 Molecular Interaction Standards
5 IMEx Databases
6 The MIntAct Project
6.1 The IntAct Web-Based Curation Tool
7 Future Plans
References
Chapter 4: Applications of Protein Thermodynamic Database for Understanding Protein Mutant Stability and Designing Stable Mutants
1 Introduction
2 Thermodynamic Database for Proteins and Mutants, ProTherm
2.1 Contents of ProTherm
2.1.1 Sequence and Structure Information
2.1.2 Experimental Conditions
2.1.3 Thermodynamic Data
2.1.4 Literature
2.2 Search and Display Options in ProTherm
2.3 ProTherm Statistics
2.4 General Trends on Mutational Effects on Protein Stability
3 Factors Influencing the Stability of Proteins and Mutants
4 Prediction of Protein Mutant Stability
4.1 Prediction of Protein Stability Using Structural Information
4.2 Prediction of Protein Mutant Stability from Amino Acid Sequence
4.3 Prediction of Protein Stability upon Multiple Mutations
4.4 Applications and Evaluation of Protein Stability Prediction Tools
5 Conclusions
References
Chapter 5: Classification and Exploration of 3D Protein Domain Interactions Using Kbdock
1 Introduction
2 Materials
2.1 The Kbdock Database
2.2 The Kbdock Web Interface
2.3 3D Visualization
3 Methods
3.1 Browsing the Kbdock Database
3.2 Domain–Peptide Interactions
3.3 Structural Neighbor Interactions
3.4 Searching for DDI Docking Templates
4 Notes
References
Chapter 6: Data Mining of Macromolecular Structures
1 Introduction
1.1 The Protein Data Bank
1.2 The PDB_REDO Data Bank
2 Understanding Model Quality in the PDB
2.1 On the Availability of Diffraction Data for Crystallographic Models
2.2 Considering Family Ties Between Structural Models
2.3 Help the Aged
2.4 Annotation of PDB Entries and the PDB File Format
2.4.1 Considerations About the PDB File Format
2.4.2 PDB Remediation and New Formats
2.4.3 Format-Independent Annotation Issues in the PDB
2.4.4 Remediation and Re-annotation
2.5 The Effect of the Crystallographic Experiment
2.5.1 Diffraction Data Quality
2.5.2 Model Building and Refinement
3 Model Validation
3.1 PDB Validation Reports
3.2 About Validation Metrics
3.2.1 Bias in Validation
3.2.2 An Overview of Popular Model Quality Indicators
3.3 Validation of Non-protein Components
3.3.1 Pitfalls in Ligand Placement
3.3.2 About Metal Ions
3.4 Can a Deposited Model Be Improved?
3.4.1 The PDB_REDO Method
4 How to Select the Best Model for Analysis
4.1 Selecting Structure Models
5 Notes
References
Chapter 7: Criteria to Extract High-Quality Protein Data Bank Subsets for Structure Users
1 Introduction
2 Redundancy in the PDB
3 Missing Residues
4 Conformational Disorder and Occupancy
5 Atomic Displacement Parameters
6 Temperature
7 Resolution and Maps
8 Structure Validation
9 Estimated Standard Errors
10 Concluding Remarks
References
Chapter 8: Homology-Based Annotation of Large Protein Datasets
1 Introduction
2 Materials
2.1 The Pfam Database
2.2 Running Pfam Profile-HMM Models Against a Protein Dataset
2.3 Clustering Protein Sequences
2.3.1 CD-HIT
2.3.2 MCL
2.3.3 Jackhmmer
2.4 Remote Homology Detection Via Profile-HMM–Profile-HMM Alignments
2.4.1 HHblits
3 Methods
3.1 Redundancy Reduction
3.2 Family Annotation
3.3 Classification of Unannotated Regions
3.3.1 Identification of Regions with No Significant Match to Pfam Families
3.3.2 Creation of Dataset of All Regions not Matching Any Pfam family
3.3.3 Clustering of Unannotated Fragments Based on Sequence Similarity
3.3.4 Cluster Analysis
4 Notes
References
Part II: Computational Techniques
Chapter 9: Identification and Correction of Erroneous Protein Sequences in Public Databases
1 Introduction
2 Identification of Erroneous Sequences with the MisPred Pipeline
2.1 Rationale and Logic of MisPred Tools
2.2 Constituents of the MisPred Pipeline
2.3 Reliability of MisPred Tools
2.4 Types of Erroneous Entries Identified by MisPred in Public Databases
2.4.1 The UniProtKB/Swiss-Prot Database
2.4.2 The UniProtKB/TrEMBL Database
2.4.3 The EnsEMBL and NCBI/GNOMON Datasets
3 Correction of Erroneous Sequences with the FixPred Pipeline
3.1 Rationale and Logic of the FixPred Pipeline
3.2 Constituents of the FixPred Pipeline
3.3 Performance of the FixPred Pipeline
References
Chapter 10: Improving the Accuracy of Fitted Atomic Models in Cryo-EM Density Maps of Protein Assemblies Using Evolutionary Information from Aligned Homologous Proteins
1 Introduction
1.1 Integration of X-ray, NMR or Homology Models into Cryo-EM Density Maps
1.2 Evolutionary Conservation of Residues in Protein–Protein Complexes
1.3 Importance of Protein Structural and Sequence Alignment Databases in Evolutionary Studies
2 Materials and Methods
2.1 Simulated Cryo-EM Density Maps for Crystal Structures of Protein–Protein Complexes
2.2 Multiple Protein Cryo-EM Density Fitting Using GMFit
2.3 Detection of Core Interfacial Residues in Crystal and Fitted Complex Structures
2.4 Average Conservation Score Calculation for Core of the Interface
2.5 Calculation of F-Measure to Evaluate the Performance of Cryo-EM Density Fitting
2.6 Refinement of Fits with Lower Interface Conservation Scores Compared to the Crystal Structures
2.7 Statistical Analysis
2.8 Molecular Visualization and Scripting for Data Analysis
3 Results and Discussion
3.1 Large Scale Density Fitting for Estimation of Errors Based on Conservation Criteria
3.2 Comparison of Average Interface Conservation Scores Between Crystal Structures and Fitted Complexes
3.3 Evaluation of Density Fitting Based on F-Measure
3.4 Use of Interface Conservation Scores to Detect Fitting Errors and for Refining the Fits
3.5 A Case Study with the Closed-State RyR1 in Complex with FKBP12 Cryo-EM Density Map
4 Conclusion
References
Chapter 11: Systematic Exploration of an Efficient Amino Acid Substitution Matrix: MIQS
1 Introduction
2 Materials
3 Methods
3.1 Generating Matrices Using the PCA Subspace
3.2 Benchmarks
3.3 KDE and Refinement
4 Illustrative Examples
5 Discussion
6 Conclusions
References
Chapter 12: Promises and Pitfalls of High-Throughput Biological Assays
1 Introduction
2 Considerations
2.1 Dangers of “Fishing Expeditions”
2.1.1 Increased Type I Error Rate and Multiple Testing
2.1.2 Type II Errors and Statistical Power
2.1.3 Confounding
2.1.4 Experimental Design
2.2 A Strategy for Success
3 Methods
3.1 Approaches for Multiple Testing Correction
3.2 Type II Error and Power Analysis
3.3 Experimental Design and the Statistical Analysis Plan
3.4 Methods and Tools to Correct for Experimental Bias
3.5 Tools for Reproducible Research and Their Availability
3.6 Standards and Code Sharing Tools
3.7 Authoring Tools
4 Notes
5 Conclusion
References
Chapter 13: Optimizing RNA-Seq Mapping with STAR
1 Introduction
2 Materials
2.1 Hardware
2.2 Software
2.3 Input Files
3 Methods
3.1 Generating Genome Indices
3.1.1 Basic Command
3.1.2 Including Annotations
Selecting --sjdbOverhang
3.1.3 Advanced Parameters
3.2 Mapping Reads to the Genome
3.2.1 Basic Command
3.2.2 Including Annotations at the Mapping Step
3.2.3 Input Options
Input Files
Trimming the Read Sequences
3.2.4 Controlling Output of Alignments
Sorted and Unsorted SAM and BAM
Unmapped Reads
Attributes
SAM Read Groups
Output File Name Prefix
Standard Output
Temporary Output Directory
3.2.5 Filtering of the Alignments
Alignment Scoring
Minimum Alignment Score and Length
Paired-End Alignments
Mismatches
Soft-Clipping
Multimappers
3.2.6 Tuning Mapping Sensitivity
3.2.7 Filtering Splice Junctions
Filtering Introns
Filtering Output to SJ.out.tab
3.2.8 2-Pass Mapping
Multi-sample 2-Pass Mapping
Per-sample 2-Pass Mapping
3.2.9 Loading Genome into Shared Memory
3.3 Post-mapping Processing
3.3.1 Wiggle Files
3.3.2 Remove Duplicates
3.3.3 Transcriptomic Output
3.3.4 Counting Number of Reads per Gene
References
Part III: Prediction Methods
Chapter 14: Predicting Conformational Disorder
1 Introduction
2 Materials
3 Methods
3.1 Searching Databases Dedicated to IDPs
3.1.1 The Database of Disordered Protein Prediction (D2P2)
3.1.2 MobiDB
3.1.3 DisProt
3.1.4 IDEAL
3.1.5 The PED (Protein Ensemble Database)
3.1.6 PDB (Protein Data Bank)
3.2 Running Disorder Predictions
3.2.1 Metapredictors
DisMeta
GeneSilico MetaDisorder MD2
MetaPrDOS
MULTICOM
MFDp
MFDp2
PONDR-FIT
PredictProtein
MeDor
3.2.2 Individual Disorder Predictors
Predictors Trained on Datasets of Disordered Proteins
PreDisorder
DNDisorder
PONDR
DisProt VL2, VL3, and VSL2 and Derivatives
GlobPlot 2
DisEMBL
DISOPRED
RONN
DISpro
CSpritz
ESpritz
SPINE-D
DICHOT
OnD-CRF
PrDOS
POODLE-I
Predictors that Have Not Been Trained on Disordered Proteins
IUPred
FoldUnfold
DRIP-PRED
Binary Disorder Predictors
The Charge/Hydropathy Method and Its Derivative FoldIndex
The Cumulative Distribution Function (CDF)
The CH–CDF Plot
Nonconventional Disorder Predictors
The Hydrophobic Cluster Analysis (HCA)
3.2.3 Combining Predictors and Experimental Data
3.3 Identifying Regions of Induced Folding
3.3.1 ANCHOR
3.3.2 MoRFpred
3.4 General Procedure for Disorder Prediction
References
Chapter 15: Classification of Protein Kinases Influenced by Conservation of Substrate Binding Residues
1 Introduction
2 Materials and Methods
2.1 Dataset
2.1.1 Selection of Representative Kinase Complexes Bound to Their Substrates
2.2 Structure-Based Alignment of Representative Kinase–Peptide Complexes
2.3 Identification of Substrate Binding Residues
3 Results and Discussion
3.1 Kinase Substrate Binding Residues
3.2 Conserved Segments of Kinase Regions Identified as Substrate Binding Blocks
3.3 Prediction of Substrate Binding Residues in a Subfamily of Kinases Using Substrate Binding Blocks
3.4 Conserved Substrate Binding Residues in Various Protein Kinase Subfamilies
3.5 Implication of Conservation of Substrate Binding Residues in the Classification of Kinases
4 Conclusion
References
Chapter 16: Spectral–Statistical Approach for Revealing Latent Regular Structures in DNA Sequence
1 Introduction
2 HeteroGenome Database. Materials, Methodology, and Analysis of the Results
2.1 Spectral–Statistical Approach for Revealing DNA Sequences Similar to Approximate Tandem Repeats
2.2 Strategy of Searching for and Structuring Data in the HeteroGenome
2.3 Results of the HeteroGenome Data Analysis
2.3.1 Impact of Latent Periodicity on Chromosome Length
2.3.2 Analysis of Periodic Structure Preservation in the Regions of Heterogeneity
2.3.3 Revealing Latent Periodicity in the Genome Functional Regions
2.3.4 Density of Distributing Latent Periodicity Regions Along the Chromosomes
3 Spectral–Statistical Approach for Recognizing Latent Profile Periodicity
3.1 Methodology of Recognizing Latent Profile Periodicity
3.1.1 Model of Profile String and Notion of Latent Profile Periodicity
3.1.2 Methods for Estimating Period Length of Latent Profile Periodicity
3.1.3 Pattern Estimate for Etalon of Latent Profile Periodicity on Basis of Goodness-of-Fit Test
3.1.4 Methods, Reconstructing Spectrum of Deviation from Homogeneity and Confirming a Pattern Estimate for Etalon of Latent Profile Periodicity
3.2 Notion of 3-Regularity in Coding Regions of DNA Sequences
3.3 Results of the 2S-Approach Application to Recognizing Latent Profile Periodicity and Regularity in DNA Sequences
4 Conclusion
References
Chapter 17: Protein Crystallizability
1 Introduction
1.1 Protein Crystallization
1.2 Structural Genomics
2 Methods
2.1 Crystallization Target Selection
2.1.1 Overall Structural Determination Success
2.1.2 Probability of Protein Crystallization
2.2 Construct Optimization
2.3 Optimizing Initial Crystallization Conditions
3 Notes
3.1 Data
3.2 Methods
References
Chapter 18: Analysis and Visualization of ChIP-Seq and RNA-Seq Sequence Alignments Using ngs.plot
1 Introduction
2 Materials
2.1 Datasets
2.2 Software
3 Methods
3.1 Command Line Protocol
3.1.1 Download Data and Folder Organization
3.1.2 A Basic ngs.plot Analysis Run
3.1.3 Incorporating More Complex Functionalities into an ngs.plot Analysis
3.1.4 Multiple Plots, Paired Samples for Normalization, and Gene/Region Ranking
3.1.5 Replotting and Plotting Correlations
3.2 A Web-Based Workflow Based on Galaxy
3.2.1 Introduction to the Galaxy Web Interface
3.2.2 Uploading Input Files
3.2.3 Running ngs.plot
3.2.4 Replotting
4 Notes
References
Chapter 19: Datamining with Ontologies
1 A Brief Overview of Ontologies
2 Datamining with Ontologies
2.1 Ontologies and Graph Structures
2.2 Distributing Data Over Ontology Graphs
2.3 Enrichment
2.4 Similarity
2.5 Further Uses of Ontologies in Datamining
2.6 Ontologies as Formalized Theories
3 Notes
3.1 Browsing and Manipulating Ontologies
3.2 Working with Large Ontologies
3.3 Choosing the “Right” Similarity Measure
References
Chapter 20: Functional Analysis of Metabolomics Data
1 Introduction
2 Methods
2.1 Data Preparation
2.2 Pathway Mapping and Visualization
2.2.1 Global View (with iPath)
2.2.2 Simple Visualization
2.2.3 Advanced Customization
2.2.4 Detailed View in KEGG
3 Enrichment Analysis
4 Notes
References
Chapter 21: Bacterial Genomic Data Analysis in the Next-Generation Sequencing Era
1 Introduction
2 Delving into Microbiology NGS Data Analysis
2.1 Pre-Processing
2.2 Alignment
2.3 De Novo Assembly
2.4 Scaffolding
2.5 Post-assembly
2.6 Variant Calling
2.7 Annotation
2.8 Complementary Tasks
3 Advanced Workflow Examples
3.1 Workflow #1: Pre-processing
3.2 Workflow #2: Bacterial Re-sequencing
3.3 Workflow #3: Bacterial De Novo Assembly
4 Conclusions
References
Chapter 22: A Broad Overview of Computational Methods for Predicting the Pathophysiological Effects of Non-synonymous Variants
1 Introduction
2 Materials
3 Methods
3.1 Pathogenicity Predictors for Non-synonymous Variants
3.1.1 PolyPhen-2
3.1.2 SIFT
3.1.3 MutationAssessor
3.1.4 CADD
3.1.5 MutationTaster2
3.1.6 Fathmm
3.1.7 PANTHER
3.1.8 SNPs&GO
3.1.9 EFIN
3.1.10 Align-GVGD
3.1.11 KD4V
3.1.12 MutPred
3.1.13 PROVEAN
3.1.14 EvoD
3.1.15 HOPE
3.1.16 SNPEffect
3.1.17 VEST
3.1.18 SNPs3D
3.2 Public Collections of Pre-computed Predictions
3.3 Consensus Methods
3.3.1 Condel
3.3.2 CAROL
3.3.3 COVEC
3.3.4 PON-P
3.3.5 PredictSNP
3.3.6 PaPI
3.3.7 Meta-SNP
3.4 Pathogenicity Prediction in Cancer
References
Chapter 23: Recommendation Techniques for Drug–Target Interaction Prediction and Drug Repositioning
1 Introduction
2 Materials and Methods
2.1 Recommendation Techniques for DTI Prediction
2.1.1 Background on Recommendation Algorithms
2.1.2 The DT-Hybrid Algorithm
2.1.3 An Extension to DT-Hybrid: p-Value-Based Selection of DTI Interactions
2.1.4 Applying DT-Hybrid for Drug Combinations Prediction
2.2 Beyond Hybrid Methods and Drug Repositioning
2.2.1 Limitations of Recommendation Algorithms
2.2.2 Tripartite Network Recommendation: An Approach to Drug Repositioning
3 Conclusions
References
Chapter 24: Protein Residue Contacts and Prediction Methods
1 Introduction
1.1 Definition of Contacts
1.2 Contact Evaluation
1.3 Contact Evaluation in CASP Competition
2 Materials
2.1 Machine Learning-Based Methods
2.2 Coevolution-
2.3 Brief Overview of DNcon
3 Methods
4 Case Study
5 Notes
References
Chapter 25: The Recipe for Protein Sequence-Based Function Prediction and Its Implementation in the ANNOTATOR Software Environment
1 Introduction
2 Concepts in Protein Sequence Analysis and Function Prediction
3 ANNOTATOR: The Integration of Protein Sequence-Analytic Tools
3.1 Visualization
4 Conclusions
References
Part IV: Big Data
Chapter 26: Big Data, Evolution, and Metagenomes: Predicting Disease from Gut Microbiota Codon Usage Profiles
1 Introduction
2 Materials and Data Preparation
3 Methods
3.1 Required Software
3.2 Prediction of Expression Levels
3.3 Enrichment Analysis
3.4 Random Forest Classification
3.5 Metabolic Module Identification
4 Results
5 Conclusion
6 Notes
References
Chapter 27: Big Data in Plant Science: Resources and Data Mining Tools for Plant Genomics and Proteomics
1 Introduction
2 What Generates Big Data in Plant Science?
2.1 Plant Genomes Analyses
2.2 Databases and Tools for Plant Genomics
2.3 Plant Proteomes Analyses
2.4 Databases and Tools for Plant Proteomics
3 Big Data Initiatives and Integrated Environments for Data-Driven Plant Science
3.1 The National Institutes of Health’s Big Data to Knowledge Initiative (NIH-BD2K)
3.2 The Elixir Infrastructure
3.3 The iPlant Project
3.4 The 1KP Transcriptome Project
4 The iPlant Cyberinfrastructure and Protocols
4.1 The iPlant Data Store
4.2 The iPlant Discovery Environment
4.2.1 Protocol: Using the iPlant Collaborative Discovery Environment
4.3 The iPlant Atmosphere
5 Considerations for the Future of Big Data in Plant Science
References
Index

📜 SIMILAR VOLUMES

📁 Data Mining Techniques for the Life Sciences (Methods in Molecular Biology, 609)

✍ Oliviero Carugo (editor), Frank Eisenhaber (editor) 📂 Library 📅 2009 🏛 Humana 🌐 English

Most life science researchers will agree that biology is not a truly theoretical branch of science. The hype around computational biology and bioinformatics beginning in the nineties of the 20th century was to be short lived (1, 2). When almost no value of practical importance such as the opti

📁 Data mining techniques for the life sciences

✍ Stefan Washietl, Ivo L. Hofacker (auth.), Oliviero Carugo, Frank Eisenhaber (eds 📂 Library 📅 2010 🏛 Humana Press 🌐 English

📁 Data Mining Techniques for the Life Sciences

✍ Stefan Washietl, Ivo L. Hofacker (auth.), Oliviero Carugo, Frank Eisenhaber (eds 📂 Library 📅 2010 🏛 Humana Press 🌐 English

Whereas getting exact data about living systems and sophisticated experimental procedures have primarily absorbed the minds of researchers previously, the development of high-throughput technologies has caused the weight to increasingly shift to the problem of interpreting accumulated data in

📁 Data Mining Techniques for the Life Sciences

✍ Oliviero Carugo; Frank Eisenhaber 📂 Library 📅 2022 🏛 Humana 🌐 English

This third edition details new and updated methods and protocols on important databases and data mining tools. Chapters guide readers through archives of macromolecular sequences and three-dimensional structures, databases of protein-protein interactions, methods for prediction conformational disor