Scalable Big Data Analytics for Protein Bioinformatics: Efficient Computational Solutions for Protein Structures

✍ Scribed by Dariusz Mrozek

Publisher: Springer
Year: 2018
Language: English
Pages: 331
Series: Computational Biology
Category: Library

✦ Synopsis

This book focuses on proteins and their structures. It describes scalable solutions for protein structure similarity searching at the main representation levels and for predicting the 3D structures of proteins. Emphasis is placed on techniques that accelerate similarity searches and protein structure modeling.
The book is divided into four parts. The first part provides background on proteins and their representation levels, including a formal model of a 3D protein structure used in computational processes, and gives a brief overview of the technologies used in the solutions presented in the book. The second part discusses cloud services used to develop scalable and reliable cloud applications for 3D protein structure similarity searching and protein structure prediction. The third part shows how scalable Big Data computational frameworks, such as Hadoop and Spark, are used for massive 3D protein structure alignments and for identifying intrinsically disordered regions in protein structures. The fourth part focuses on finding 3D protein structure similarities with GPU acceleration, and on using multithreading and relational databases for efficient approximate searching on protein secondary structures.
The book introduces advanced techniques and computational architectures that benefit from recent achievements in computing and parallelism. Recent developments in computer science allow algorithms previously considered too time-consuming to be used efficiently in bioinformatics and the life sciences. Given its depth of coverage, the book will be of interest to researchers and software developers working in structural bioinformatics and biomedical databases.

✦ Table of Contents

Front Matter
Front Matter
Formal Model of 3D Protein Structures for Functional Genomics, Comparative Bioinformatics, and Molecular Modeling (Dariusz Mrozek)
Technological Roadmap (Dariusz Mrozek)
Front Matter
Azure Cloud Services (Dariusz Mrozek)
Scaling 3D Protein Structure Similarity Searching with Azure Cloud Services (Dariusz Mrozek)
Cloud Services for Efficient Ab Initio Predictions of 3D Protein Structures (Dariusz Mrozek)
Front Matter
Foundations of the Hadoop Ecosystem (Dariusz Mrozek)
Hadoop and the MapReduce Processing Model in Massive Structural Alignments Supporting Protein Function Identification (Dariusz Mrozek)
Scaling 3D Protein Structure Similarity Searching on Large Hadoop Clusters Located in a Public Cloud (Dariusz Mrozek)
Scalable Prediction of Intrinsically Disordered Protein Regions with Spark Clusters on Microsoft Azure Cloud (Dariusz Mrozek)
Front Matter
Massively Parallel Searching of 3D Protein Structure Similarities on CUDA-Enabled GPU Devices (Dariusz Mrozek)
Exploration of Protein Secondary Structures in Relational Databases with Multi-threaded PSS-SQL (Dariusz Mrozek)
Back Matter

✦ Subjects

Microsoft Azure; Cloud Computing; Algorithms; Analytics; Multithreading; Big Data; Bioinformatics; Parallel Programming; Relational Databases; Apache Spark; Apache Hadoop; Clusters; Protein Structure; Computational Biology

📜 SIMILAR VOLUMES

📁 Scalable big data analytics for protein bioinformatics: efficient computational solutions for protein structures

✍ Mrozek, Dariusz 📂 Library 📅 2018 🏛 Springer 🌐 English

📁 Data Analytics for Protein Crystallization

✍ Marc L. Pusey,Ramazan Savaş Aygün (auth.) 📂 Library 📅 2017 🏛 Springer International Publishing 🌐 English

This unique text/reference presents an overview of the computational aspects of protein crystallization, describing how to build robotic high-throughput and crystallization analysis systems. The coverage encompasses the complete data analysis cycle, including the set-up of screens by analyzing

📁 Data mining for genomics and proteomics: Analysis of gene and protein expression data

✍ Dziuda D. 📂 Library 📅 2010 🏛 Wiley 🌐 English

Data Mining for Genomics and Proteomics uses pragmatic examples and a complete case study to demonstrate step-by-step how biomedical studies can be used to maximize the chance of extracting new and useful biomedical knowledge from data. It is an excellent resource for students and professionals invo

📁 High-Performance Computational Solutions in Protein Bioinformatics

✍ Dariusz Mrozek (auth.) 📂 Library 📅 2014 🏛 Springer International Publishing 🌐 English

Recent developments in computer science enable algorithms previously perceived as too time-consuming to now be efficiently used for applications in bioinformatics and life sciences. This work focuses on proteins and their structures, protein structure similarity searching at main representation l

📁 Protein Bioinformatics: From Protein Modifications and Networks to Proteomics

✍ Cathy H. Wu, Cecilia N. Arighi, Karen E. Ross (eds.) 📂 Library 📅 2017 🏛 Humana Press 🌐 English

This volume introduces bioinformatics research methods for proteins, with special focus on protein post-translational modifications (PTMs) and networks. This book is organized into four parts and covers the basic framework and major resources for analysis of protein sequence, structure, and fu