Practical Guide to Life Science Databases

✍ Scribed by Imad Abugessaisa (editor), Takeya Kasukawa (editor)

Publisher: Springer
Year: 2022
Tongue: English
Leaves: 228
Edition: 1st ed. 2021
Category: Library

✦ Synopsis

This book provides the latest information on life science databases, which are central to life science research and drive the development of the field. It introduces the fundamental principles, rationales, and methodologies for creating and updating life science databases. The book brings together renowned researchers in the field of life science databases and puts their expertise, experience, and tools at the researcher's fingertips. It takes a bottom-up approach to explaining the structure, content, and usability of life science databases. Detailed explanations of database content, structure, querying, and data retrieval enable readers to put the databases and their associated tools to practical use. Readers will learn about the untapped opportunities available in life science databases and how these can be used to advance basic and applied research and to translate the findings into benefits for human life.

Chapter 2 is available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.

✦ Table of Contents

Preface
Contents
Chapter 1: GENCODE Annotation for the Human and Mouse Genome: A User Perspective
1.1 Introduction
1.2 GENCODE Database Overview
1.3 Annotation Method Adopted in GENCODE
1.3.1 Overall Annotation Methods Adopted by GENCODE
1.3.2 Automated Annotation Approaches
1.3.3 Manual Curation Approaches
1.3.4 Merging the Automated and Manual Curation Results
1.3.5 Experimental Validation
1.3.6 Annotation from Mouse Genome
1.4 Data Format Available for GENCODE Annotation
1.5 GENCODE Data Access
1.5.1 Data Access from GENCODE Web Portal
1.5.2 Data Access from Ensembl Genome Browser
1.5.3 Data Access from UCSC Genome Browser
1.6 Use Cases to Utilize GENCODE
1.6.1 Use Case 1: Extracting Different Types of Genes from GENCODE for Downstream Analysis
1.6.2 Use Case 2: Exploration of the Annotation of lncRNA MALAT1
1.6.3 Use Case 3: Exploration of SUMO1P3 (Small Ubiquitin Like Modifier 1 Pseudogene 3)
1.7 Latest Update of the GENCODE Annotation
1.8 Conclusion
References
Chapter 2: The GeneCards Suite
2.1 Introduction
2.2 Database Overview
2.2.1 Importance and Current Status
2.2.2 Future Update and Availability of the Database
2.3 Content and Architecture of the Database
2.3.1 Main Database Features and Types of Data Stored
2.3.2 Data Collection and Curation Methods
2.3.3 Dataset Indexing/Accession Number/Identification
2.3.4 Quality Control Methods
2.3.5 Database Update and Maintenance Strategy
2.4 Database Access and Mining Methods
2.4.1 Tools and Techniques to Access, Discover, and Mine the Content of the Database
2.4.1.1 VarElect: The NGS Phenotyper of the GeneCards Suite
2.4.1.2 TGex: The Knowledge-Driven Clinical Genetics Analysis Platform of the GeneCards Suite
2.4.1.3 Analysis of Genomic Structural Variants (SVs) Enabled by GeneHancer
2.4.1.4 Gene Set Enrichment Analysis
2.4.2 How to Explore and Browse the Database
2.4.3 How to Query the Database
2.4.4 How to Upload/Download Data
2.5 Use Cases
2.5.1 Interpretation of Single Nucleotide Variants (SNVs)
2.5.2 Interpretation of Genomic Structural Variants (SVs)
2.5.3 GeneHancer-Powered Interpretation of SVs
2.5.4 Other VarElect Use Cases
2.6 Summary and Future Development of the Database
References
Chapter 3: Atlas of miRNAs and Their Promoters in Human and Mouse
3.1 Preamble
3.2 Database Content
3.3 Database Architecture
3.4 Using the FANTOM5 miRNA Atlas Interactively
3.4.1 Using the miRNA Expression Viewer
3.4.2 Using the Interactive miRNA Expression Heatmap
3.5 Database Access and Mining Methods
3.6 Summary and Future Development of the Database
References
Chapter 4: IHEC Data Portal
4.1 Preamble
4.1.1 Summary
4.1.2 Purpose
4.1.3 Source and Type of Dataset
4.1.4 Target User Group
4.2 Database Overview
4.2.1 Importance of the Dataset
4.2.2 Current Status of Achievements
4.2.3 Main Features of the Database
4.2.4 Future Updates and Availability
4.3 Content and Architecture of the Database
4.3.1 Type of Data Stored
4.3.2 Data Collection Methods
4.3.3 Curation Approaches
4.3.4 Processing Strategy
4.3.5 Dataset Indexing/Accession Number/Identification
4.3.6 Quality Control Method
4.3.7 Database Update and Maintenance Strategy
4.4 Database Access and Mining Methods
4.4.1 Tools and Techniques to Access Database Content
4.4.2 Software and Tools for Discovering and Mining the Database
4.4.3 How to Explore and Browse the Database
4.4.4 How to Query the Database
4.4.5 How to Upload/Download Data to the Database
4.4.6 Programming and Automated Techniques for Database Access
4.4.6.1 Web Services
4.4.6.2 FTP
4.4.6.3 API
4.4.6.4 Bioinformatics Tools (R/Python) Packages
4.4.7 Database Integration Strategy
4.5 Use Cases and Demo to Utilize the Database
4.5.1 Use Case 1: Navigating Blueprint hg38 Transcriptomic Data in the UCSC Genome Browser
4.5.2 Use Case 2: Discovering Available IHEC Datasets Matching Metadata Requirements
4.5.3 Use Case 3: Assessing Dataset Comparability in the Portal
4.6 Summary and Future Development of the Database
References
Chapter 5: ChIP-Atlas
5.1 Preamble
5.2 Database Overview
5.3 Content and Architecture of the Database
5.4 Use Cases and Demonstration of the Database
5.4.1 Each Data Mode
5.4.2 Peak Browser
5.4.3 Target Genes
5.4.4 Colocalization
5.4.5 Enrichment Analysis Using a Gene Set as a Query
5.4.6 Enrichment Analysis Using Genomic Coordinates as a Query
5.5 Database Access and Mining Methods
5.5.1 Downloading Each SRX Data
5.5.2 Assembled Peak-Call Data Used in "Peak Browser"
5.5.3 Analyzed Data Used in "Target Genes" and "Colocalization"
References
Chapter 6: RefEx: Reference Expression Dataset
6.1 Introduction
6.2 Database Overview
6.2.1 Importance of Reference Gene Expression Datasets
6.2.2 Current Status of Reference Gene Expression Data
6.2.3 The Main Feature of RefEx
6.2.4 Future Update and Availability of the Database
6.3 Content and Architecture of the Database
6.3.1 EST
6.3.2 GeneChip
6.3.3 CAGE
6.3.4 RNA-Seq
6.4 Database Access and Mining Methods
6.4.1 Gene Expression Visualization Tool in RefEx
6.4.2 How to Query RefEx
6.4.3 How to Download Data from RefEx
6.4.4 Programmatic Technique to Access RefEx
6.5 Use-Cases and Demo to Utilize the Database
6.6 Summary and Future Development of the Database
References
Chapter 7: The Mouse Gene Expression Database (GXD)
7.1 Preamble
7.2 Database Overview
7.3 Data Structure, Data Curation, and Content of the Database
7.4 Database Access and Mining Methods
7.5 Use Cases and Demo to Use the Database
7.5.1 Simple Use Cases
7.5.1.1 Where and When Is a Given Gene Expressed?
7.5.1.2 What Genes Are Expressed in a Given Tissue/Anatomical Structure?
7.5.2 Intermediate Use Cases
7.5.2.1 Combining Search Parameters to Formulate Complex Queries
7.5.2.2 Differential Expression Search and Batch Search
7.5.3 Advanced Use Cases
7.6 Summary and Future Developments of the Database
References
Chapter 8: Protein Structural Changes Based on Structural Comparison
8.1 Introduction
8.2 Motion Tree
8.2.1 Overview
8.2.2 Illustration of Structural Changes with Motion Tree
8.2.3 Availability of Motion Tree
8.3 PSCDB
8.3.1 Overview
8.3.2 Data Construction
8.3.3 Browsing PSCDB
8.4 Future Work
References
Chapter 9: Single Cell Databases: An Emerging and Essential Tool
9.1 Introduction
9.1.1 Summary of the Databases/Data Repositories
9.1.2 Purpose of the Databases/Data Repositories
9.1.3 The Source and Type of the Dataset Stored
9.1.4 Target Users
9.2 Database Overview
9.2.1 The Importance of the Type of the Dataset Stored
9.2.2 The Current Status and What Has Been Done
9.2.3 The Main Feature(s) of the Databases/Data Repositories
9.2.4 Future Update and Availability of the Database
9.3 Content and Architecture of the Database
9.3.1 Type of the Data Stored
9.3.2 Data Collection Methods
9.3.3 Curation Approaches
9.3.4 Processing Strategy
9.3.5 Dataset Indexing and Accession Numbers
9.3.6 Quality Control Method
9.3.7 Database Update and Maintenance Strategy
9.4 Database Access and Data Tools
9.4.1 Accessing and Browsing the Content of Single Cell Databases (Fig. 9.2)
9.4.2 How to Query the Database
9.4.3 How to Upload/Download Data to the Database and Fig. 9.4
9.4.4 Programming and Automated Technique to Access the Database
9.4.5 Database Integration Strategy
9.5 Use-Cases and Capabilities of Single Cell Database
9.5.1 Simple Use-Case Example
9.5.2 Intermediate Use-Case Example
9.5.3 Advanced Use-Case and Fig. 9.7 with Panels
9.6 Summary and Future Development of the Database
References
Chapter 10: scIVA: Single Cell Database and Tools for Interactive Visualisation and Analysis
10.1 Introduction
10.2 Database Overview
10.2.1 scRNA-seq Technology and Single Cell Data
10.2.2 Landscape of scRNA-seq Data Analysis Tools
10.2.2.1 Overview of scIVA
10.3 scIVA Data Input and Preprocessing
10.3.1 Data Collection Methods and Types of the Data Stored
10.3.2 Quality Control Methods and Curation Approaches
10.4 Database Access and Mining Methods
10.4.1 Single Gene Visualisation
10.4.1.1 Single Gene Analysis
10.4.1.2 Gene List Analysis
10.5 Use Cases and Demo to Utilise the scIVA Database Framework
10.6 Future Update and Availability of scIVA
10.7 Summary and Future Development of the Database
References
Chapter 11: Access and Visualise High Quality Gene Expression Data with Stemformatics
11.1 Introduction to Stemformatics
11.1.1 The Integrated Data Atlases
11.1.2 The Myeloid Atlas
11.1.3 General Features of the Atlas Page
11.2 Data Processing and Sample Annotation
11.2.1 Data Selection
11.2.2 Data Processing
11.2.3 Sample Annotation
11.3 System Architecture
11.3.1 The Challenge of System Design in Research Environments
11.3.2 Stemformatics System Architecture
11.4 User Interfaces and Use-Cases
11.4.1 Use-Case 1 (Basic Level): Find Datasets of Relevance and View Details
11.4.2 Use-Case 2 (Intermediate): Combine Sample Groups on the Myeloid Atlas
11.4.3 Use-Case 3 (Intermediate/Advanced): Project One's Own Data Onto the Blood Atlas
11.4.4 Use-Case 4 (Advanced): Use the API to Download All Sample Metadata for Datasets Containing Blood Samples
11.5 Summary
References

📜 SIMILAR VOLUMES

📁 Practical Guide to Life Science Databases

✍ Imad Abugessaisa; Takeya Kasukawa 📂 Library 🌐 English

📁 Practical Guide to Database Design

✍ Hogan, Rex 📂 Library 📅 2018 🏛 CRC Press LLC 🌐 English

📁 A Practical Guide to Database Design

✍ Rex Hogan 📂 Library 📅 2018 🏛 CRC Press 🌐 English

"Fully updated and expanded from the previous edition, A Practical Guide to Database Design, Second Edition, is intended for those involved in the design or development of a database system or application. It begins by focusing on how to create a logical data model where data is stored "where it b

📁 Practical Guide to Large Database Migration

✍ Preston Zhang 📂 Library 📅 2019 🏛 CRC Press 🌐 English

It is a major challenge to migrate very large databases from one system to another, for example, to transfer critical data from Oracle to SQL Server. One has to consider several issues, such as loss of data being transferred, the security of the data, the cost and effort, technical aspects of the software i

📁 A Practical Guide to Database Design

✍ Rex Hogan 📂 Library 📅 2018 🏛 CRC Press 🌐 English

Fully updated and expanded from the previous edition, A Practical Guide to Database Design, Second Edition, is intended for those involved in the design or development of a database system or application. It begins by illustrating how to develop a Third Normal Form data model wher

📁 Concise Guide to Databases: A Practical Introduction

✍ Peter Lake, Paul Crowther (auth.) 📂 Library 📅 2013 🏛 Springer-Verlag London 🌐 English

This easy-to-read textbook/reference presents a comprehensive introduction to databases, opening with a concise history of databases and of data as an organisational asset. As relational database management systems are no longer the only database solution, the book takes a wider view of database