Data Science with Semantic Technologies: Theory, Practice and Application

✍ Scribed by Archana Patel, Narayan C. Debnath, Bharat Bhushan

Publisher: Wiley-Scrivener
Year: 2022
Language: English
Pages: 456
Series: Advances in Intelligent and Scientific Computing
Category: Library

✦ Synopsis

DATA SCIENCE WITH SEMANTIC TECHNOLOGIES

This book will serve as an important guide to applications of data science with semantic technologies for the upcoming generation, making it a unique resource for scholars, researchers, professionals, and practitioners in this field.

To create intelligence in data science, it is necessary to use semantic technologies, which allow machine-readable representation of data. This intelligence uniquely identifies and connects data with common business terms, and it also enables users to communicate with data. Rather than merely structuring the data, semantic technologies help users understand its meaning through the concepts of semantics, ontology, OWL, linked data, and knowledge graphs. These technologies help organizations understand all their stored data, add value to it, and gain insights that were not available before. As data is the most important asset for any organization, it is essential to apply semantic technologies in data science to meet the organization's needs.

Data Science with Semantic Technologies provides a roadmap for the deployment of semantic technologies in the field of data science. Moreover, it highlights how data science enables the user to create intelligence through these technologies by exploring the opportunities and addressing the challenges of the present and the future. In addition, this book answers questions such as: Can semantic technologies facilitate data science? Which types of data science problems can be tackled by semantic technologies? How can data scientists benefit from these technologies? What is knowledge data science? How does knowledge data science relate to other domains? What is the role of semantic technologies in data science? What is the current progress and future of data science with semantic technologies? Which types of problems require the immediate attention of researchers?

Audience

Researchers in the fields of data science, semantic technologies, artificial intelligence, big data, and other related domains, as well as industry professionals, software engineers/scientists, and project managers who are developing software for data science. Students across the globe will gain basic and advanced knowledge of the current state and potential future of data science.

✦ Table of Contents

Cover
Half-Title Page
Title Page
Copyright Page
Contents
Preface
1 A Brief Introduction and Importance of Data Science
1.1 What is Data Science? What Does a Data Scientist Do?
1.2 Why Data Science is in Demand?
1.3 History of Data Science
1.4 How Does Data Science Differ from Business Intelligence?
1.5 Data Science Life Cycle
1.6 Data Science Components
1.7 Why Data Science is Important
1.8 Current Challenges
1.8.1 Coordination, Collaboration, and Communication
1.8.2 Building Data Analytics Teams
1.8.3 Stakeholders vs Analytics
1.8.4 Driving with Data
1.9 Tools Used for Data Science
1.10 Benefits and Applications of Data Science
Benefits of Data Science
Applications of Data Science
1.11 Conclusion
References
2 Exploration of Tools for Data Science
2.1 Introduction
2.2 Top Ten Tools for Data Science
2.3 Python for Data Science
2.3.1 Python Datatypes
2.3.2 Helpful Rules for Python Programming
2.3.3 Jupyter Notebook for IPython
2.3.4 Your First Python Program
2.4 R Language for Data Science
2.4.1 R Datatypes
2.4.2 Your First R Program
2.5 SQL for Data Science
2.6 Microsoft Excel for Data Science
2.6.1 Detection of Outliers in Data Sets Using Microsoft Excel
2.6.2 Regression Analysis in Excel Using Microsoft Excel
2.7 D3.JS for Data Science
2.8 Other Important Tools for Data Science
2.8.1 Apache Spark Ecosystem
2.8.2 MongoDB Data Store System
2.8.3 MATLAB Computing System
2.8.4 Neo4j for Graphical Database
2.8.5 VMWare Platform for Virtualization
2.9 Conclusion
References
3 Data Modeling as Emerging Problems of Data Science
3.1 Introduction
3.2 Data
3.2.1 Unstructured Data
3.2.2 Semistructured Data
3.2.3 Structured Data
3.2.4 Hybrid (Un/Semi)-Structured Data
3.2.5 Big Data
3.3 Data Model Design
3.4 Data Modeling
3.4.1 Records-Based Data Model
3.4.2 Non–Record-Based Data Model
3.5 Polyglot Persistence Environment
References
4 Data Management as Emerging Problems of Data Science
4.1 Introduction
4.2 Perspective and Context
4.2.1 Life Cycle
4.2.2 Use
4.3 Data Distribution
4.4 CAP Theorem
4.5 Polyglot Persistence
References
5 Role of Data Science in Healthcare
5.1 Predictive Modeling—Disease Diagnosis and Prognosis
5.1.1 Supervised Machine Learning Models
5.1.2 Clustering Models
5.1.2.1 Centroid-Based Clustering Models
5.1.2.2 Expectation Maximization (EM) Algorithm
5.1.2.3 DBSCAN
5.1.3 Feature Engineering
5.2 Preventive Medicine—Genetics/Molecular Sequencing
5.2.1 Technologies for Sequencing
5.2.2 Sequence Data Analysis with BioPython
5.2.2.1 Sequence Data Formats
5.2.2.2 BioPython
5.3 Personalized Medicine
5.4 Signature Biomarkers Discovery from High Throughput Data
5.4.1 Methodology I — Novel Feature Selection Method with Improved Mutual Information and Fisher Score
5.4.1.1 Algorithm for the Novel Feature Selection Method with Improved Mutual Information and Fisher Score
5.4.1.2 Computing F-Score Values for the Features
5.4.1.3 Block Diagram for the Method-1
5.4.2 Feature Selection Methodology-II — Entropy Based Mean Score with mRMR
5.4.2.1 Algorithm for the Feature Selection Methodology-II
5.4.2.2 Introduction to mRMR Feature Selection
5.4.2.3 Data Sets
5.4.2.4 Identification of Biomarkers Using Rank Product
5.4.2.5 Fold Change Values
Conclusion
References
6 Partitioned Binary Search Trees (P(h)-BST): A Data Structure for Computer RAM
6.1 Introduction
6.2 P(h)-BST Structure
6.2.1 Preliminary Analysis
6.2.2 Terminology and Conventions
6.3 Maintenance Operations
6.3.1 Operations Inside a Class
6.3.2 Operations Between Classes (Outside a Class)
6.4 Insert and Delete Algorithms
6.4.1 Inserting a New Element
6.4.2 Deleting an Existing Element
6.5 P(h)-BST as a Generator of Balanced Binary Search Trees
6.6 Simulation Results
6.6.1 Data Structures and Abstract Data Types
6.6.2 Analyzing the Insert and Delete Process in Random Case
6.6.3 Analyzing the Insert Process in Ascending (Descending) Case
6.6.4 Comparing P(2)-BST/P(8)-BST to Red-Black/AVL Trees
6.7 Conclusion
Acknowledgments
References
7 Security Ontologies: An Investigation of Pitfall Rate
7.1 Introduction
7.2 Secure Data Management in the Semantic Web
7.3 Security Ontologies in a Nutshell
7.4 InFra_OE Framework
7.5 Conclusion
References
8 IoT-Based Fully-Automated Fire Control System
8.1 Introduction
8.2 Related Works
8.3 Proposed Architecture
8.4 Major Components
8.4.1 Arduino UNO
8.4.2 Temperature Sensor
8.4.3 LCD Display (16X2)
8.4.4 Temperature Humidity Sensor (DHT11)
8.4.5 Moisture Sensor
8.4.6 CO2 Sensor
8.4.7 Nitric Oxide Sensor
8.4.8 CO Sensor (MQ-9)
8.4.9 Global Positioning System (GPS)
8.4.10 GSM Modem
8.4.11 Photovoltaic System
8.5 Hardware Interfacing
8.6 Software Implementation
LM35 interfacing and loop programming
DHT11 interfacing and loop programming
MQ-X Interfacing and Loop Programming
Moisture Sensor Interfacing and Loop Programming
8.7 Conclusion
References
9 Phrase Level-Based Sentiment Analysis Using Paired Inverted Index and Fuzzy Rule
9.1 Introduction
9.2 Literature Survey
9.3 Methodology
9.3.1 Construction of Inverted Wordpair Index
9.3.1.1 Sentiment Analysis Design Framework
9.3.1.2 Sentiment Classification
9.3.1.3 Preprocessing of Data
9.3.1.4 Algorithm to Find the Score
9.3.1.5 Fuzzy System
9.3.1.6 Lexicon-Based Sentiment Analysis
9.3.1.7 Defuzzification
9.3.2 Performance Metrics
9.4 Conclusion
References
10 Semantic Technology Pillars: The Story So Far
10.1 The Road that Brought Us Here
10.2 What is a Semantic Pillar?
10.2.1 Machine Learning
10.2.2 The Semantic Approach
10.3 The Foundation Semantic Pillars: IRI’s, RDF, and RDFS
10.3.1 Internationalized Resource Identifier (IRI)
10.3.2 Resource Description Framework (RDF)
10.3.2.1 Alternative Technologies to RDF: Property Graphs
10.3.3 RDF Schema (RDFS)
10.4 The Semantic Upper Pillars: OWL, SWRL, SPARQL, and SHACL
10.4.1 The Web Ontology Language (OWL)
10.4.1.1 Axioms to Define Classes
10.4.1.2 The Open World Assumption
10.4.1.3 No Unique Names Assumption
10.4.1.4 Serialization
10.4.2 The Semantic Web Rule Language
10.4.2.1 The Limitations of Monotonic Reasoning
10.4.2.2 Alternatives to SWRL
10.4.3 SPARQL
10.4.3.1 The SERVICE Keyword and Linked Data
10.4.4 SHACL
10.4.4.1 The Fundamentals of SHACL
10.5 Conclusion
References
11 Evaluating Richness of Security Ontologies for Semantic Web
11.1 Introduction
11.2 Ontology Evaluation: State-of-the-Art
11.2.1 Domain-Dependent Ontology Evaluation Tools
11.2.2 Domain-Independent Ontology Evaluation Tools
11.3 Security Ontology
11.4 Richness of Security Ontologies
11.5 Conclusion
References
12 Health Data Science and Semantic Technologies
12.1 Health Data
12.2 Data Science
12.3 Health Data Science
12.4 Examples of Health Data Science Applications
12.5 Health Data Science Challenges
12.6 Health Data Science and Semantic Technologies
12.6.1 Natural Language Processing (NLP)
12.6.2 Clinical Data Sharing and Data Integration
12.6.3 Ontology Engineering and Quality Assurance (QA)
12.7 Application of Data Science for COVID-19
12.8 Data Challenges During COVID-19 Outbreak
12.9 Biomedical Data Science
12.10 Conclusion
References
13 Hybrid Mixed Integer Optimization Method for Document Clustering Based on Semantic Data Matrix
13.1 Introduction
13.2 A Method for Constructing a Semantic Matrix of Relations Between Documents and Taxonomy Concepts
13.3 Mathematical Statements for Clustering Problem
13.3.1 Mathematical Statements for PDC Clustering Problem
13.3.2 Mathematical Statements for CC Clustering Problem
13.3.3 Relations between PDC Clustering and CC Clustering
13.4 Heuristic Hybrid Clustering Algorithm
13.5 Application of a Hybrid Optimization Algorithm for Document Clustering
13.6 Conclusion
Acknowledgment
References
14 Role of Knowledge Data Science During COVID-19 Pandemic
14.1 Introduction
14.1.1 Global Health Emergency
14.1.2 Timeline of the COVID-19
14.2 Literature Review
14.3 Model Discussion
14.3.1 COVID-19 Time Series Dataset
14.3.2 FBProphet Forecasting Model
14.3.3 Data Preprocessing
14.3.4 Data Visualization
14.4 Results and Discussions
14.4.1 Analysis and Forecasting: The World
14.4.2 Performance Metrics
14.4.3 Analysis and Forecasting: The Top 20 Countries
14.5 Conclusion
References
15 Semantic Data Science in the COVID-19 Pandemic
15.1 Crises Often Are Catalysts for New Technologies
15.1.1 Definitions
15.1.2 Methodology
15.2 The Domains of COVID-19 Semantic Data Science Research
15.2.1 Surveys
15.2.2 Semantic Search
15.2.2.1 Enhancing the CORD-19 Dataset with Semantic Data
15.2.2.2 CORD-19-on-FHIR - Semantics for COVID-19 Discovery
15.2.2.3 Semantic Search on Amazon Web Services (AWS)
15.2.2.4 COVID*GRAPH
15.2.2.5 Network Graph Visualization of CORD-19
15.2.2.6 COVID-19 on the Web
15.2.3 Statistics
15.2.3.1 The Johns Hopkins COVID-19 Dashboard
15.2.3.2 The NY Times Dataset
15.2.4 Surveillance
15.2.4.1 An IoT Framework for Remote Patient Monitoring
15.2.4.2 Risk Factor Discovery
15.2.4.3 COVID-19 Surveillance in a Primary Care Network
15.2.5 Clinical Trials
15.2.6 Drug Repurposing
15.2.7 Vocabularies
15.2.8 Data Analysis
15.2.8.1 CODO
15.2.8.2 COVID-19 Phenotypes
15.2.8.3 Detection of “Fake News”
15.2.8.4 Ontology-Driven Weak Supervision for Clinical Entity Classification
15.2.9 Harmonization
15.3 Discussion
15.3.1 Privacy Issues
15.3.2 Domains that May Currently be Under Utilized
15.3.2.1 Detection of Fake News
15.3.2.2 Harmonization
15.3.3 Machine Learning and Semantic Technology: Synergy Not Competition
15.3.4 Conclusion
Acknowledgment
References
Index
EULA

📜 SIMILAR VOLUMES

📁 Data Science with Semantic Technologies: Deployment and Exploration

✍ Archana Patel, Narayan C. Debnath 📂 Library 📅 2023 🏛 CRC Press 🌐 English

Gone are the days when data was interlinked with related data by humans and human interpretation was required. Data is no longer just data. It is now considered a Thing or Entity or Concept with meaning, so that a machine not only understands the concept but also extrapolates the way humans do.

📁 Data Science with Semantic Technologies: New Trends and Future Developments

✍ Archana Patel, Narayan C. Debnath 📂 Library 📅 2023 🏛 CRC Press 🌐 English

As data is an important asset for any organization, it is essential to apply semantic technologies in data science to fulfill the need of any organization. This volume of a two-volume handbook set provides a roadmap for new trends and future developments of data science with semantic techno

📁 Graph Theory with Algorithms and its Applications: In Applied Science and Technology

✍ Santanu Saha Ray 📂 Library 📅 2012 🏛 Springer 🌐 English

The book has many important features which make it suitable for both undergraduate and postgraduate students in various branches of engineering and general and applied sciences. The important topics interrelating Mathematics & Computer Science are also covered briefly. The book is useful to readers

📁 Graph Theory with Algorithms and its Applications: In Applied Science and Technology

✍ Santanu Saha Ray (auth.) 📂 Library 📅 2013 🏛 Springer India 🌐 English

The book has many important features which make it suitable for both undergraduate and postgraduate students in various branches of engineering and general and applied sciences. The important topics interrelating Mathematics & Computer Science are also covered briefly. The book is useful to readers

📁 Big Data Analytics: Theory, Techniques, Platforms, and Applications (SpringerBriefs in Applied Sciences and Technology)

✍ Ümit Demirbaga, Gagangeet Singh Aujla, Anish Jindal, Oğuzhan Kalyon 📂 Library 📅 2024 🏛 Springer 🌐 English

This book introduces readers to big data analytics. It covers the background to and the concepts of big data, big data analytics, and cloud computing, along with the process of setting up, configuring, and getting familiar with the big data analytics working environments in the first two ch