𝔖 Scriptorium
✦   LIBER   ✦

Data Analytics: Concepts, Techniques, and Applications

✍ Scribed by Mohiuddin Ahmed (editor), Al-Sakib Khan Pathan (editor)


Publisher: CRC Press
Year: 2018
Tongue: English
Leaves: 451
Edition: 1
Category: Library

✦ Synopsis


Large data sets arriving at ever-increasing speeds require a new set of efficient data analysis techniques. Data analytics is becoming an essential component of every organization and of domains such as health care, financial trading, the Internet of Things, Smart Cities, and Cyber-Physical Systems. However, these diverse application domains give rise to new research challenges. In this context, the book provides a broad picture of the concepts, techniques, applications, and open research directions in this area. In addition, it serves as a single source of reference for acquiring knowledge of emerging Big Data Analytics technologies.

✦ Table of Contents


Cover
Half Title
Title Page
Copyright Page
Dedication
Contents
Acknowledgments
Preface
List of Contributors
SECTION I: DATA ANALYTICS CONCEPTS
1 An Introduction to Machine Learning
1.1 A Definition of Machine Learning
1.1.1 Supervised or Unsupervised?
1.2 Artificial Intelligence
1.2.1 The First AI Winter
1.3 ML and Statistics
1.3.1 Rediscovery of ML
1.4 Critical Events: A Timeline
1.5 Types of ML
1.5.1 Supervised Learning
1.5.2 Unsupervised Learning
1.5.3 Semisupervised Learning
1.5.4 Reinforcement Learning
1.6 Summary
1.7 Glossary
References
2 Regression for Data Analytics
2.1 Introduction
2.1.1 Chapter Roadmap
2.1.2 What Is Regression?
2.2 Linear Regression
2.2.1 Dataset Description
2.2.2 Problem Definition
2.2.2.1 To Wrap Up the Whole Thing
2.2.3 Probabilistic Interpretation
2.2.4 Optimization Method
2.2.5 Block Diagram
2.2.6 Overview of the Model
2.3 Logistic Regression
2.3.1 Problem Definition
2.3.2 Logistic Function
2.3.3 Probabilistic Interpretation
2.3.4 Optimization Method
2.3.5 Overview of the Model
2.4 Problems of Regression
2.4.1 Underfitting and Overfitting
2.4.2 Outlier
2.4.3 Hyper-Parameter
2.5 Conclusion
References
3 Big Data-Appropriate Clustering via Stochastic Approximation and Gaussian Mixture Models
3.1 Introduction
3.1.1 Chapter Roadmap
3.2 Stochastic Approximation Algorithm
3.2.1 Convergence Result
3.3 Gaussian Mixture Model
3.4 An SAA for Maximum Likelihood Estimation of GMMs
3.5 Simulation Results
3.5.1 Practical Considerations
3.5.2 Simulating Gaussian Mixture Models
3.5.3 Comparisons
3.5.4 Results
3.6 MNIST Application
3.6.1 Data Preprocessing
3.6.2 Results
3.7 Conclusions
References
4 Information Retrieval Methods for Big Data Analytics on Text
4.1 Introduction to Information Retrieval
4.2 Vector Space Models
4.2.1 Document-Term Matrix
4.2.2 Distance Metrics
4.2.2.1 Euclidean Distance
4.2.2.2 Mahalanobis Distance
4.2.2.3 Jaccard Index
4.2.2.4 Cosine Similarity
4.2.2.5 Word Mover’s Distance
4.2.3 Term Frequency–Inverse Document Frequency
4.3 Information Retrieval Approaches
4.3.1 Latent Semantic Analysis
4.3.2 word2vec
4.3.2.1 CBOW Model
4.3.2.2 Skip-Gram Model
4.3.3 fastText
4.4 Walk-Through of IR: An Illustrative Example
4.5 Applications
4.5.1 Sentiment Extraction
4.5.2 Text Categorization/Spam Detection
4.5.3 Translation to Other Languages
4.5.4 Automated Q&A and Chatbots
4.5.5 Text Summarization
4.5.6 Resume Short Listing
4.5.7 Replacing Medical Codes from Patient’s Prescription
4.5.8 Writing Automatically/Text Generation/Poetry
4.6 Conclusions
References
5 Big Graph Analytics
5.1 Introduction
5.1.1 Motivation and Challenges of Big Graph Analytics
5.1.2 Frameworks for Big Graph Analytics
5.1.3 Organization and Goal of This Chapter
5.2 Distributed Frameworks for Analyzing Big Static Graphs
5.2.1 Vertex-Centric Frameworks
5.2.1.1 Classification of Vertex-Centric Frameworks
5.2.2 Block-Centric Frameworks
5.2.2.1 BLOGEL
5.2.2.2 GIRAPH++
5.2.3 Subgraph-Centric Frameworks
5.2.3.1 NScale
5.2.4 Matrix-Based Frameworks
5.2.4.1 PEGASUS
5.2.5 DBMS-Based Frameworks
5.2.5.1 Pregelix
5.2.5.2 DG-SPARQL
5.3 Single-Machine Frameworks for Analyzing Big Static Graphs
5.4 Distributed Frameworks for Analyzing Big Dynamic Graphs
5.4.1 Frameworks for Analyzing Temporal Graphs
5.4.1.1 DeltaGraph
5.4.1.2 Chronos
5.4.2 Frameworks for Analyzing Streaming Graphs
5.5 Single-Machine Frameworks for Analyzing Big Dynamic Graphs
5.5.1 LLAMA and SLOTH
5.5.2 STINGER
5.6 Conclusions
Notes
References
SECTION II: DATA ANALYTICS TECHNIQUES
6 Transition from Relational Database to Big Data and Analytics
6.1 Introduction
6.1.1 Background, Motivation, and Aim
6.1.2 Chapter Organization
6.2 Transition from Relational Database to Big Data
6.2.1 Relational Database
6.2.2 Introduction to Big Data
6.2.3 Relational Data vs. Big Data
6.3 Evolution of Big Data
6.3.1 Facts and Predictions about the Data Generated
6.3.2 Applications of Big Data
6.3.3 Fundamental Principle and Properties of Big Data
6.3.3.1 Issues with Traditional Architecture for Big Data Processing
6.3.3.2 Fundamental Principle for Scalable Database System
6.3.3.3 Properties of Big Data System
6.3.4 Generalized Framework for Big Data Processing
6.3.4.1 Storage and Precomputation Layer
6.3.4.2 Knowledge Discovery Layer (Serving Layer)
6.3.4.3 Real-Time Data Processing Layer (Speed Layer)
6.4 Big Data Analytics
6.4.1 Big Data Characteristics and Related Challenges
6.4.1.1 Volume
6.4.1.2 Velocity
6.4.1.3 Variety
6.4.2 Why Big Data Analytics?
6.4.2.1 Text Analytics
6.4.2.2 Audio Analytics
6.4.2.3 Video Analytics
6.4.2.4 Social Media Analytics
6.4.2.5 Predictive Analytics
6.4.3 Challenges in Big Data Analytics
6.4.3.1 Collect and Store Data
6.4.3.2 Data Management
6.4.3.3 Data Analysis
6.4.3.4 Security for Big Data
6.4.3.5 Visualization of Data
6.5 Tools and Technologies for Big Data Processing
6.5.1 Tools
6.5.1.1 Thrift
6.5.1.2 ZooKeeper
6.5.1.3 Hadoop DFS
6.5.2 Resource Management
6.5.3 NoSQL Database: Unstructured Data Management
6.5.3.1 Apache HBase
6.5.3.2 Apache Cassandra
6.5.4 Data Processing
6.5.4.1 Batch Processing
6.5.4.2 Distributed Stream Processing
6.5.4.3 Graph Processing
6.5.4.4 High-Level Languages for Data Processing
6.5.5 Data Analytics at the Speed Layer
6.6 Future Work and Conclusion
6.6.1 Future Work on Real-Time Data Analytics
6.6.2 Conclusion
References
7 Big Graph Analytics: Techniques, Tools, Challenges, and Applications
7.1 Introduction
7.2 Graph + Big Data = Big Graph
7.2.1 The Scale of Big Graph: How Big Is Big Graph?
7.2.2 V’s of Big Graph
7.2.3 Graph Databases
7.3 Big Graph Analytics
7.3.1 Definition
7.3.2 Relationships: The Basics of Graph Analytics
7.4 Big Graph Analytics Approaches
7.4.1 In-Memory Big Graph Analytics
7.4.2 SSD-Based Big Graph Analytics
7.4.3 Disk-Based Big Graph Analytics
7.4.4 Other Big Graph Analytics Frameworks
7.5 Graph Analytic Techniques
7.5.1 Centrality Analysis
7.5.1.1 Degree Centrality
7.5.1.2 Eigenvector Centrality
7.5.1.3 Katz Centrality
7.5.1.4 PageRank Centrality
7.5.1.5 Closeness Centrality
7.5.1.6 Betweenness Centrality
7.5.2 Path Analysis
7.5.3 Community Analysis
7.5.4 Connectivity Analysis
7.6 Algorithms for Big Graph Analytics
7.6.1 PageRank
7.6.2 Connected Component
7.6.3 Distributed Minimum Spanning Tree
7.6.4 Graph Search
7.6.5 Clustering
7.7 Issues and Challenges of Big Graph Analytics
7.7.1 High-Degree Vertex
7.7.2 Sparseness
7.7.3 Data-Driven Computations
7.7.4 Unstructured Problems
7.7.5 In-Memory Challenge
7.7.6 Communication Overhead
7.7.7 Load Balancing
7.8 Applications of Big Graph Analytics
7.8.1 Social Network Analysis
7.8.2 Behavior Analytics
7.8.3 Biological Networks
7.8.4 Recommendation Systems
7.8.5 Smart Cities
7.8.6 Geospatial Data and Logistics
7.8.7 Insurance Fraud Detection
7.9 Conclusions
References
8 Application of Game Theory for Big Data Analytics
8.1 Introduction
8.1.1 Chapter Roadmap
8.2 Basics of Classical and Evolutionary Game Theory
8.2.1 Classical Game Theory
8.2.2 Evolutionary Game Theory
8.2.3 Nash Equilibrium
8.2.4 Pareto Efficiency
8.2.5 Repeated Game
8.2.6 Bayesian Game
8.2.7 Chicken Game
8.2.8 Tit-for-Tat Game
8.2.9 Stackelberg Game
8.2.10 Potential Game
8.3 Game-Theoretic Application in Big Data Analytics
8.4 Limitations and Future Work
8.5 Conclusion
References
9 Project Management for Effective Data Analytics
9.1 Introduction
9.1.1 Chapter Roadmap
9.2 Big Data Projects
9.3 Project Management Body of Knowledge
9.4 Projects in Controlled Environment 2
9.5 Agile
9.6 ISO 21500:2012
9.7 Key Insights
9.8 Conclusion
References
10 Blockchain in the Era of Industry 4.0
10.1 Introduction
10.1.1 Chapter Roadmap
10.2 Emergence of Industrial Revolutions
10.2.1 Fourth Industrial Revolution (Industry 4.0)
10.2.2 Definition of Industry 4.0
10.2.3 Core Components of Industry 4.0
10.3 Blockchain and Cryptocurrency
10.3.1 Definition of Blockchain
10.3.2 Components of Blockchain
10.3.3 Working Procedure and Algorithm
10.3.4 Cryptocurrency
10.4 Blockchain’s Impact on Industry 4.0
10.4.1 How Blockchain Supports Industry 4.0
10.4.2 Application Domains of Blockchain in Industry 4.0
10.4.3 Adaptation Issues and Open Research Challenges
10.4.4 Challenges Associated with Law, Policy, and Standardization
10.4.5 Recommendations for Adaptation
10.5 Potential Use Case and Comparative Analysis
10.5.1 Use Case: DSC
10.5.2 Comparative Analysis
10.6 Conclusion
References
11 Dark Data for Analytics
11.1 Introduction
11.1.1 Chapter Roadmap
11.2 Origin of Dark Data
11.3 Risks of Dark Data
11.4 Dark Data Analytics: An Untapped Opportunity
11.4.1 Implication of Dark Data in the Health Sector
11.4.2 Dark Data for Gaining Market Advantage
11.4.3 Dark Data for Social Media Insights
11.4.4 Retailers Providing Personalization with the Help of Dark Data
11.5 Different Ways to Eliminate Dark Data
11.5.1 Tools and Technique for Collecting and Analyzing Dark Data
11.5.2 A Brief Introduction to DeepDive
11.5.3 Six Steps to Identify and Manage Dark Data
11.6 Dark Data Solution Provided by Companies
11.6.1 AI Foundry’s Agile Solutions for Transformation of Dark Data
11.6.2 Dark Data Fracking by Datumize
11.6.3 Nuix Information Governance Solution
11.6.4 Deloitte: Insight’s Way to Start Extracting Value from Dark Data
11.7 International Data Corporation’s Research on Organization’s Ability to Derive Value from Dark Data
11.8 Recommendations on Managing Dark Data
11.9 Conclusion
References
SECTION III: DATA ANALYTICS APPLICATIONS
12 Big Data: Prospects and Applications in the Technical and Vocational Education and Training Sector
12.1 Introduction
12.1.1 What Is Big Data?
12.1.2 Chapter Roadmap
12.2 Big Data Technologies
12.2.1 Big Data Architecture Framework
12.2.2 Big Data Learning Experience Cycle
12.2.3 Benefits of Big Data
12.2.3.1 Enabling Personalized Learning
12.2.3.2 Proper Decision-Making
12.2.3.3 Measure Return on Investment
12.2.3.4 Performance Prediction
12.2.3.5 Determination of Student Behavior
12.3 Tools, Algorithms, and Analytic Platforms for Educational Purposes
12.4 Recommendation and Conclusion
References
13 Sports Analytics: Visualizing Basketball Records in Graphical Form
13.1 Introduction
13.1.1 Chapter Roadmap
13.2 Background and Related Work
13.3 Design Details
13.3.1 Converting Text Format Data
13.3.2 Drawing Charts and Graphs
13.3.2.1 Line Chart
13.3.2.2 Track Lines
13.3.2.3 Dynamic Elements
13.3.3 Technologies
13.3.4 System Usage
13.3.4.1 General Usage
13.3.4.2 Specific Usage
13.4 User Study
13.4.1 Design and Participants
13.4.2 Measures
13.4.2.1 Search Efficiency and Clarity
13.4.2.2 Usability and Learnability
13.4.2.3 Visual Appeal
13.4.2.4 User Experience and Improvement
13.4.3 Materials and Apparatus
13.4.4 Procedures
13.5 Results
13.5.1 Search and Clarity
13.5.2 Usability and Learnability
13.5.3 Visual Appeal
13.5.4 Overall Experience
13.5.5 Subjective Report
13.6 Discussion
13.6.1 Limitations and Future Work
13.7 Conclusions
Acknowledgments
Declaration of Conflicting Interests
Funding
Notes
References
14 Analysis of Traffic Offenses in Transportation: Application of Big Data Analysis
14.1 Introduction
14.1.1 Chapter Roadmap
14.2 Material and Methods
14.2.1 Data Inclusion Criteria
14.2.2 Data Preprocessing
14.2.3 Data Analysis
14.2.3.1 Linear Regression
14.2.3.2 Nonlinear (Polynomial) Regression
14.2.4 Data Analysis Algorithms
14.2.5 Statistical Analysis
14.3 Results
14.3.1 Top 15 Traffic Offenses
14.3.2 Directly Time-Related Offenses
14.3.3 Offenses against Year of Occurrence
14.3.4 Offenses against Month of Occurrence
14.3.4.1 Regression Model
14.3.4.2 Top 15 Traffic Offense Frequencies against Month
14.3.4.3 Obtained Results
14.3.5 Offenses against Weekday of Occurrence
14.3.5.1 Regression Model
14.3.5.2 Top 15 Traffic Offense Frequency against the Day of the Week
14.3.5.3 Obtained Results
14.3.6 Offenses against Time of Occurrence
14.3.6.1 Regression Model
14.3.6.2 Top 15 Traffic Offense Frequency against Time Period
14.3.6.3 Obtained Results
14.3.7 Summary of the Proposed Regression Models
14.4 Discussion
14.5 Conclusion
References
15 Intrusion Detection for Big Data
15.1 Big Data and Intrusion Detection System
15.1.1 Chapter Roadmap
15.2 What Is Big Data?
15.3 Security Issues with Big Data
15.4 Intrusion Detection System
15.5 Classification of Intrusion Detection Systems
15.5.1 Location-Based Classification
15.5.2 Evaluation Criteria-Based Classification
15.6 Collaborative Intrusion Detection and Big Data
15.6.1 Why Is It Necessary?
15.7 Architecture of CIDS
15.7.1 Centralized Architecture
15.7.2 Hierarchical Architecture
15.7.3 Distributed Architecture
15.8 Building Blocks of CIDS
15.8.1 Local Monitoring
15.8.2 Membership Management
15.8.3 Correlation and Aggregation
15.8.4 Data Dissemination
15.8.5 Global Monitoring
15.9 Attacks on Collaborative Intrusion System
15.9.1 External Attacks
15.9.1.1 Disclosure Attack
15.9.2 Evasion Attack
15.9.2.1 Internal Attack
15.10 Cloud Framework and Collaborative Intrusion Detection System
15.11 Coordinated Attacks
15.11.1 Large-Scale Stealthy Scans
15.11.2 Worm Outbreaks
15.11.3 Distributed Denial-of-Service Attacks
15.12 State-of-the-Art Existing Literatures
15.13 Future Direction and Conclusion
References
16 Health Care Security Analytics
16.1 Introduction
16.1.1 Chapter Roadmap
16.2 Health Care in the Era of Industry 4.0
16.3 Taxonomy of Cyberattacks in the Health Care Domain
16.3.1 Attacks on Medical Devices
16.3.1.1 Magnetic Resonance Imaging (MRI)
16.3.1.2 Robotic Surgical Machine
16.3.1.3 Active Patient Monitoring Devices
16.3.2 Cyber-Physical Attacks
16.3.2.1 Attacks on Building Controls System
16.3.3 Insider Threat
16.4 Hacker’s Entry
16.4.1 Reconnaissance
16.4.1.1 Footprinting
16.4.1.2 Network Mapping
16.4.1.3 Scanning
16.4.2 Hacker’s Access Hospital Network
16.4.2.1 Phishing Attack
16.4.2.2 Ransomware
16.4.2.3 USB Stick
16.4.2.4 Password Cracker
16.4.2.5 Black Hole Attack
16.4.2.6 Rogue Access Points
16.5 Countermeasures
16.6 Conclusions
References
Index


📜 SIMILAR VOLUMES


Data Analytics Concepts Techniques and A
✍ Mohiuddin Ahmed, Al-Sakib Khan Pathan 📂 Library 📅 2019 🏛 CRC Press 🌐 English

Large data sets arriving at ever-increasing speeds require a new set of efficient data analysis techniques. Data analytics is becoming an essential component of every organization and of domains such as health care, financial trading, the Internet of Things, Smart Cities, and Cyber-Physical Systems.

Data analytics: concepts, techniques, an
✍ Ahmed, Mohiuddin; Pathan, Al-Sakib Khan 📂 Library 📅 2019 🏛 CRC Press 🌐 English

Large data sets arriving at ever-increasing speeds require a new set of efficient data analysis techniques. Data analytics is becoming an essential component of every organization and of domains such as health care, financial trading, the Internet of Things, Smart Cities, and Cyber-Physical Systems.

Process Analytics: Concepts and Techniqu
✍ Seyed-Mehdi-Reza Beheshti, Boualem Benatallah, Sherif Sakr, Daniela Grigori, Ham 📂 Library 📅 2016 🏛 Springer International Publishing 🌐 English

This book starts with an introduction to process modeling and process paradigms, then explains how to query and analyze process models, and how to analyze the process execution data. In this way, readers receive a comprehensive overview of what is needed to identify, understand and improve bus

Data Mining for Business Analytics: Conc
✍ Galit Shmueli, Peter C. Bruce, Peter Gedeck, Nitin R. Patel 📂 Library 📅 2019 🏛 Wiley 🌐 English

Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python presents an applied approach to data mining concepts and methods, using Python software for illustration. Readers will learn how to implement a variety of popular data mining

Data Mining for Business Analytics: Conc
✍ Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, Kenneth C. Lichtenda 📂 Library 📅 2017 🏛 Wiley 🌐 English

Data Mining for Business Analytics: Concepts, Techniques, and Applications in R presents an applied approach to data mining concepts and methods, using R software for illustration. Readers will learn how to implement a variety of popular data mining algorithms in R (a f