Big Data Analytics: Theory, Techniques, Platforms, and Applications (SpringerBriefs in Applied Sciences and Technology)
✍ Scribed by Ümit Demirbaga, Gagangeet Singh Aujla, Anish Jindal, Oğuzhan Kalyon
- Publisher
- Springer
- Year
- 2024
- Tongue
- English
- Leaves
- 299
- Edition
- 2024
- Category
- Library
✦ Synopsis
This book introduces readers to big data analytics. The first two chapters cover the background and concepts of big data, big data analytics, and cloud computing, along with setting up, configuring, and getting familiar with big data analytics working environments. The third chapter provides comprehensive information on big data processing systems, from installing these systems to implementing real-world data applications, along with the necessary code. The next chapter dives into the details of big data storage technologies, including their types, essentiality, durability, and availability, and shows how their properties differ. The fifth and sixth chapters guide the reader through understanding, configuring, and performing the monitoring and debugging of big data systems and present the available commercial and open-source tools for this purpose. Chapter seven covers a trending machine learning technique, the Bayesian network (a probabilistic graphical model), by presenting a real-world probabilistic application for understanding causal, complex, and hidden relationships for diagnosis and forecasting in a scalable manner for big data. Special sections throughout the eighth chapter present different case studies and applications to help readers develop their big data analytics skills using various big data analytics frameworks.
The book will be of interest to business executives and IT managers as well as university students and their course leaders; in fact, to all those who want to get involved in the big data world.
✦ Table of Contents
Foreword
Preface
Contents
List of Figures
1 Introduction
1.1 Essential Big Data Analytics Properties
1.2 Big Data Analytics Techniques
1.3 Overview of This Book
2 Big Data
2.1 Definition of Big Data
2.2 Characteristics of Big Data
2.3 The 5 Vs of Big Data
2.3.1 Volume
2.3.2 Value
2.3.3 Variety
2.3.4 Velocity
2.3.5 Veracity
2.4 Challenges in Big Data
2.4.1 Data Collection and Storage Challenges
2.4.2 Data Quality and Integrity Challenges
2.4.3 Privacy and Security Concerns
2.4.4 Issues with Extracting Value from Big Data
2.5 Harnessing the Potential of Big Data
2.5.1 Advanced Analytics and Machine Learning Opportunities
2.5.2 Data Visualisation and Communication Opportunities
2.5.3 Future Directions and Emerging Trends
3 Big Data Analytics
3.1 What Is Big Data Analytics?
3.2 The Types of Big Data Analytics
3.2.1 Descriptive Analytics
3.2.2 Diagnostic Analytics
3.2.3 Predictive Analytics
3.2.4 Prescriptive Analytics
3.2.5 Cognitive Analytics
3.3 The Advantages of Big Data Analytics
3.3.1 Risk Management
3.3.2 Cost Reduction
3.3.3 Advanced Data-Driven Decision-Making
3.3.4 Improving New Product Development
3.4 The Challenges of Big Data Analytics
3.4.1 Lack of Knowledge Professionals
3.4.2 Misunderstanding of Big Data
3.4.3 Data Growth Issues
3.4.4 Confusion on Big Data Tool Selection
3.4.5 Data Security and Privacy
3.5 The Steps of Big Data Analytics
3.5.1 Big Data Acquisition
3.5.2 Big Data Preprocessing
3.5.3 Big Data Storage
3.5.4 Big Data Analysis
4 Cloud Computing for Big Data Analytics
4.1 What is Cloud Computing?
4.2 The History of Cloud Computing
4.2.1 Computing Generations
4.3 Cloud Computing Units
4.3.1 Cloud Computing Service Models
4.3.2 Cloud Computing Deployment Models
4.4 Multi-Cloud Strategies in Big Data Analytics
4.5 Cloud Computing Platforms for Big Data Analytics
4.5.1 Amazon Web Services (AWS)
4.5.2 Microsoft Azure
4.5.3 Google Cloud Platform (GCP)
4.5.4 Comparison of Cloud Computing Providers
4.6 Learning Outcomes of the Chapter
5 Big Data Analytics Platforms
5.1 Main Characteristics of Big Data Analytics Platforms
5.1.1 Distributed Computing
5.1.2 Data Ingestion and Integration
5.1.3 Data Storage and Management
5.1.4 Data Processing and Analysis
5.1.5 Machine Learning and Advanced Analytics
5.1.6 Data Visualisation and Reporting
5.1.7 Scalability and Performance
5.1.8 Security and Governance
5.2 Desired Properties of a Big Data System
5.2.1 Robustness and Fault Tolerance
5.2.2 Scalability
5.2.2.1 Scaling Solutions for Big Data
5.2.3 Generalisation
5.2.4 Extensibility
5.2.5 Low Latency Reads and Updates
5.2.6 Minimal Maintenance
5.2.7 Debuggability
5.3 Big Data Processing Systems
5.4 Big Data Processing with Hadoop
5.4.1 MapReduce Paradigm
5.4.2 Hadoop Distributed File System (HDFS)
5.4.3 Yet Another Resource Negotiator (YARN)
5.4.4 Installing Multi-node Hadoop Cluster
5.4.4.1 Prerequisites
5.4.4.2 Downloading and Setting Values
5.4.4.3 Setting Up a Multi-node Cluster
5.4.4.4 Starting the Cluster
5.5 Apache Spark for Big Data Processing
5.5.1 Apache Spark Core
5.5.1.1 MLlib for Machine Learning
5.5.1.2 Spark Streaming for Real-Time Data Processing
5.5.1.3 Spark SQL for Interactive Queries
5.5.1.4 GraphX for Graph Processing
5.5.2 Deploying Spark on YARN
5.5.2.1 Prerequisites
5.5.2.2 Installation
5.5.2.3 Integrate Spark with YARN
5.5.3 Case Study
5.6 Apache Hive for Data Engineering
5.6.1 Deploying Hive on YARN
5.6.2 Installation
5.6.3 Integration of Hive with Hadoop YARN
5.6.4 Case Study
5.7 Apache Sqoop for Data Ingestion
5.7.1 Installation
5.7.2 Configuration of Apache Sqoop
5.7.3 Case Study
5.8 Streaming Data Ingestion with Apache Flume
5.8.1 Installation
5.8.2 Configuration of Apache Flume and Case Study
5.9 Apache Mahout: Distributed Machine Learning for Big Data Analytics
5.9.1 Installation and Configuration of Apache Mahout
5.9.2 Case Study
5.10 Learning Outcomes of the Chapter
6 Big Data Storage Solutions
6.1 Importance of Storage Systems for Big Data
6.2 Traditional Storage Systems for Big Data
6.2.1 Relational Databases
6.2.2 Data Warehouses
6.2.3 Network Attached Storage (NAS)
6.2.4 Storage Area Networks (SAN)
6.3 Big Data Storage Solutions
6.3.1 Hadoop Distributed File System (HDFS)
6.3.2 NoSQL Databases
6.3.3 Cloud Storage Solutions
6.3.4 Object Storage Systems
6.3.5 In-Memory Databases
6.4 Choosing the Right Big Data Storage Solution
6.4.1 Factors to Consider
6.4.2 Scalability and Performance Requirements
6.5 Future Trends in Big Data Storage
6.5.1 Advances in Storage Technologies
6.5.2 Edge Computing and Distributed Storage
6.5.3 AI and Machine Learning in Storage
6.6 Learning Outcomes of the Chapter
7 Big Data Monitoring
7.1 Understanding Monitoring
7.2 Identifying the Types of Monitoring
7.2.1 Proactive Monitoring
7.2.2 Reactive Monitoring
7.3 The Need for Monitoring
7.4 The Components of Monitoring
7.4.1 Alerts/Notifications
7.4.2 Events
7.4.3 Logs
7.4.4 Metrics
7.4.5 Incidence
7.4.6 Debugging Ability
7.5 Available Monitoring Tools for Big Data Systems
7.5.1 DataDog
7.5.2 SequenceIQ
7.5.3 Sematext
7.5.4 Apache Chukwa
7.5.5 Nagios
7.5.6 Ganglia
7.5.7 DMon
7.5.8 SmartMonit
7.6 Learning Outcomes of the Chapter
8 Debugging Big Data Systems for Big Data Analytics
8.1 Debugging for Real-World Performance Problems
8.2 Debugging Steps
8.3 Problems in Big Data Systems
8.3.1 Data Locality
8.3.2 Resource Heterogeneity
8.3.3 Network Issues
8.3.4 Resource Over-Allocation
8.3.5 Unnecessary Speculation
8.3.6 Poor Scheduling Policy
8.4 Root Cause Analysis in Big Data Systems
8.4.1 Importance of Root Cause Analysis in Big Data Analytics
8.4.2 Root Cause Analysis Steps
8.4.3 Tools and Techniques for RCA in Big Data Systems
8.4.4 Challenges and Considerations in RCA for Big Data Systems
8.5 Available Diagnosis Tools for Big Data Systems
8.5.1 Mantri
8.5.2 TACC Stats
8.5.3 DCDB Wintermute
8.5.4 AutoDiagn
8.6 Learning Outcomes of the Chapter
9 Machine Learning for Big Data Analytics
9.1 Harnessing Machine Learning for Big Data Insights
9.2 Supervised Machine Learning for Big Data Analytics
9.2.1 Challenges of Applying Supervised Machine Learning to Big Data Analytics
9.2.2 Pre-processing Big Data for Supervised Machine Learning
9.2.3 Popular Supervised Machine Learning Algorithms for Big Data Analytics
9.3 Unsupervised Machine Learning for Big Data Analytics
9.3.1 K-means Clustering
9.3.2 Hierarchical Clustering
9.3.3 DBSCAN
9.3.4 Gaussian Mixture Models (GMM)
9.3.5 Principal Component Analysis (PCA)
9.3.6 t-SNE
9.3.7 Apriori Algorithm
9.3.8 Isolation Forest
9.3.9 Expectation-Maximisation Algorithm
9.3.10 Spectral Clustering
9.3.11 Mean Shift
9.4 Neural Networks Algorithms
9.4.1 The Components of Neural Networks
9.4.2 The Types of Neural Networks
9.5 Probabilistic Learning for Big Data Analytics
9.5.1 Fundamentals of Probabilistic Learning
9.5.2 Scalable Algorithms for Probabilistic Learning
9.5.3 Applications of Probabilistic Learning in Big Data Analytics
9.6 Performance Evaluation and Optimisation Techniques
9.6.1 Evaluation Metrics for Supervised Machine Learning Algorithms
9.6.2 Cross-Validation Techniques
9.6.3 Hyperparameter Optimisation Techniques
9.7 Learning Outcomes of the Chapter
10 Real-World Big Data Analytics Case Studies
10.1 Government Sector
10.1.1 Enhancing Public Services Through Data-Driven Governance
10.1.2 Predictive Analytics for Smart City Planning
10.1.3 Security and Surveillance: Big Data in Government
10.1.4 Election Forecasting and Voter Analytics
10.2 Healthcare Industry
10.2.1 Revolutionising Healthcare with Big Data Analytics
10.2.2 Precision Medicine: Tailoring Treatments with Data
10.2.3 Disease Outbreak Prediction and Prevention
10.3 Entertainment Industry
10.3.1 Content Personalization and Recommendation Systems
10.3.2 Box Office Predictions and Revenue Optimization
10.3.3 Audience Engagement and Social Media Analytics
10.4 Banking Sector
10.4.1 Risk Assessment and Credit Scoring
10.4.2 Customer Relationship Management (CRM) and Personalization
10.4.3 Fraud Detection and Security
10.4.4 Strategic Decision-Making and Regulatory Compliance
10.5 Retail Industry
10.5.1 Inventory Management and Demand Forecasting
10.5.2 Customer Segmentation and Personalization
10.5.3 Supply Chain Optimization and Vendor Management
10.5.4 Enhanced Customer Experience Through In-Store Analytics
10.6 Energy and Utilities
10.6.1 Grid Management and Smart Grids
10.6.2 Predictive Maintenance and Asset Optimization
10.6.3 Energy Generation and Renewable Integration
10.6.4 Energy Efficiency and Demand Response
10.6.5 Environmental Sustainability and Emissions Reduction
10.7 Learning Outcomes of the Chapter
11 Big Data Analytics in Smart Grids
11.1 Smart Grids
11.2 Big Data Analytics in Smart Grid
11.2.1 Need of Big Data Analytics for Smart Grids
11.2.2 Big Data and Cloud Computing
11.3 Example of Big Data Analytics in Smart Grid
11.3.1 Data Pre-processing
11.3.2 Machine Learning Models
11.3.3 Results and Evaluations
11.4 Learning Outcomes of the Chapter
12 Big Data Analytics in Bioinformatics
12.1 Big Data: Bioinformatic Perspective
12.1.1 Big Data Problems in Bioinformatics
12.2 Frameworks for Big Genome Data
12.3 Biological Databases
12.4 Big Data Analytics in Bioinformatics
12.4.1 Hadoop and MapReduce in Bioinformatics Analytics
12.4.2 Bioinformatics Pipelines and Workflows for Big Data
12.4.3 Analysis Pipelines and Tools with Hadoop (MapReduce) Framework
12.4.4 Deep Learning in Bioinformatics
12.5 Variant Detection in Genome: A Case Study
12.5.1 Genome Data Copying to HDFS
12.5.2 Big Genome Data Processing Using MapReduce
12.6 Learning Outcomes of the Chapter
SIMILAR VOLUMES
This book presents the theory and applications of the recently introduced butterfly optimization algorithm (BOA). It also highlights the hybridization process in the basic structure of BOA with in-depth analysis of complexity. This book also describes the constraint handling process. The newly introduced
This book offers a comprehensive exploration of a new manufacturing mode designed to revolutionize the reliability of interdependent networks. It provides the necessary theoretical foundation and practical implementation guidance for Fractal Manufacturing with Variable Quantum Flow, an inno
This monograph provides a comprehensive and rigorous exposition of the basic concepts and most important modern research results concerning blockchain and its applications. The book includes the required cryptographic fundamentals underpinning the blockchain technology, since understanding of