This book is intended to present the state of the art in research on machine learning and big data analytics. The accepted chapters covered many themes including artificial intelligence and data mining applications, machine learning and applications, deep learning technology for big data…
Big Data Management and Analytics (Future Computing Paradigms and Applications)
- Publisher: World Scientific
- Year: 2024
- Language: English
- Pages: 288
- Category: Library
Synopsis
With the proliferation of information, big data management and analysis have become an indispensable part of any system that must handle such volumes of data. The amount of data generated by the multitude of interconnected devices is growing exponentially, making the storage and processing of this data a real challenge.

Big data management and analytics have gained momentum in almost every industry, from finance to healthcare. Handled and analyzed properly, big data can reveal key insights, and it has great potential to improve operations in any industry. This book covers the full spectrum of big data, from introductory concepts to specific case studies, and will help readers gain knowledge of the big data landscape.

Highlights of the topics covered include a description of the big data ecosystem; real-world instances of big data issues; how the Vs of big data (volume, velocity, variety, veracity, valence, and value) affect data collection, monitoring, storage, analysis, and reporting; a structured process for getting value out of big data; and the differences between a standard database management system and a big data management system.

Readers will gain insights into the choice of data models, data extraction, data integration to solve large data problems, data modeling using machine learning techniques, Spark's scalable machine learning techniques, modeling a big data problem as a graph database and performing scalable analytical operations over the graph, and various tools and techniques for processing big data, along with its applications in healthcare and finance.
Table of Contents
Foreword
Preface
About the Authors
Acknowledgments
List of Figures
List of Tables
Chapter 1 Introduction to Big Data
1.1 Data: The New Oil and the New Soil
1.2 What is Big Data and What are its Sources
1.2.1 Big Data Generated by Machines
1.2.2 Big Data Generated by Humans
1.2.3 Big Data Generated by Organizations
1.3 Characteristics of Big Data
1.3.1 Volume
1.3.2 Velocity
1.3.3 Variety
1.3.4 Veracity
1.3.5 Valence
1.3.6 Value
1.4 Importance of Big Data: Popular Use Cases
1.5 Chapter Summary
References
Chapter 2 Big Data Management and Modeling
2.1 Big Data Management
2.1.1 Data Acquisition/Ingestion
2.1.2 Data Storage
2.1.3 Data Quality
2.1.4 Data Operations
2.1.5 Data Scalability
2.1.6 Data Security
2.2 Challenges in Big Data Management: Case Study
2.3 Big Data Modeling
2.3.1 Data Model Structures
2.3.2 Data Model Operations
2.3.3 Data Model Constraints
2.4 Types of Data Models
2.4.1 Relational Data Model
2.4.2 Semi-Structured Data Model
2.4.3 Unstructured Data Model: Vector Space Data Model
2.4.4 Graph Data Model
2.5 Chapter Summary
References
Chapter 3 Big Data Processing
3.1 Requirements for Big Data Processing
3.2 Big Data Retrieval
3.2.1 Relational Data Query
3.2.2 JSON Data Query Using MongoDB and Aerospike
3.3 Big Data Integration
3.3.1 Big Data Integration Problems
3.4 Big Data Processing Pipeline
3.4.1 Data Transformation Operations in Big Data Processing Pipeline
3.4.1.1 Map and Reduce Operations
3.4.1.2 Aggregation Operations
3.4.1.3 Analytical Operations
3.5 Big Data Management and Processing Using Splunk and Datameer
3.5.1 Splunk
3.5.2 Datameer
3.6 Chapter Summary
References
Chapter 4 Big Data Analytics and Machine Learning
4.1 Introduction to Machine Learning
4.1.1 Machine Learning Techniques
4.2 Machine Learning Process
4.2.1 Acquire
4.2.2 Prepare
4.2.2.1 Exploratory Data Analysis (EDA)
4.2.2.1.1 Summary Statistics
4.2.2.1.2 Visualization Methods
4.2.2.2 Pre-Processing
4.2.2.2.1 Data Cleaning
4.2.2.2.1.1 Addressing Data Quality Issues
4.2.2.2.2 Feature Selection/Engineering
4.2.2.2.3 Feature Transformation
4.2.3 Analyze
4.2.3.1 Classification
4.2.3.1.1 Building and Applying a Classification Model
4.2.3.1.2 Classification Algorithms
4.2.4 Evaluation of Machine Learning Models
4.2.4.1 Evaluation Metrics
4.3 Scaling Up Machine Learning Algorithms
4.4 Chapter Summary
References
Chapter 5 Big Data Analytics Through Visualization
5.1 Graph Definition
5.1.1 Examples of Graph Analytics for Big Data
5.1.1.1 Social Media
5.1.1.2 Biological Networks
5.1.1.3 Personal Information Networks
5.2 Graph Analytics from the Perspective of Big Data
5.3 Techniques for Graph Analytics
5.3.1 Basic Definitions
5.3.2 Path Analytics
5.3.3 Connectivity Analytics
5.3.4 Community Analytics
5.3.5 Centrality Analytics
5.4 Large-Scale Graph Processing
5.4.1 Parallel Programming Model for Graphs
5.5 Chapter Summary
References
Chapter 6 Taming Big Data with Spark 2.0
6.1 Introduction to Spark 2.0
6.1.1 Why Spark 2.0 Replaced Hadoop
6.2 Resilient Distributed Datasets
6.3 Spark 2.0
6.3.1 Language Processing with Spark 2.0
6.3.2 Analysis of Streaming Data with Spark 2.0
6.3.3 Streaming API
6.3.4 Kafka
6.3.4.1 Kafka Streaming
6.3.5 Apache Spark Streaming
6.4 Spark Machine Learning Library
6.5 Chapter Summary
References
Chapter 7 Managing Big Data in Cloud Storage
7.1 Large-Scale Data Storage
7.1.1 Challenges of Storing Large Data in Distributed Systems
7.2 Hadoop Distributed File System (HDFS)
7.2.1 HDFS Permission Checks
7.2.2 HDFS Shell Commands
7.2.3 Chaining and Scripting HDFS Commands
7.2.4 Loading Data on HDFS
7.3 Hadoop User Experience (HUE)
7.3.1 Features of HUE
7.3.2 HUE Components
7.4 Chapter Summary
References
Chapter 8 Big Data in Healthcare
8.1 Digitalization in Healthcare Sector
8.1.1 Use of Big Data in Medical Care
8.2 Big Data in Public Health
8.2.1 Big Data Surveillance Using Machine Learning
8.2.2 Big Data in Public Health Training
8.2.3 Limitations and Open Issues for Big Data While Using Machine Learning in Public Health
8.3 The Four V’s of Big Data in Healthcare
8.4 Big Data in Genomics
8.5 Architectural Framework
8.5.1 Methodology of Big Data Analytics in Healthcare
8.5.2 Advantages of Big Data Analytics to Healthcare
8.5.3 Challenges of Big Data in Healthcare
8.6 Chapter Summary
References
Chapter 9 Big Data in Finance
9.1 Digitalization in Financial Industry
9.2 Sources of Financial Data
9.3 Challenges of Using Big Data in Financial Research
9.4 Financial Big Data
9.4.1 FBD Management
9.4.2 FBD Analytics
9.5 Theoretical Framework of Big Data in Financial Services
9.6 Popular Use Cases of FBD Analytics
9.7 Chapter Summary
References
Chapter 10 Enabling Tools and Technologies for Big Data Analytics
10.1 Big Data Management and Modeling Tools
10.1.1 Data Modeling Tools
10.1.2 Vector Data Model with Lucene
10.1.3 Graph Data Model with Gephi
10.1.4 Data Management Tools
10.1.4.1 Redis
10.1.4.2 Aerospike
10.1.4.3 AsterixDB
10.1.4.4 Solr
10.1.4.5 Vertica
10.2 Big Data Integration and Processing Tools
10.2.1 Big Data Processing Using Splunk and Datameer
10.3 Big Data Machine Learning Tools
10.3.1 KNIME
10.3.1.1 Exploring Data with KNIME Plots
10.3.1.2 Handling Missing Values in KNIME
10.3.1.3 Classification Using Decision Tree in KNIME
10.3.1.4 Evaluation of Decision Tree in KNIME
10.3.2 Spark MLlib
10.4 Big Data Graph Analytics Tools
10.4.1 Giraph
10.4.2 GraphX
10.4.3 Neo4j
10.5 Chapter Summary
References
Index
Similar Volumes
Within this context, big data analytics (BDA) can be an important tool given that many analytic techniques within the big data world have been created specifically to deal with complexity and rapidly changing conditions. The important task for public sector organizations is to liberate analytics…
First designed to generate personalized recommendations to users in the 90s, recommender systems apply knowledge discovery techniques to users’ data to suggest information, products, and services that best match their preferences. In recent decades, we have seen an exponential increase in the vol…
This book reviews the theoretical concepts, leading-edge techniques and practical tools involved in the latest multi-disciplinary approaches addressing the challenges of big data. Illuminating perspectives from both academia and industry are presented by an international selection of experts i…
This book highlights major issues related to big data analysis using computational intelligence techniques, mostly interdisciplinary in nature. It comprises chapters on computational intelligence technologies, such as neural networks and learning algorithms, evolutionary computation, fuzzy sys…
This book considers all aspects of managing the complexity of Multimedia Big Data Computing (MMBD) for IoT applications and develops a comprehensive taxonomy. It also discusses a process model that addresses a number of research challenges associated with MMBD, such as scalability, accessibility,…