Technologies and Applications for Big Data Value
Edited by Edward Curry; Sören Auer; Arne J. Berre; Andreas Metzger; Maria S. Perez; Sonja Zillner
- Publisher
- Springer Nature
- Year
- 2022
- Language
- English
- Pages
- 555
- Category
- Library
Synopsis
This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications in the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part, "Technologies and Methods", contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part, "Processes and Applications", details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI; second, practitioners and industry experts engaged in data-driven systems, software design, and deployment projects who are interested in employing these advanced methods to address real-world problems.
Table of Contents
Preface
Acknowledgements
Contents
Editors and Contributors
About the Editors
Contributors
Technologies and Applications for Big Data Value
1 Introduction
2 What Is Big Data Value?
3 The Big Data Value PPP
4 Big Data Value Association
5 Big Data Value Reference Model
5.1 Chapter Analysis
6 Big Data and AI Pipeline
6.1 Chapter Analysis
7 AI, Data and Robotics Framework and Enablers
7.1 Chapter Analysis
8 Summary
References
Part I Technologies and Methods
Supporting Semantic Data Enrichment at Scale
1 Introduction
2 A Two-Phase Approach
2.1 Scenario: Weather-Based Digital Marketing Analytics
2.2 Semantics as the Enrichment Enabler
2.3 Challenges
2.4 Approach Overview
3 Achieving Semantic Enrichment of Tabular Data at Scale
3.1 The Architectural View
3.2 Achieving Scalability
3.3 Discussion on the Limitations
4 Evaluation of the Approach
5 Related Work
6 Conclusions
References
Trade-Offs and Challenges of Serverless Data Analytics
1 Introduction
1.1 On the Path to Serverless Data Analytics: The ServerMix Model
2 Fundamental Trade-Offs of Serverless Architectures
2.1 Disaggregation
2.2 Isolation
2.3 Simple Scheduling
2.4 Summary
3 Revisiting Related Work: The ServerMix Approach
3.1 Serverless Data Analytics
3.2 Serverless Container Services
4 CloudButton: Towards Serverless Data Analytics
4.1 High-performance Serverless Runtime
4.2 Mutable Shared Data for Serverless Computing
4.3 Novel Serverless Cloud Programming Abstractions: The CloudButton Toolkit
5 Conclusions and Future Directions
References
Big Data and AI Pipeline Framework: Technology Analysis from a Benchmarking Perspective
1 Introduction
2 The Big Data and AI Pipeline Framework
3 Big Data and AI Pipeline Examples for IoT, Graph and SpatioTemporal Data: From the DataBio Project
4 DataBench Pipeline Framework and Blueprints
5 Technical Benchmarks Related to the Big Data and AI Pipeline Framework
6 DataBench Toolbox
7 Conclusions
References
An Elastic Software Architecture for Extreme-Scale Big Data Analytics
1 Introduction
2 Elastic Software Architecture
2.1 Applicability to the Smart City Domain
2.2 ELASTIC Layered Software Architecture: Overview
2.3 Distributed Data Analytics Platform
2.3.1 Application Programming Interfaces (APIs)
2.3.2 Data Accessibility and Storage
2.4 Computation Orchestrator
2.4.1 Task-based Programming Model
2.4.2 Runtime System
2.5 Non-functional Requirements Tool
2.5.1 Real Time
2.5.2 Energy
2.5.3 Security
2.5.4 Communications Quality
2.6 Hybrid Fog Computing Platform
2.6.1 Cloud: Nuvla
2.6.2 Edge: KonnektBox and NuvlaBox
2.6.3 Fog Components
2.6.4 Distributed Storage
2.6.5 Communication Middleware
3 Conclusions
References
Privacy-Preserving Technologies for Trusted Data Spaces
1 Introduction
2 Tackling Privacy Concerns: The Solution of Federated Learning
2.1 Building Collaborative Models with Federated Learning
2.2 Where Can We Use Federated Machine Learning?
2.3 MUSKETEER's Vision
3 Privacy Operation Modes (POMs)
3.1 POM 1
3.2 POM 2
3.3 POM 3
3.4 POM 4
3.5 POM 5
3.6 POM 6
3.7 Algorithms
4 Setting Your Own Federated Learning Test Case: Technical Perspective
4.1 How to Properly Train Your Machine Learning Model?
4.2 The MUSKETEER Platform
4.3 MUSKETEER Client Connector Components
4.4 Give It a Try
5 Federated Machine Learning in Action: An Efficiency Assessment
5.1 Robots Learn from Each Other
5.2 Defining Quality Data
5.3 Federated Data Sharing Is Better than Playing Alone
6 Use Case Scenario: Improving Welding Quality Assessment Thanks to Federated Learning
6.1 A Twofold Challenge
6.2 Training an Algorithm While Preserving the Sovereignty of Data Providers
6.3 Less Data but More Information
7 Conclusion
References
Leveraging Data-Driven Infrastructure Management to Facilitate AIOps for Big Data Applications and Operations
1 Introduction to Data-Driven Infrastructure
2 Modelling Data-Driven Applications
2.1 Application Modelling Concepts
2.2 Data-Driven Infrastructure Management Concepts
3 Application Performance Modelling
4 Metric Collection and Quality of Service Monitoring
5 Automated Decision Making
6 Operationalizing Application Alterations
6.1 Types of Alteration Actions
6.2 Considerations When Modelling Alteration Actions
6.3 Alteration Actions in BigDataStack
7 Example Use-Case: Live Grocery Recommendation
8 Conclusions
References
Leveraging High-Performance Computing and Cloud Computing with Unified Big-Data Workflows: The LEXIS Project
1 High-Performance Computing, Cloud and Big Data in Science, Research and Industry, and LEXIS
2 LEXIS Basics: Integrated Data-Heavy Workflows on Cloud/HPC
2.1 LEXIS Vision
2.2 LEXIS Hardware Resources
2.3 LEXIS Orchestration
2.4 LEXIS Pilots
2.5 Billing for LEXIS Usage
3 Secure Identity and Access Management in LEXIS – LEXIS AAI
3.1 Why an Own LEXIS AAI?
3.2 Realisation of LEXIS AAI Following the 'Zero-Trust' Security Model
3.3 RBAC Matrix and Roles in LEXIS
4 Big Data Management for Workflows – LEXIS DDI
4.1 Concept of Unified Data Management in LEXIS
4.2 Choice of DDI System, Integration of Distributed Storage Systems
4.3 Reflecting AAI Roles in the DDI
4.4 Immersion in the European 'FAIR' Research Data Landscape
4.5 APIs of the LEXIS DDI, and Data Transfer Within Workflows
5 The LEXIS Portal: A User-Friendly Entry Point to the 'World of HPC/Cloud/Big Data'
5.1 Portal: Concept and Basic Portal Capabilities
5.2 Workflow and Data Management and Visualisation via the Portal
6 Conclusions
References
Part II Processes and Applications
The DeepHealth Toolkit: A Key European Free and Open-Source Software for Deep Learning and Computer Vision Ready to Exploit Heterogeneous HPC and Cloud Architectures
1 Context: The European AI and HPC Landscape and the DeepHealth Project
2 A General Overview of the DeepHealth Toolkit
3 The European Distributed Deep Learning Library
4 The European Computer Vision Library
5 The Back End and the Front End
6 Complements to Leverage HPC/Cloud Infrastructures
6.1 Use of Docker Images Orchestrated by Kubernetes
6.2 COMPSs
6.3 StreamFlow
7 Conclusions
References
Applying AI to Manage Acute and Chronic Clinical Conditions
1 Overview
2 Intensive Care Medicine and Physiological Data
2.1 Physiological Data Acquisition
2.1.1 Time Series Data
2.1.2 Publicly Available Datasets
3 Artificial Intelligence in ICU
3.1 Challenges
3.1.1 Data Integrity
3.1.2 Alert Fatigue
3.1.3 Bringing AI Systems to Clinical Trials
3.2 AI Methodology
3.2.1 Expert Systems
3.2.2 Decision Trees
3.2.3 Ensemble Methods
3.2.4 Neural Networks
4 Use Case: Prediction of Tidal Volume to Promote Lung Protective Ventilation
4.1 The ATTITUDE Study
5 Future of ML and AI in ICU
References
3D Human Big Data Exchange Between the Healthcare and Garment Sectors
1 Introduction
2 The Process to Harmonize 3D Datasets
2.1 The Anthropometric Data Dictionary
2.2 3D Templates
3 Anonymization and Privacy
3.1 The Anonymization of Body Scans
3.2 Architectural Solution Ensuring Data Security and Privacy at Hospitals
4 The Secure Exchange of 3D Data with Blockchain
5 The Application of BodyPass Results in Healthcare: Obesity
5.1 CT Image Processing
5.2 Data Query Processing
6 The Application of the BodyPass Ecosystem in the Apparel Industry
6.1 The Use of 3D Data for Designing Sports Technical Clothing
6.2 Use of 3D Personal Data in Online Services for the Apparel Industry
6.2.1 Manufacturing
6.2.2 Design
6.2.3 Marketing and Operations
7 Conclusions
References
Using a Legal Knowledge Graph for Multilingual Compliance Services in Labor Law, Contract Management, and Geothermal Energy
1 Introduction: Building the Legal Knowledge Graph for Smart Compliance Services in Multilingual Europe
2 The Lynx Services Platform: LySP
3 Lynx Compliance Solutions
3.1 Compliance Services in Labor Law (Cuatrecasas, Spain)
3.2 Smart Contract Management (Cybly, Austria)
3.3 Compliance Solution for Geothermal Energy (DNV GL, the Netherlands)
4 Key Findings, Challenges, and Outlook: LySPβThe Lynx Services Platform
References
Big Data Analytics in the Banking Sector: Guidelines and Lessons Learned from the CaixaBank Case
1 Introduction
2 Challenges and Requirements for Big Data in the Banking Sector
3 Use Cases Description and Experiments' Definition: Technical and Business KPIs
3.1 Analysis of Relationships Through IP Addresses
3.2 Advanced Analysis of Bank Transfer Payment in Financial Terminals
3.3 Enhanced Control of Customers in Online Banking
4 I-BiDaaS Solutions for the Defined Use Cases
4.1 Analysis of Relationships Through IP Addresses
4.1.1 Architecture
4.1.2 Data Generation
4.1.3 Data Analytics
4.1.4 Visualisations
4.1.5 Results
4.2 Advanced Analysis of Bank Transfer Payment in Financial Terminals
4.2.1 Architecture
4.2.2 Data Analytics
4.2.3 Visualisations
4.2.4 Results
4.3 Enhanced Control of Customers in Online Banking
4.3.1 Architecture
4.3.2 Data Analytics
4.3.3 Visualisations
4.3.4 Results
4.4 Relation to the BDV Reference Model and the BDV Strategic Research and Innovation Agenda (SRIA)
5 Lessons Learned, Guidelines and Recommendations
6 Conclusion
References
Data-Driven Artificial Intelligence and Predictive Analytics for the Maintenance of Industrial Machinery with Hybrid and Cognitive Digital Twins
1 Introduction
2 Digital Twin Pipeline and COGNITWIN Toolbox
3 Maintenance of Industrial Machinery and Related Work
4 Maintenance of Spiral Welded Pipe Machinery
5 Components and Digital Twin Pipeline for Steel Pipe Welding
6 COGNITWIN Digital Twin Pipeline Architecture
6.1 Digital Twin Data Acquisition and Collection
6.2 Digital Twin Data Representation
6.3 Hybrid and Cognitive Digital Twins
6.4 Digital Twin Visualisation and Control
7 Conclusions
References
Big Data Analytics in the Manufacturing Sector: Guidelines and Lessons Learned Through the Centro Ricerche FIAT (CRF) Case
1 Introduction
2 Requirements for Big Data in the Manufacturing Sector
3 Use Cases Description and Experiments' Definition: Technical and Business KPIs
3.1 Production Process of Aluminium Die-Casting
3.2 Maintenance and Monitoring of Production Assets
4 I-BiDaaS Solutions for the Defined Use Cases
4.1 Production Process of Aluminium Die-Casting
4.1.1 Architecture
4.1.2 Data Analytics
4.1.3 Visualizations
4.1.4 Results
4.1.5 Synthetic Data Generation and Quality Assessment
4.2 Maintenance and Monitoring of Production Assets
4.2.1 Architecture
4.2.2 Data Analytics
4.2.3 Visualizations
4.2.4 Results
5 Discussion
5.1 Lessons Learned, Challenges and Guidelines
5.2 Connection to BDV Reference Model, BDV SRIA, and AI, Data and Robotics SRIDA
6 Conclusion
References
Next-Generation Big Data-Driven Factory 4.0 Operations and Optimization: The Boost 4.0 Experience
1 Introduction
1.1 Big Data-Centric Factory 4.0 Operations
2 Mass Injection Moulding 4.0 Smart Digital Operations: The Philips Trial
2.1 Data-Driven Digital Shopfloor Automation Process Challenges
2.2 Big Data-Driven Shopfloor Automation Value for Injection Moulding 4.0
2.3 Implementation of Big Data-Driven Quality Automation Solutions for Injection Moulding 4.0
2.4 Big Data Shopfloor Quality Automation Large-Scale Trial Performance Results
2.5 Observations and Lessons Learned
3 Production Data Platform Trials for Intelligent Maintenance at BENTELER Automotive
3.1 Data-Driven Digital Shopfloor Maintenance Process Challenges
3.2 Implementation of a Big Data Production Platform for Intelligent Maintenance in Automotive
3.3 Big Data-Driven Intelligent Maintenance Large-Scale Trial Performance Results
3.4 Observations and Lessons Learned
4 Predictive Maintenance and Quality Control on Autonomous and Flexible Production Lines: The FCA Trial
4.1 Data-Driven Digital Process Challenges
4.2 Big Data Manufacturing Process Value in Autonomous Assembly Lines in Automotive
4.3 Implementation of Big Data Solutions
4.4 Large-Scale Trial Performance Results
4.5 Observations and Lessons Learned
5 Conclusions
References
Big Data-Driven Industry 4.0 Service Engineering Large-Scale Trials: The Boost 4.0 Experience
1 Introduction
2 Boost 4.0 Universal Big Data Reference Model
2.1 Boost 4.0 Objectives
2.2 Boost 4.0 Lighthouse Factories and Large-Scale Trials
2.3 Boost 4.0 Universal Big Data Reference Architecture
2.4 Mapping Boost 4.0 Large-Scale Trials to the Digital Factory Alliance (DFA) Service Development Reference Architecture (SD-RA)
3 Big Data-Driven Intra-Logistics 4.0 Process Planning Powered by Simulation in Automotive: Volkswagen Autoeuropa Trial
3.1 Big Data-Driven Intra-Logistic Planning and Commissioning 4.0 Process Challenges
3.2 Big Data Intra-Logistic Planning and Commissioning Process Value
3.3 Big-Data Pipelines for Intra-Logistic Planning and Commissioning Solutions in Automotive
3.4 Large-Scale Trial of Big Data-Driven Intra-Logistic Planning and Commissioning Solutions for Automotive
3.5 Observations and Lessons Learned
4 From Sheep to Shop Supply Chain Track and Trace in High-End Textile Sector: Piacenza Business Network Trial
4.1 Data-Driven Textile Business Network Tracking and Tracing Challenges
4.2 Supply Chain Track and Trace Process Value
4.3 Distributed Ledger Implementation for Supply Chain Visibility
4.4 Observations and Lessons Learned
5 Conclusions
References
Model-Based Engineering and Semantic Interoperability for Trusted Digital Twins Big Data Connection Across the Product Lifecycle
1 Introduction
2 Boost 4.0 Testbed for Digital Twin Data Continuity Across the Product Lifecycle
3 FILL GmbH Model-Based Machine Tool Engineering and Big Data-Driven Cybernetics Large-Scale Trial
3.1 Big Data-Driven Model-Based Machine Tool Engineering Business Value
3.2 Implementation of Big Data-Driven Machine Tool Cybernetics
3.3 Large-Scale Trial Performance Results
3.4 Observations and Lessons Learned
4 +GF+ Trial for Big Data-Driven Zero Defect Factory 4.0
4.1 Value of a Quality Information Framework for Dimensional Control and Semantic Interoperability
4.2 Value of Semantically Driven Big Data Zero Defect Factory
4.3 Semantic Big Data Pipeline Implementation for Zero Defect Factories
4.4 Large-Scale Trial Performance Results
4.5 Observations and Lessons Learned
5 Trimek Trial for Zero Defect Manufacturing (ZDM) Powered by Massive Metrology 4.0
5.1 Massive 3D Point Cloud Analytics and Metrology 4.0 Challenges
5.2 Implementation of Massive Metrology 4.0 Big Data Workflow
5.3 Large-Scale Trial Performance Results
6 Conclusions
References
A Data Science Pipeline for Big Linked Earth Observation Data
1 Introduction
2 The Green City Use Case
2.1 Data Sources
2.2 Copernicus Sentinel Data
2.3 Other Geospatial Data
3 The Data Science Pipeline
3.1 Ingestion, Processing, Cataloguing and Archiving
3.2 Dataset Discovery
3.3 Knowledge Discovery
3.4 Transformation into RDF
3.5 Interlinking
3.6 Publishing
3.7 Storage and Querying
3.8 Search/Browse/Explore/Visualize
4 Implementing the Green City Use Case Using Linked Geospatial Data Software
4.1 Ingestion
4.2 Dataset Discovery
4.3 Knowledge Discovery
4.4 Transformation into RDF
4.5 Storage/Querying
4.6 Publishing
4.7 Interlinking
4.8 Exploration and Visualization
5 Summary
References
Towards Cognitive Ports of the Future
1 Introduction
2 Challenges for Port-Oriented Cognitive Services
3 Scalability
4 International Data Spaces Architecture Reference Model
5 Interoperability
5.1 Introduction
5.2 Semantic Interoperability
5.3 Application Programming Interfaces for Serverless Platforms
6 Standardization
7 Business Outcomes and Challenges
References
Distributed Big Data Analytics in a Smart City
1 Introduction
1.1 Processing Distributed Data Sources
1.1.1 The CLASS Software Architecture
1.2 Big Data Analytics on Smart Cities: Applications
2 Big Data Analytics Algorithms
2.1 Description of Data Processing Algorithms
2.1.1 In-Vehicle Sensor Fusion
2.1.2 Street Camera Object Detection
2.1.3 Object Tracking
2.1.4 Data Deduplication
2.1.5 Trajectory Prediction
2.1.6 Warning Area Filtering
2.1.7 Collision Detection
2.1.8 Vehicles Emissions Model
2.1.9 Data Aggregation: Data Knowledge Base
2.1.10 Visualization
2.1.11 Predictive Models
2.2 Integration Toward a Relevant Smart City Use-Case
3 Distribution of Big Data Analytics Across the City Infrastructure
3.1 Smart City Infrastructure
3.1.1 City Infrastructure: MASA
3.1.2 Vehicle Infrastructure
3.2 Big Data Analytics Distribution
3.2.1 Real Scenarios for Collision Avoidance
3.3 Real-Time Requirements
4 Conclusions
References
Processing Big Data in Motion: Core Components and System Architectures with Applications to the Maritime Domain
1 Challenges of Big Streaming Data
2 Core Components and System Architectures
2.1 The Case for Data Synopses
2.2 Distributed Online Machine Learning and Data Mining
2.3 Distributed and Online CEF
2.4 Geo-distributed Cross-Platform Optimisation
3 Real-Life Application to a Maritime Use Case
3.1 Background on Maritime Situation Awareness (MSA)
3.2 Building Blocks of MSA Workflows in the Big Data Era
3.2.1 Maritime Data Sources
3.2.2 Maritime Data Fusion
3.2.3 SDE Operator for Trajectory Simplification
3.2.4 Complex Maritime Event Processing
3.2.5 ML-Based Anomaly Detection
3.2.6 MSA Workflow Optimisation
4 Future Research and Development Directions
References
Knowledge Modeling and Incident Analysis for Special Cargo
1 Introduction
2 Special Cargo Ontology
2.1 Methodology and Principles for Ontology Construction
2.2 Requirement Workflow
2.3 Analysis Workflow
2.4 Design Workflow
2.5 Implementation Workflow
2.6 Test Workflow
2.7 Evaluation Workflow
2.8 Summary
3 Case Study: Lane Analysis and Route Advisor
4 Natural Language Processing for Incident Handling
4.1 Random Forest Decision Trees
4.2 Implementation
4.3 Results and Discussion
5 Statistics and Machine Learning to Improve Risk Assessment
5.1 Logistic Regression
5.2 Methodology
5.3 Statistical Implementation
5.4 Recursive Feature Elimination
5.5 Results
5.6 Discussion
5.7 Comparison of the Statistical and RFE Models
6 Summary, Challenges, and Conclusion
References