Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing: Hardware Architectures
Edited by Sudeep Pasricha and Muhammad Shafique
- Publisher: Springer
- Year: 2023
- Language: English
- Pages: 418
- Category: Library
No payment or registration required. For personal study only.
Synopsis
This book presents recent advances toward the goal of enabling efficient implementation of machine learning models on resource-constrained systems, covering different application domains. The focus is on presenting interesting and new use cases of applying machine learning to innovative application domains; exploring the hardware design of efficient machine learning accelerators and memory optimization techniques; illustrating model compression and neural architecture search techniques for energy-efficient and fast execution on resource-constrained hardware platforms; and understanding hardware-software codesign techniques for achieving even greater energy, reliability, and performance benefits.
Table of Contents
Preface
Acknowledgments
Contents
Part I Efficient Hardware Acceleration for Embedded Machine Learning
Massively Parallel Neural Processing Array (MPNA): A CNN Accelerator for Embedded Systems
1 Introduction
1.1 State of the Art and Their Limitations
1.2 Motivational Case Study and Research Challenges
1.3 Our Novel Contributions
2 Preliminaries
2.1 Convolutional Neural Networks (CNNs)
2.2 Systolic Array (SA)
3 Our Design Methodology
4 Dataflow Optimization
4.1 Data Reuse Analysis
4.2 Proposed Dataflow Patterns
5 The MPNA Architecture
5.1 Overview of Our MPNA Architecture
5.2 Heterogeneous Systolic Arrays (SAs)
5.3 Accumulation Block
5.4 Pooling-and-Activation Block
5.5 Hardware Configuration
6 Evaluation Methodology
7 Experimental Results and Discussion
7.1 Systolic Array Evaluation
7.2 Comparison with Other CNN Accelerators
7.2.1 Performance Evaluation
7.2.2 Power and Energy Consumption
7.2.3 Area Footprint
8 Conclusion
References
Photonic NoCs for Energy-Efficient Data-Centric Computing
1 Introduction
2 Related Work
3 Data Formats and Approximations
3.1 Floating-Point Data
3.2 Integer Data
3.3 Applications Considered for Approximation
4 Crosstalk and Optical Loss in PNoCs
5 ARXON Framework: Overview
5.1 Loss-Aware Power Management for Approximation
5.2 Relaxed Crosstalk Mitigation Strategy
5.3 Relaxed MR Tuning Strategy
5.4 Integrating Multilevel Signaling
6 ARXON Evaluation and Simulation Results
6.1 Simulation Setup
6.2 Impact of ARXON on Considered Applications
6.3 MR Tuning Relaxation-Based Analyses
6.4 Power Dissipation Breakdown
7 Conclusion
References
Low- and Mixed-Precision Inference Accelerators
1 Introduction
2 Background: Extreme Quantization and Network Variety
2.1 Neural Network Architecture Design Space
2.2 Binary Quantization
2.3 Ternary Quantization
2.4 Mixed-Precision
3 Accelerators for Low- and Mixed-Precision Inference
3.1 Characterization Criteria
3.1.1 Flexibility
3.1.2 Performance Characteristics
3.2 Five Low- and Mixed-Precision Accelerators Reviewed
3.2.1 XNOR Neural Engine (XNE)
3.2.2 ChewBaccaNN
3.2.3 Completely Unrolled Ternary Inference Engine (CUTIE)
3.2.4 Binary Neural Network Accelerator in 10-nm FinFET
3.2.5 BrainTTA
4 Comparison and Discussion
5 Summary and Conclusions
References
Designing Resource-Efficient Hardware Arithmetic for FPGA-Based Accelerators Leveraging Approximations and Mixed Quantizations
1 Introduction
2 Integer Arithmetic for Embedded Machine Learning
2.1 Fixed-Point Representation
2.2 Accurate Custom Signed Multipliers
2.3 Approximate Custom Signed Multipliers
2.4 Comparison of Multiplier Designs
3 Arithmetic for Novel Number Representation Schemes
3.1 Posit Number Representation Scheme-Based Arithmetic
3.2 Fixed-Point-Based Posit Arithmetic
3.3 Results
4 Conclusion
References
Efficient Hardware Acceleration of Emerging Neural Networks for Embedded Machine Learning: An Industry Perspective
1 Introduction
2 Background
2.1 Computer Vision
2.1.1 Convolutional Neural Networks
2.1.2 Emerging Deep Learning Architectures for Vision
2.2 Natural Language Processing
2.3 Deep Learning Based Recommendation Systems
2.4 Graph Neural Networks
3 Common Layers Across Neural Networks
4 Efficient Implementation of Emerging NN Operators
4.1 Efficient Mapping and Acceleration of Special DNN Layers
4.1.1 First Layer
4.1.2 Eltwise Layer
4.1.3 Fully Connected Layers
4.1.4 Maxpool/Average Pool
4.1.5 Activation Functions
4.2 Efficient Mapping and Acceleration of Layers in New Neural Networks
4.2.1 Channel Separable Depthwise Convolution Layers
4.2.2 Group Convolution Layers
4.2.3 Transposed Convolution/Deconvolution Layers
4.2.4 Dilated Convolution/Atrous Convolution Layers
5 Efficient Mapping and Acceleration of Layers in Emerging Neural Networks
5.1 Transformers
5.1.1 Input Embedding and Positional Encoding
5.1.2 Multi-headed Self-Attention
5.1.3 Point-Wise Feed-Forward
5.1.4 Enabling Transformers on the Edge
5.1.5 Summary of Design Considerations for Transformers
5.2 Graph Neural Networks
5.2.1 Compute Phases of GNN
5.2.2 Design Considerations
5.2.3 GNN Data Flow
5.2.4 Additional Opportunities for Hardware Acceleration
6 Future Trends: Networks and Applications
References
Part II Memory Design and Optimization for Embedded Machine Learning
An Off-Chip Memory Access Optimization for Embedded Deep Learning Systems
1 Introduction
1.1 Overview
1.2 Design Constraints for Embedded DL Systems
2 Preliminaries
2.1 Deep Learning
2.2 Hardware Accelerators for Embedded DL Systems
2.3 DRAM Fundamentals
2.3.1 Organization
2.3.2 Operations
3 DRAM Access Optimization for Embedded DL Systems
3.1 Overview
3.2 Reduction of DRAM Accesses
3.3 Employment of Low Latency DRAM
3.3.1 Devising the Data Mapping Policy in DRAM
3.3.2 Analysis for the EDP of DRAM Accesses
4 Experimental Evaluations
4.1 Reduction of DRAM Accesses
4.2 Impact of Different DRAM Mapping Policies on EDP
4.3 Further Discussion
5 Conclusion
References
In-Memory Computing for AI Accelerators: Challenges and Solutions
1 Introduction
1.1 Machine Learning in Modern Times
1.2 Hardware Implications of DNNs
2 In-Memory Computing Architectures
2.1 RRAM/SRAM-Based IMC Architectures
2.1.1 RRAM Device
2.1.2 IMC Architecture
2.1.3 Challenges with IMC Architectures
3 Interconnect Challenges and Solutions
3.1 Interconnect for IMC-Based Planar AI Accelerators
3.2 On-Package Communication for Chiplet-Based AI Accelerators
3.3 Interconnect for Monolithic 3D (M3D)-Based AI Accelerators
4 Evaluation Frameworks for IMC-Based AI Accelerator
4.1 Evaluation Frameworks for Monolithic AI Accelerators
4.2 Evaluation Framework for Chiplet-Based AI Accelerators
5 Conclusion
References
Efficient Deep Learning Using Non-volatile Memory Technology in GPU Architectures
1 Introduction
2 Related Work
3 Methodology
3.1 Circuit-Level NVM Characterization
3.2 Microarchitecture-Level Cache Design Exploration
3.3 Architecture-Level Iso-Capacity Analysis
3.4 Architecture-Level Iso-Area Analysis
4 Experimental Results
4.1 Performance and Energy Results for Iso-Capacity
4.2 Performance and Energy Results for Iso-Area
4.3 Scalability Analysis
5 Discussion
6 Conclusion
References
SoC-GANs: Energy-Efficient Memory Management for System-on-Chip Generative Adversarial Networks
1 Introduction
2 Background: DCGAN Hardware Acceleration and Its Design Challenges
3 Memory-Efficient Hardware Architecture for Generative Adversarial Networks (GANs)
3.1 2-D Distributed On-Chip Memory Array
3.2 Data Re-Packaging Unit
3.2.1 Pixel Row Index Computation Block
3.2.2 Pixel Column Index Computation Block
3.2.3 RAM-Block Index Computation
3.2.4 RAM-Channel Index Computation
3.2.5 RAM Index Computation
3.2.6 SPRAM Address Computation
4 Results and Discussion
4.1 Experimental Setup
4.2 Processing Time Evaluation
4.3 Memory Accesses Evaluation
4.4 Area Utilization Evaluation
5 Conclusion
References
Using Approximate DRAM for Enabling Energy-Efficient, High-Performance Deep Neural Network Inference
1 Introduction
2 Background
2.1 Deep Neural Networks
2.2 DRAM Organization and Operation
2.3 Reducing DRAM Parameters
3 EDEN Framework
3.1 EDEN: A High-Level Overview
3.2 Boosting DNN Error Tolerance
3.3 DNN Error Tolerance Characterization
3.4 DNN to DRAM Mapping
3.5 DNN Inference with Approximate DRAM
4 Enabling EDEN with Error Models
5 Memory Controller Support
6 DNN Accuracy Evaluation
6.1 Methodology
6.2 Accuracy Validation of the Error Models
6.3 Error Tolerance of Baseline DNNs
6.4 Curricular Retraining Evaluation
6.5 Coarse-Grained DNN Characterization and Mapping
6.6 Fine-Grained DNN Characterization and Mapping
7 System-Level Evaluation
7.1 CPU Inference
7.2 Accelerators
8 Related Work
9 Discussion and Challenges
9.1 Discussion
9.2 Challenges
10 Conclusion
References
Part III Emerging Substrates for Embedded Machine Learning
On-Chip DNN Training for Direct Feedback Alignment in FeFET
1 Introduction
2 Background
2.1 DNN Training Methods
2.2 DNN Acceleration in Resistive Memory
2.3 Ferroelectric Field-Effect Transistor
3 An FeFET-Based DNN Training Accelerator Architecture for Direct Feedback Alignment
3.1 Overall Architecture
3.2 FeFET Switching Characterization
3.3 FeFET-Based Random Number Generator
3.4 Low-Power ADC Based on FE Layer Tuning
3.5 Pipeline
4 Evaluation
4.1 Experimental Setup
4.2 Experimental Results
5 Conclusion
References
Platform-Based Design of Embedded Neuromorphic Systems
1 Introduction
2 Platform-Based Design Methodology
3 Software Design Space Exploration
3.1 Performance-Oriented DSE
3.2 Energy-Oriented DSE
3.3 Reliability-Oriented DSE
4 Summary
References
Light Speed Machine Learning Inference on the Edge
1 Introduction
2 Background and Related Work
3 Overview of Noncoherent Optical Computation
4 Binarized Neural Networks
5 ROBIN Architecture
5.1 Tuning Circuit Design
5.2 Device-Level Optimization
5.2.1 Fabrication-Process Variation Resilience
5.2.2 Multi-Bit Precision MRs
5.2.3 Single-Bit MRs
5.2.4 Broadband MRs
5.3 Architecture Design
5.3.1 Decomposing Vector Operations
5.3.2 Vector Dot Product (VDP) Unit Design
5.3.3 Optical Wavelength Reuse in VDP Units
5.3.4 ROBIN Pipelining and Scheduling
6 Experiments and Results
6.1 Simulation Setup
6.2 Fabrication-Process Variation Analysis
6.3 ROBIN Architecture Optimization Analysis
6.4 Comparison with State-of-the-Art Optical and Electronic DNN/BNN Accelerators
6.5 Comparison to CPU-Based Inference
7 Conclusion
References
Low-Latency, Energy-Efficient In-DRAM CNN Acceleration with Bit-Parallel Unary Computing
1 Introduction
2 Concept of Bit-Parallel Rate-Coded Unary (Stochastic) Computing
3 ATRIA: Overview
3.1 Structure of a PE in ATRIA
3.2 Functioning of a PE in ATRIA
3.3 System Integration and Controller Design
3.4 Overhead Analysis
4 Evaluation
4.1 Modeling and Setup for Evaluation
4.2 Precision Error and Accuracy Results
4.3 Per-MAC Latency Results
4.4 CNN Inference Performance Results
5 Conclusions
References
Index
SIMILAR VOLUMES
This book presents recent advances towards the goal of enabling efficient implementation of machine learning models on resource-constrained systems, covering different application domains. The focus is on presenting interesting and new use cases of applying machine learning to innovative application domains …
This book constitutes selected papers from the Second International Workshop on IoT Streams for Data-Driven Predictive Maintenance, IoT Streams 2020, and the First International Workshop on IoT, Edge, and Mobile for Embedded Machine Learning, ITEM 2020, co-located with ECML/PKDD 2020 and held in September 2020 …
The work presents new approaches to Machine Learning for Cyber Physical Systems, along with experiences and visions. It contains selected papers from the international conference ML4CPS – Machine Learning for Cyber Physical Systems, which was held in Karlsruhe on September 29th, 2016. Cyber Physical Systems …
Explore IoT, data analytics, and machine learning to solve cyber-physical problems using the latest capabilities of managed services such as AWS IoT Greengrass and Amazon SageMaker. Key Features: Accelerate your next edge-focused product development with the power of AWS IoT Greengrass …