Towards Heterogeneous Multi-core Systems-on-Chip for Edge Machine Learning: Journey from Single-core Acceleration to Multi-core Heterogeneous Systems

✍ Scribed by Vikram Jain, Marian Verhelst


Publisher: Springer
Year: 2023
Tongue: English
Leaves: 199
Category: Library

✦ Synopsis


This book explores and motivates the need for building homogeneous and heterogeneous multi-core systems for machine learning to enable flexibility and energy efficiency. Coverage focuses on a key challenge of (extreme-)edge computing: the design of energy-efficient and flexible hardware architectures, together with hardware-software co-optimization strategies that enable early design space exploration of those architectures. The authors investigate possible design solutions for building single-core specialized hardware accelerators for machine learning, and show the advantages of scaling to heterogeneous multi-core systems through the implementation of multiple test chips and architectural optimizations.



✦ Table of Contents


Preface
Acknowledgments
Contents
List of Abbreviations
List of Figures
List of Tables
1 Introduction
1.1 Machine Learning at the (Extreme) Edge
1.1.1 Applications
1.1.2 Algorithms
1.1.3 Hardware
1.2 Open Challenges for ML Acceleration at the (Extreme) Edge
1.3 Book Contributions
2 Algorithmic Background for Machine Learning
2.1 Support Vector Machines
2.2 Deep Learning Models
2.2.1 Neural Networks
2.2.2 Training
2.2.3 Inference: Neural Network Topologies
2.2.4 Model Compression
2.3 Feature Extraction
2.4 Conclusion
3 Scoping the Landscape of (Extreme) Edge Machine Learning Processors
3.1 Hardware Acceleration of ML Workloads: A Primer
3.1.1 Core Mathematical Operation
3.1.2 General Accelerator Template
3.2 Evaluation Metrics
3.3 Survey of (Extreme) Edge ML Hardware Platforms
3.4 Evaluating the Surveyed Hardware Platforms
3.5 Insights and Trends
3.6 Conclusion
4 Hardware–Software Co-optimization Through Design Space Exploration
4.1 Motivation
4.2 Exploration Methodology
4.2.1 ZigZag
4.2.2 Post-Processing of ZigZag's Results
4.3 DNN Workload Comparison
4.3.1 Exploration Setup
4.3.2 Visualization of the Complete Trade-Off Space
4.3.3 Impact of HW Architecture on Optimal Workload
4.3.4 Impact of Workload on Optimal HW Architecture
4.4 Conclusion
5 Energy-Efficient Single-Core Hardware Acceleration
5.1 Motivation
5.2 Metrics for Hardware Optimization
5.3 State-of-the-Art in Object Detection on FPGA
5.4 Cost-Aware Algorithmic Optimization
5.4.1 Object Detection Algorithms
5.4.2 Quantization of Tiny-YOLOv2
Post-training Quantization
Quantization-Aware Training
5.5 Cost-Aware Architecture Optimization
5.5.1 Hardware Mapping of Convolutional Layers
5.5.2 Hardware Architecture of the Accelerator
5.6 Cost-Aware System Optimization
5.6.1 Data Communication Architecture
5.6.2 Tiling Strategy
5.7 Implementation Results
5.8 Conclusion
6 TinyVers: A Tiny Versatile All-Digital Heterogeneous Multi-core System-on-Chip
6.1 Motivation
6.2 Algorithmic Background
6.2.1 Convolution and Dense Operation
6.2.2 Deconvolution
6.2.3 Support Vector Machines (SVMs)
6.3 TinyVers Hardware Architecture
6.3.1 Smart Sensing Modes for TinyML
6.3.2 Power Management
6.4 FlexML Accelerator
6.4.1 FlexML Architecture Overview
6.4.2 Dataflow Reconfiguration
6.4.3 Efficient Zero-Skipping for Deconvolution and Blockwise Structured Sparsity
6.4.4 Support Vector Machine
6.5 Deployment of Neural Networks on TinyVers
6.6 Design for Test and Fault-Tolerance
6.7 Chip Implementation and Measurement
6.7.1 Peak Performance Analysis
6.7.2 Workload Benchmarks
6.7.3 Power Management
6.7.4 Instantaneous Power Trace
Keyword Spotting Application
Machine Monitoring Application
6.8 Comparison with SotA
6.9 Conclusion
7 DIANA: DIgital and ANAlog Heterogeneous Multi-core System-on-Chip
7.1 Motivation
7.2 Design Choices
7.2.1 Dataflow Concepts
7.2.2 Design Space Exploration
7.2.3 A Reconfigurable Heterogeneous Architecture
7.2.4 Optimization Strategies for Multi-core
7.3 System Architecture
7.3.1 The RISC-V CPU and Network Control
7.3.2 Memory System
7.4 AIMC Computing Core
7.4.1 AIMC Core Microarchitecture
7.4.2 Memory Control Unit (MCU)
7.4.3 AIMC Macro
7.4.4 Output Buffer and SIMD Unit
7.5 Digital DNN Accelerator
7.6 Measurements
7.6.1 Efficiency vs. Accuracy Trade-Off in the Analog Macro
7.6.2 Peak Performance and Efficiency Characterization
7.6.3 Workload Performance Characterization
7.6.4 SotA Comparison
7.7 Conclusion
8 Networks-on-Chip to Enable Large-Scale Multi-core ML Acceleration
8.1 Motivation
8.2 Background
8.2.1 Network-on-Chips
8.2.2 AXI Protocol
Burst
Multiple Outstanding Transaction
8.3 Interconnect Architecture of PATRONoC
8.4 Implementation Results
8.5 Performance Evaluation
8.5.1 Uniform Random Traffic
8.5.2 Synthetic Traffic
8.5.3 DNN Workload Traffic
8.6 Related Work
8.7 Conclusion
9 Conclusion
9.1 Overview and Contributions
9.2 Suggestions for Future Work
9.2.1 The Low Hanging Fruits
9.2.2 Medium Term
9.2.3 Moonshot
9.3 Closing Remarks
References
Index


📜 SIMILAR VOLUMES


Multi-Core Embedded Systems (Embedded Mu
✍ Georgios Kornaros 📂 Library 📅 2010 🏛 CRC Press 🌐 English

Details a real-world product that applies a cutting-edge multi-core architecture. Increasingly demanding modern applications—such as those used in telecommunications networking and real-time processing of audio, video, and multimedia streams—require multiple processors to achieve computational perfo

Embedded Memory Design for Multi-Core an
✍ Baker Mohammad (auth.) 📂 Library 📅 2014 🏛 Springer-Verlag New York 🌐 English

This book describes the various tradeoffs systems designers face when designing embedded memory. Readers designing multi-core systems and systems on chip will benefit from the discussion of different topics from memory architecture, array organization, circuit design techniques and design for tes

Embedded Memory Design for Multi-Core an
✍ Baker Mohammad 📂 Library 📅 2013 🏛 Springer 🌐 English

This book describes the various tradeoffs systems designers face when designing embedded memory. Readers designing multi-core systems and systems on chip will benefit from the discussion of different topics from memory architecture, array organization, circuit design techniques and design for test.

Networks-on-Chips: Theory and Practice (
✍ Fayez Gebali, Haytham Elmiligi, and M. Watheq El-Kharashi 📂 Library 📅 2009 🌐 English

The implementation of networks-on-chip (NoC) technology in VLSI integration presents a variety of unique challenges. To deal with specific design solutions and research hurdles related to intra-chip data exchange, engineers are challenged to invoke a wide range of disciplines and specializations whi