Thinking Machines: Machine Learning and Its Hardware Implementation
by Shigeyuki Takano
- Publisher: Academic Press
- Year: 2021
- Language: English
- Pages: 324
- Category: Library
Synopsis
Thinking Machines: Machine Learning and Its Hardware Implementation covers the theory and application of machine learning, neuromorphic computing, and neural networks. It is the first book to focus on machine learning accelerators and hardware development for machine learning. It presents not only a summary of the latest trends and examples of machine learning hardware, along with basic knowledge of machine learning in general, but also the main issues involved in its implementation. Readers will learn what is required to design machine learning hardware for neuromorphic computing and/or neural networks.
This book is recommended for readers who have a basic knowledge of machine learning, or who want to learn more about current trends in machine learning.
Table of Contents
Front Cover
Thinking Machines
Copyright
Contents
List of figures
List of tables
Biography
Preface
Acknowledgments
Outline
1 Introduction
1.1 Dawn of machine learning
1.1.1 IBM Watson challenge on Jeopardy!
1.1.2 ImageNet challenge
1.1.3 Google AlphaGo challenge of professional Go player
1.2 Machine learning and applications
1.2.1 Definition
1.2.2 Applications
1.3 Learning and its performance metrics
1.3.1 Preparation before learning
1.3.1.1 Preparation of dataset
1.3.1.2 Cleaning of training data
1.3.2 Learning methods
1.3.2.1 Gradient descent method and back propagation
1.3.2.2 Taxonomy of learning
1.3.3 Performance metrics and verification
1.3.3.1 Inference performance metrics
1.3.3.2 Verification for inference
1.3.3.3 Learning performance metrics
1.4 Examples
1.4.1 Industry 4.0
1.4.2 Transaction (block chain)
1.5 Summary of machine learning
1.5.1 Difference from artificial intelligence
1.5.2 Hype cycle
2 Traditional microarchitectures
2.1 Microprocessors
2.1.1 Core microarchitecture
2.1.2 Programming model of a microprocessor
2.1.2.1 Compiling-flow of a microprocessor
2.1.2.2 Concept of programming model on microprocessors
2.1.3 Microprocessor meets its complexity
2.1.4 Pros and cons on superscalar microprocessors
2.1.5 Scaling of register file
2.1.6 Branch predication and its penalty
2.2 Many-core processors
2.2.1 Concept of many-core
2.2.2 Programming model
2.2.2.1 Coherency
2.2.2.2 Library for threading
2.3 Digital signal processors (DSPs)
2.3.1 Concept of DSP
2.3.2 DSP microarchitecture
2.3.2.1 DSP functionality
2.3.2.2 DSP addressing modes
2.4 Graphics processing units (GPU)
2.4.1 Concept of GPU
2.4.2 GPU microarchitecture
2.4.3 Programming model on graphics processing units
2.4.4 Applying GPUs to a computing system
2.5 Field-programmable gate arrays (FPGAs)
2.5.1 Concept of FPGA
2.5.2 FPGA microarchitecture
2.5.3 FPGA design flow
2.5.4 Applying FPGAs to computing system
2.6 Dawn of domain-specific architectures
2.6.1 Past computer industry
2.6.2 History of machine learning hardware
2.6.3 Revisiting machine learning hardware
2.7 Metrics of execution performance
2.7.1 Latency and throughput
2.7.2 Number of operations per second
2.7.3 Energy and power consumptions
2.7.4 Energy-efficiency
2.7.5 Utilization
2.7.6 Data-reuse
2.7.7 Area
2.7.8 Cost
3 Machine learning and its implementation
3.1 Neurons and their network
3.2 Neuromorphic computing
3.2.1 Spike timing dependent plasticity and learning
3.2.2 Neuromorphic computing hardware
3.2.3 Address-event representation
3.2.3.1 Concept of address-event representation
3.2.3.2 Router architectures for AER
3.3 Neural network
3.3.1 Neural network models
3.3.1.1 Shallow neural networks
3.3.1.2 Deep neural networks
3.3.2 Previous and current neural networks
3.3.3 Neural network hardware
3.3.3.1 Dot-product implementation
3.4 Memory cell for analog implementation
4 Applications, ASICs, and domain-specific architectures
4.1 Applications
4.1.1 Concept of applications
4.2 Application characteristics
4.2.1 Locality
4.2.1.1 Concept of locality
4.2.1.2 Hardware perspective for locality
4.2.2 Deadlock
4.2.2.1 System model
4.2.2.2 Deadlock property
4.2.3 Dependency
4.2.3.1 Concept of data dependency
4.2.3.2 Data dependency representation
4.2.4 Temporal and spatial operations
4.3 Application-specific integrated circuit
4.3.1 Design constraints
4.3.1.1 Wire-delay
4.3.1.2 Energy and power consumption
4.3.1.3 Number of transistors and I/Os
4.3.1.4 Bandwidth hierarchy
4.3.2 Modular structure and mass production
4.3.3 Makimoto's wave
4.3.4 Design flow
4.4 Domain-specific architecture
4.4.1 Introduction to domain-specific architecture
4.4.1.1 Concept of domain-specific architecture
4.4.1.2 Guidelines of domain-specific architecture
4.4.2 Domain-specific languages
4.4.2.1 Halide
4.5 Machine learning hardware
4.6 Analysis of inference and training on deep learning
4.6.1 Analysis of inference on deep learning
4.6.1.1 Scale of parameters and activations
4.6.1.2 Operations and execution cycles
4.6.1.3 Energy consumption
4.6.1.4 Energy efficiency
4.6.2 Analysis of training on deep learning
4.6.2.1 Data amount and computation
4.6.2.2 Execution cycles
4.6.2.3 Energy consumption
5 Machine learning model development
5.1 Development process
5.1.1 Development cycle
5.1.2 Cross-validation
5.1.2.1 Hold-out method
5.1.2.2 Cross-validation
5.1.2.3 Validation for sequential data
5.1.3 Software stacks
5.2 Compilers
5.2.1 ONNX
5.2.2 NNVM
5.2.3 TensorFlow XLA
5.3 Code optimization
5.3.1 Extracting data-level parallelism
5.3.1.1 Vectorization
5.3.1.2 SIMDization
5.3.2 Memory access optimization
5.3.2.1 Data structure
5.3.2.2 Memory allocation
5.3.2.3 Data word alignment on memory and cache space
5.4 Python script language and virtual machine
5.4.1 Python and optimizations
5.4.2 Virtual machine
5.5 Compute unified device architecture
6 Performance improvement methods
6.1 Model compression
6.1.1 Pruning
6.1.1.1 Concept of pruning
6.1.1.2 Pruning methods
6.1.1.3 Example of pruning: deep compression
6.1.1.4 Unstructured and structured pruning
6.1.1.5 Difference from dropout and DropConnect
6.1.2 Dropout
6.1.2.1 Concept of a dropout
6.1.2.2 Dropout method
6.1.3 DropConnect
6.1.3.1 Concept of DropConnect
6.1.3.2 DropConnect method
6.1.4 Distillation
6.1.4.1 Concept of distillation
6.1.4.2 Distillation method
6.1.4.3 Effect of distillation
6.1.5 Principal component analysis
6.1.5.1 Concept of PCA
6.1.5.2 PCA method
6.1.6 Weight-sharing
6.1.6.1 Concept of weight-sharing
6.1.6.2 Example weight-sharing method
6.1.6.3 Tensor approximation
6.2 Numerical compression
6.2.1 Quantization and numerical precision
6.2.1.1 Direct quantization
6.2.1.2 Linear quantization
6.2.1.3 Lower numerical precision
6.2.2 Impact on memory footprint and inference accuracy
6.2.2.1 Memory footprint
6.2.2.2 Side-effect on inference accuracy
6.2.2.3 Effect on execution performance
6.2.2.4 Effect on reduction of energy consumption
6.2.3 Edge-cutting and clipping
6.3 Encoding
6.3.1 Run-length coding
6.3.1.1 Concept of run-length coding
6.3.1.2 Implementation of run-length coding
6.3.1.3 Effect of run-length coding
6.3.2 Huffman coding
6.3.2.1 Concept of Huffman coding
6.3.2.2 Implementation of Huffman coding
6.3.3 Effect of compression
6.3.3.1 Compression of the parameters
6.3.3.2 Compression for activations
6.3.3.3 Compression for both parameters and activations
6.4 Zero-skipping
6.4.1 Concept of zero-skipping
6.4.2 CSR and CSC sparsity representations
6.4.2.1 Concept of CSR and CSC encoding
6.5 Approximation
6.5.1 Concept of approximation
6.5.2 Activation function approximation
6.5.2.1 Hard-tanh
6.5.2.2 Hard-sigmoid
6.5.2.3 ReLU6
6.5.3 Multiplier approximation
6.5.3.1 Shifter representation
6.5.3.2 LUT representation
6.6 Optimization
6.6.1 Model optimization
6.6.1.1 Combined optimization
6.6.1.2 Memory access optimization
6.6.1.3 Fused layer
6.6.2 Data-flow optimization
6.6.2.1 Concept of data-flow optimization
6.6.2.2 Data-reuse
6.6.2.3 Constraint by reuse
6.7 Summary of performance improvement methods
7 Case study of hardware implementation
7.1 Neuromorphic computing
7.1.1 Analog logic circuit
7.1.2 Digital logic circuit
7.1.2.1 Many-core
7.1.2.2 ASIC
7.2 Deep neural network
7.2.1 Analog logic circuit
7.2.2 DSPs
7.2.3 FPGAs
7.2.4 ASICs
7.3 Quantum computing
7.4 Summary of case studies
7.4.1 Case study for neuromorphic computing
7.4.2 Case study for deep neural network
7.4.3 Comparison between neuromorphic computing and deep neural network hardware
8 Keys to hardware implementation
8.1 Market growth predictions
8.1.1 IoT market
8.1.2 Robotics market
8.1.3 Big data and machine learning markets
8.1.4 Artificial intelligence market in drug discovery
8.1.5 FPGA market
8.1.6 Deep learning chip market
8.2 Tradeoff between design and cost
8.3 Hardware implementation strategies
8.3.1 Requirements of strategy planning
8.3.1.1 Constructing strategy planning
8.3.1.2 Strategy planning
8.3.2 Basic strategies
8.3.2.1 Performance, usability, and risk control
8.3.2.2 Compatibility from traditional systems
8.3.2.3 Integration into traditional system
8.3.3 Alternative factors
8.4 Summary of hardware design requirements
9 Conclusion
A Basics of deep learning
A.1 Equation model
A.1.1 Feedforward neural network model
A.1.2 Activation functions
A.1.2.1 Concept of activation function
A.1.2.2 Rectified linear unit
A.1.3 Output layer
A.1.4 Learning and back propagation
A.1.4.1 Loss and cost functions
A.1.4.2 Back propagation
A.1.5 Parameter initialization
A.2 Matrix operation for deep learning
A.2.1 Matrix representation and its layout
A.2.2 Matrix operation sequence for learning
A.2.3 Learning optimization
A.2.4 Bias-variance problem
A.2.4.1 Regularization
A.2.4.2 Momentum
A.2.4.3 Adam
B Modeling of deep learning hardware
B.1 Concept of deep learning hardware
B.1.1 Relationship between parameter space and propagation
B.1.2 Basic deep learning hardware
B.2 Data-flow on deep learning hardware
B.3 Machine learning hardware architecture
C Advanced network models
C.1 CNN variants
C.1.1 Convolution architecture
C.1.1.1 Linear convolution
C.1.1.2 Higher rank convolution
C.1.1.3 Linear transposed convolution
C.1.1.4 Channels
C.1.1.5 Complexity
C.1.2 Back propagation for convolution
C.1.2.1 Back propagation for linear convolution
C.1.2.2 Back propagation for high-rank convolution
C.1.3 Convolution variants
C.1.3.1 Lightweight convolution
C.1.3.2 Pruning the convolution
C.1.4 Deep convolutional generative adversarial networks
C.2 RNN variants
C.2.1 RNN architecture
C.2.2 LSTM and GRU cells
C.2.2.1 Long short-term memory
C.2.2.2 Gated recurrent unit
C.2.3 Highway networks
C.3 Autoencoder variants
C.3.1 Stacked denoising autoencoders
C.3.2 Ladder networks
C.3.3 Variational autoencoders
C.3.3.1 Concept of variational autoencoders
C.3.3.2 Modeling of variational autoencoders
C.4 Residual networks
C.4.1 Concept of residual networks
C.4.2 Effect of residual network
C.5 Graph neural networks
C.5.1 Concept of graph neural networks
D National research trends and investment
D.1 China
D.1.1 Next generation AI development plan
D.2 USA
D.2.1 SyNAPSE program
D.2.2 UPSIDE program
D.2.3 MICrONS program
D.3 EU
D.4 Japan
D.4.1 Ministry of Internal Affairs and Communications
D.4.2 MEXT
D.4.3 METI
D.4.4 Cabinet office
E Machine learning and society
E.1 Industry
E.1.1 Past industries
E.1.2 Next industry
E.1.3 Open-sourced software and hardware
E.1.4 Social business and shared economy
E.2 Machine learning and us
E.2.1 Replaceable domain with machine learning
E.2.2 Consolidation of industry
E.2.3 A simplified world
E.3 Society and individuals
E.3.1 Introduction of programming into education
E.3.2 Change in values
E.3.3 Social support
E.3.4 Crime
E.4 Nation
E.4.1 Police and prosecutors
E.4.2 Administrative, legislative, and judicial
E.4.3 Military affairs
Bibliography
Index
Back Cover