Shine a spotlight into the deep learning "black box". This comprehensive and detailed guide reveals the mathematical and architectural concepts behind deep learning models, so you can customize, maintain, and explain them more effectively. Inside Math and Architectures of Deep Learning you will find
Math and Architectures of Deep Learning Version 10
By Krishnendu Chaudhury
- Publisher: Manning Publications
- Year: 2021
- Language: English
- Pages: 494
- Edition: MEAP Edition
- Category: Library
No payment or sign-up required. For personal study only.
Table of Contents
Math and Architectures of Deep Learning MEAP V10
Copyright
Welcome
Brief contents
Chapter 1: An overview of machine learning and deep learning
1.1 A first look at machine/deep learning - a paradigm shift in computation
1.2 A Function Approximation View of Machine Learning: Models and their Training
1.3 A simple machine learning model - the cat brain
1.4 Geometrical View of Machine Learning
1.5 Regression vs Classification in Machine Learning
1.6 Linear vs Nonlinear Models
1.7 Higher Expressive Power through multiple non-linear layers: Deep Neural Networks
Chapter Summary
Chapter 2: Introduction to Vectors, Matrices and Tensors from Machine Learning and Data Science point of view
2.1 Vectors and their role in Machine Learning and Data Science
2.1.1 Geometric View of Vectors and its significance in Machine Learning and Data Science
2.2 Python code to create and access vectors and sub-vectors, slice and dice vectors, via Numpy and PyTorch parallel code
2.2.1 Python Numpy code for introduction to Vectors
2.2.2 PyTorch code for introduction to Vectors
2.3 Matrices and their role in Machine Learning and Data Science
2.4 Python Code: Introduction to Matrices, Tensors and Images via Numpy and PyTorch parallel code
2.4.1 Python Numpy code for introduction to Tensors, Matrices and Images
2.4.2 PyTorch code for introduction to Tensors and Matrices
2.5 Basic Vector and Matrix operations in Machine Learning and Data Science
2.5.1 Matrix and Vector Transpose
2.5.2 Dot Product of two vectors and its role in Machine Learning and Data Science
2.5.3 Matrix Multiplication and Machine Learning, Data Science
2.5.4 Length of a Vector aka L2 norm and its role in Machine Learning
2.5.5 Geometric intuitions for Vector Length - Model Error in Machine Learning
2.5.6 Geometric intuitions for the Dot Product - Feature Similarity in Machine Learning and Data Science
2.6 Orthogonality of Vectors and its physical significance
2.7 Python code: Basic Vector and Matrix operations via Numpy
2.7.1 Python numpy code for Matrix Transpose
2.7.2 Python numpy code for Dot product
2.7.3 Python numpy code for Matrix vector multiplication
2.7.4 Python numpy code for Matrix Matrix Multiplication
2.7.5 Python numpy code for Transpose of Matrix Product
2.7.6 Python numpy code for Matrix Inverse
2.8 Multidimensional Line and Plane Equations and their role in Machine Learning
2.8.1 Multidimensional Line Equation
2.8.2 Multidimensional Planes and their role in Machine Learning
2.9 Linear Combination, Linear Dependence, Vector Span and Basis Vectors, their Geometrical Significance, Collinearity Preservation
2.10 Linear Transforms - Geometric and Algebraic interpretations
2.11 Multidimensional Arrays, Multi-linear Transforms and Tensors
2.11.1 Array View: Multidimensional arrays of numbers
2.12 Linear Systems and Matrix Inverse
2.12.1 Linear Systems with zero or near zero Determinants; Ill Conditioned Systems
2.12.2 Over and Under Determined Linear Systems in Machine Learning and Data Science
2.12.3 Moore Penrose Pseudo-Inverse of a Matrix: solving Over or Under Determined Linear Systems
2.12.4 Pseudo Inverse of a Matrix: A Beautiful Geometric Intuition
2.12.5 Python numpy code to solve over-determined systems
2.13 Eigenvalues and Eigenvectors - swiss army knives in Machine Learning and Data Science
2.13.1 Python numpy code to compute eigenvectors and eigenvalues
2.14 Orthogonal (Rotation) Matrices and their Eigenvalues and Eigenvectors
2.14.1 Python numpy code for orthogonality of rotation matrices
2.14.2 Python numpy code for eigenvalues and vectors of rotation matrices - axis of rotation
2.15 Matrix Diagonalization
2.15.1 Python Numpy code for Matrix diagonalization
2.15.2 Solving Linear Systems without Inverse via Diagonalization
2.15.3 Python Numpy code for Solving Linear Systems via diagonalization
2.15.4 Matrix powers using diagonalization
2.16 Spectral Decomposition of a Symmetric Matrix
2.16.1 Python numpy code for Spectral Decomposition of Matrix
2.17 An application relevant to Machine Learning - finding the axes of a hyper-ellipse
2.17.1 Python numpy code for Hyper Ellipses
Chapter Summary
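To give a flavor of what the NumPy listings in Sections 2.5-2.13 cover, here is a minimal sketch (not the book's own code) of the basic vector and matrix operations named above: dot product, L2 norm, matrix-vector product, inverse, and eigendecomposition.

```python
import numpy as np

# Two 2-D vectors and a 2x2 diagonal matrix.
u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

dot = u @ v                          # dot product: 3*1 + 4*0 = 3
length = np.linalg.norm(u)           # L2 norm: sqrt(9 + 16) = 5
Au = A @ u                           # matrix-vector product: [6, 12]
A_inv = np.linalg.inv(A)             # matrix inverse
eigvals, eigvecs = np.linalg.eig(A)  # eigenvalues of diag(2, 3) are 2 and 3
```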
Chapter 3: Introduction to Vector Calculus from Machine Learning point of view
3.1 Significance of the sign of the separating surface in binary classification
3.2 Estimating Model Parameters: Training
3.3 Minimizing Error during Training a Machine Learning Model: Gradient Vectors
3.3.1 Derivatives, Partial Derivatives, Change in function value and Tangents, Gradients
3.3.2 Level Surface representation and Loss Minimization
3.4 Python numpy and PyTorch code for Gradient Descent, Error Minimization and Model Training
3.4.1 Numpy and PyTorch code for Linear Models
Autograd: PyTorch Automatic Gradient Computation
3.4.2 Non-linear Models in PyTorch
3.4.3 A Linear Model for the cat-brain in PyTorch
3.5 Convex, Non-convex functions; Global and Local Minima
3.6 Multi-dimensional Taylor series and Hessian Matrix
3.6.1 1D Taylor Series recap
3.6.2 Multi-dimensional Taylor series and Hessian Matrix
3.7 Convex sets and functions
Chapter Summary
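The gradient-descent idea at the heart of Sections 3.3-3.4 can be sketched in a few lines of plain Python (a toy illustration under assumed settings, not the book's cat-brain listing): repeatedly step a parameter against the gradient of a convex loss until it settles at the minimum.

```python
# Minimize the convex loss L(w) = (w - 3)^2 by gradient descent.
# The analytic gradient is dL/dw = 2 * (w - 3); stepping against it
# shrinks the loss at every iteration.
def gradient_descent(w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # analytic gradient at the current w
        w -= lr * grad           # step opposite the gradient
    return w

w_star = gradient_descent(w0=0.0)   # converges toward the minimizer w = 3
```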
Chapter 4: Linear Algebraic Tools in Machine Learning and Data Science
4.1 Quadratic Forms and their Minimization
4.1.1 Symmetric Positive (Semi)definite Matrices
4.2 Spectral and Frobenius Norm of a Matrix
4.3 Principal Component Analysis
4.3.1 Application of PCA in Data Science: Dimensionality Reduction
4.3.2 Python Numpy code: PCA and dimensionality reduction
4.3.3 Drawback of PCA from Data Science viewpoint
4.3.4 Application of PCA in Data Science: Data Compression
4.4 Singular Value Decomposition
4.4.1 Application of SVD: PCA computation
4.4.2 Application of SVD: Solving arbitrary Linear System
4.4.3 Rank of a Matrix
4.4.4 Python numpy code for linear system solving via SVD
4.4.5 Python numpy code for PCA computation via SVD
4.4.6 Application of SVD: Best low rank approximation of a matrix
4.5 Machine Learning Application: Document Retrieval
4.5.1 TF-IDF and Cosine Similarity in Machine Learning based Document Retrieval
4.5.2 Latent Semantic Analysis (LSA)
4.5.3 Python/Numpy code to compute LSA on a toy dataset
4.5.4 Python/Numpy code to compute and visualize LSA/SVD on a 500×3 dataset
Chapter Summary
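Section 4.4.1's connection between SVD and PCA can be illustrated with a small NumPy sketch (synthetic data and parameter choices are my own, not the book's): center the data, take its SVD, and read the principal directions off the rows of Vt.

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 points stretched along the direction (1, 1): the first principal
# component should point (up to sign) along that axis.
t = rng.normal(size=200)
X = np.column_stack([t, t]) + 0.01 * rng.normal(size=(200, 2))

Xc = X - X.mean(axis=0)                        # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                                    # first principal direction
projected = Xc @ pc1                           # 1-D reduced representation
```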
Chapter 5: Probability Distributions for Machine Learning and Data Science
5.1 Probability - the classical frequentist view
5.1.1 Random Variables
5.1.2 Population Histograms
5.2 Probability Distributions
5.3 Impossible and certain events, Sum of probabilities of exhaustive, mutually exclusive events, Independent events
5.3.1 Probabilities of Impossible and Certain Events
5.3.2 Exhaustive and mutually exclusive events
5.3.3 Independent Events
5.4 Joint Probabilities and their distributions
5.4.1 Marginal Probabilities
5.4.2 Dependent Events and their Joint Probability Distribution
5.5 Geometrical View: Sample point distributions for dependent and independent variables
5.5.1 Python Numpy code to draw random samples from a discrete joint probability distribution
5.6 Continuous Random Variables and Probability Density
5.7 Properties of distributions - Expected Value, Variance and Covariance
5.7.1 Expected Value aka Mean
5.7.2 Variance, Covariance, Standard Deviation
5.8 Sampling from a Distribution
5.9 Some famous probability distributions
5.9.1 Uniform Random Distributions
5.9.2 Gaussian (aka Normal) Distribution
5.9.3 Binomial Distribution
5.9.4 Multinomial Distribution
5.9.5 Bernoulli Distribution
5.9.6 Categorical Distribution and one-hot vectors
Chapter Summary
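The sampling and moment ideas of Sections 5.7-5.9 can be checked empirically with NumPy (a generic illustration, not one of the book's listings): draw many samples from a Gaussian and confirm the sample mean and variance approach the distribution's parameters.

```python
import numpy as np

rng = np.random.default_rng(42)
# Draw 100,000 samples from a Gaussian with mean 2.0 and std 0.5.
samples = rng.normal(loc=2.0, scale=0.5, size=100_000)

mean_est = samples.mean()   # approaches the true mean, 2.0
var_est = samples.var()     # approaches sigma^2 = 0.25
```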
Chapter 6: Bayesian Tools for Machine Learning and Data Science
6.1 Conditional Probability and Bayes Theorem with recap of Joint and Marginal Probability
6.1.1 Joint and Marginal Probability Revisited
6.1.2 Conditional Probability and Bayes Theorem
6.2 Entropy
6.2.1 Entropy of Gaussian
6.2.2 Python PyTorch code to compute Entropy of a Gaussian
6.3 Cross Entropy
6.3.1 Python PyTorch code to compute Cross Entropy
6.4 KL Divergence
6.4.1 KL Divergence between Gaussians
6.4.2 Python PyTorch code to compute KL Divergence
6.5 Conditional Entropy
6.6 Model Parameter Estimation
6.6.1 Likelihood, Evidence, Posterior and Prior Probabilities
6.6.2 The log-likelihood trick
6.6.3 Maximum Likelihood Parameter Estimation (MLE)
6.6.4 Maximum A Posteriori (MAP) Parameter Estimation and Regularization
6.7 Latent Variables and Evidence Maximization
6.8 Maximum Likelihood Parameter Estimation for Gaussian
6.8.1 Python PyTorch code for Maximum Likelihood Estimation and Maximum A Posteriori Estimation
6.9 Gaussian Mixture Models
6.9.1 Probability Density Function (PDF) of the GMM
6.9.2 Latent variable for class selection and physical interpretations of the GMM PDF terms
6.9.3 Classification via GMM
6.9.4 Maximum Likelihood Estimation of GMM parameters (GMM Fit)
6.9.5 Python PyTorch code for GMM Fit
Chapter Summary
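The entropy, cross entropy, and KL divergence relationships of Sections 6.2-6.4 reduce to a few lines for discrete distributions. A minimal pure-Python sketch (the book's versions use PyTorch; the function names here are my own):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p): nonnegative, zero iff p equals q."""
    return cross_entropy(p, q) - cross_entropy(p, p)

p = [0.5, 0.5]
q = [0.9, 0.1]
```

Note that `cross_entropy(p, p)` is just the entropy H(p), which is why KL divergence measures the "extra" coding cost of using q in place of p.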
Chapter 7: Function Approximation: How Neural Networks model the world
7.1 Most real world problems can be expressed as functions
7.1.1 Logical Functions in real world problems
7.1.2 Classifier functions in real world problems
7.1.3 General Functions in real world problems
7.2 The basic building block aka Neuron: Perceptron
7.2.1 Heaviside Step Function
7.2.2 Hyperplanes
7.2.3 Perceptrons and Classification
7.2.4 Modeling common logic gates with perceptrons
7.3 Towards more expressive power: Multi Layer Perceptrons (MLPs)
7.3.1 MLP for Logical XOR
7.4 Layered networks of Perceptrons aka MLPs aka Neural Networks
7.4.1 Layering
7.4.2 All Logical Functions can be modeled with MLPs
7.4.3 Cybenko's Universal Approximation Theorem
7.4.4 MLPs for polygonal decision boundaries
7.5 Chapter Summary
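Section 7.2.4's idea of modeling logic gates with a perceptron fits in a few lines: a Heaviside step applied to a weighted sum. A minimal sketch with one hand-picked weight/bias choice (weights 1, 1 and bias -1.5 are an illustrative assumption; other choices work too):

```python
def heaviside(z):
    """Step activation: fires (1) when the weighted sum is nonnegative."""
    return 1 if z >= 0 else 0

def and_gate(x1, x2):
    # With weights 1, 1 and bias -1.5 the weighted sum crosses zero
    # only when both inputs are 1, reproducing logical AND.
    return heaviside(1.0 * x1 + 1.0 * x2 - 1.5)
```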
Chapter 8: Training Neural Networks: Forward and Backpropagation
8.1 Differentiable step like functions
8.1.1 Sigmoid Function
8.1.2 TanH Function
8.2 Why Layering
8.3 Linear Layer
8.3.1 Linear layer expressed as a matrix-vector multiplication
8.3.2 Forward Propagation and Grand Output function for an MLP of Linear Layers
8.4 Training and Backpropagation
8.4.1 Loss and its Minimization: Goal of Training
8.4.2 Loss Surface and Gradient Descent
8.4.3 Why Gradient provides the best direction for descent
8.4.4 Gradient Descent and Local Minima
8.4.5 The backpropagation algorithm
8.4.6 Putting it all together: Overall Training Algorithm
8.5 Training a Neural Network in PyTorch
8.6 Chapter Summary
Chapter 9: Loss, Optimization and Regularization
9.1 Loss Functions
9.1.1 Quantification and Geometrical view of Loss
9.1.2 Regression Loss
9.1.3 Cross Entropy Loss
9.1.4 Cross Entropy Loss for pixel and vector values mismatch
9.1.5 SoftMax
9.1.6 SoftMax Cross Entropy Loss
9.1.7 Focal Loss
9.1.8 Hinge Loss
9.2 Optimization
9.2.1 Geometrical view of Optimization
9.2.2 SGD: Stochastic Gradient Descent and mini batches
9.2.3 PyTorch code for Stochastic Gradient Descent
9.2.4 Momentum
9.2.5 Geometric View: constant loss contours, gradient descent and momentum
9.2.6 NAG: Nesterov Accelerated Gradients
9.2.7 AdaGrad
9.2.8 RMSProp
9.2.9 Adam Optimizer
9.3 Regularization
9.3.1 MDL: Minimum Description Length - an Occam's Razor View of optimization
9.3.2 L2 Regularization
9.3.3 L1 Regularization
9.3.4 Sparsity: L1 vs L2 Regularization
9.3.5 Bayes Theorem and Stochastic view of optimization
9.3.6 Dropout
Chapter Summary
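The momentum update of Section 9.2.4 can be sketched on a toy 1-D loss (a hand-rolled illustration with assumed hyperparameters, not the book's PyTorch listing): a running velocity accumulates past gradients, smoothing oscillations while speeding progress along consistent directions.

```python
# Minimize L(w) = w^2 with gradient descent plus momentum.
def sgd_momentum(w0, lr=0.1, beta=0.9, steps=100):
    w, v = w0, 0.0
    for _ in range(steps):
        grad = 2.0 * w          # dL/dw for L(w) = w^2
        v = beta * v + grad     # velocity: decayed sum of past gradients
        w -= lr * v             # parameter step along the velocity
    return w

w_final = sgd_momentum(5.0)     # spirals in toward the minimum at w = 0
```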
Chapter 10: One, Two and Three Dimensional Convolution and Transposed Convolution in Neural Networks
10.1 One Dimensional Convolution: Graphical and Algebraical view
10.1.1 Curve Smoothing via 1D Convolution
10.1.2 Curve Edge Detection via 1D Convolution
10.1.3 One Dimensional Convolution as Matrix Multiplication
10.1.4 PyTorch: One-dimensional convolution with custom weights
10.2 Convolution Output Size
10.3 Two Dimensional Convolution: Graphical and Algebraic view
10.3.1 Image Smoothing via 2D Convolution
10.3.2 Image Edge Detection via 2D Convolution
10.3.3 PyTorch: Two-dimensional convolution with custom weights
10.3.4 Two Dimensional Convolution as Matrix Multiplication
10.4 Three Dimensional Convolution
10.4.1 Video Motion Detection via 3D Convolution
10.4.2 PyTorch: Three-dimensional convolution with custom weights
10.5 Transposed Convolution or Fractionally Strided Convolution
10.5.1 Application of Transposed convolution: AutoEncoders and Embeddings
10.5.2 Transposed Convolution Output Size
10.5.3 Upsampling via Transpose Conv
10.6 Adding Convolution Layers to a Neural Network
10.6.1 PyTorch: Adding Convolution Layers to a Neural Network
10.7 Pooling
10.8 Chapter Summary
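The smoothing and edge-detection uses of 1-D convolution in Sections 10.1.1-10.1.2 can be demonstrated with NumPy's `convolve` on a step signal (a generic sketch, not the book's custom-weights listing):

```python
import numpy as np

# A clean step signal: a 3-tap box kernel smooths it, and a
# finite-difference kernel highlights the location of the step.
signal = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

kernel = np.ones(3) / 3.0                             # moving-average filter
smoothed = np.convolve(signal, kernel, mode='valid')  # [0, 1/3, 2/3, 1]

diff = np.array([1.0, -1.0])                          # edge detector
edges = np.convolve(signal, diff, mode='valid')       # peak at the step
```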
Chapter 11: Deep Convolutional Neural Network Architectures for Image Classification and Object Detection
11.1 Introduction
11.2 Convolutional Neural Networks (CNNs) for Image Classification - LeNet
11.2.1 PyTorch: Implementing LeNet for image classification on MNIST
11.3 Towards deeper neural networks
11.3.1 VGG (Visual Geometry Group) Net
11.3.2 Inception: Network in Network paradigm
11.3.3 ResNet: Why simply stacking layers to add depth does not scale
11.3.4 PyTorch Lightning
11.4 Object Detection: A brief history
11.4.1 R-CNN
11.4.2 Fast R-CNN
11.4.3 Faster R-CNN
11.5 Faster R-CNN: A deep dive
11.5.1 Convolution Backbone
11.5.2 Region Proposal Network
11.5.3 Fast R-CNN
11.5.4 Training Faster R-CNN
11.5.5 Other Object Detection Paradigms
11.6 Chapter Summary
Chapter 12: Manifolds, homeomorphism and Neural Networks
12.1 Manifolds
12.1.1 Locally Euclidean
12.1.2 Hausdorff
12.1.3 Second Countable
12.2 Homeomorphism
12.3 Neural Networks and homeomorphism between manifolds
12.4 Chapter Summary
Appendix
A.1 Dot Product and cosine of the angle between two vectors
A.2 Computing variance of Gaussian Distribution
A.3 Two Theorems in Statistics
Notations
The mathematical paradigms that underlie deep learning typically start out as hard-to-read academic papers, often leaving engineers in the dark about how their models actually function. Math and Architectures of Deep Learning bridges the gap between theory and practice, laying out the math of deep learning.