Machine Learning With Python: Theory and Applications

✍ Author: G. R. Liu


Publisher
World Scientific Pub Co Inc
Year
2022
Language
English
Pages
693
Category
Library


✦ Synopsis


Machine Learning (ML) has become an important area of research, and its methods are now widely used across many industries.


This compendium introduces the basic concepts, fundamental theories, essential computational techniques, codes, and applications related to ML models. With a strong foundation, one can comfortably learn related topics, methods, and algorithms. Most importantly, readers with strong fundamentals can develop innovative and more effective machine learning models for their own problems. The book is written to achieve this goal.


This useful reference text benefits professionals, academics, researchers, and graduate and undergraduate students in AI, ML, and neural networks.

✦ Table of Contents


Contents
About the Author
1 Introduction
1.1 Naturally Learned Ability for Problem Solving
1.2 Physics-Law-based Models
1.3 Machine Learning Models, Data-based
1.4 General Steps for Training Machine Learning Models
1.5 Some Mathematical Concepts, Variables, and Spaces
1.5.1 Toy examples
1.5.2 Feature space
1.5.3 Affine space
1.5.4 Label space
1.5.5 Hypothesis space
1.5.6 Definition of a typical machine learning model, a mathematical view
1.6 Requirements for Creating Machine Learning Models
1.7 Types of Data
1.8 Relation Between Physics-Law-based and Data-based Models
1.9 This Book
1.10 Who May Read This Book
1.11 Codes Used in This Book
References
2 Basics of Python
2.1 An Exercise
2.2 Briefing on Python
2.3 Variable Types
2.3.1 Numbers
2.3.2 Underscore placeholder
2.3.3 Strings
2.3.4 Conversion between types of variables
2.3.5 Variable formatting
2.4 Arithmetic Operators
2.4.1 Addition, subtraction, multiplication, division, and pow
2.4.2 Built-in functions
2.5 Boolean Values and Operators
2.6 Lists: A diversified variable type container
2.6.1 List creation, appending, concatenation, and updating
2.6.2 Element-wise addition of lists
2.6.3 Slicing strings and lists
2.6.4 Underscore placeholders for lists
2.6.5 Nested list (lists in lists in lists)
2.7 Tuples: Value preserved
2.8 Dictionaries: Indexable via keys
2.8.1 Assigning data to a dictionary
2.8.2 Iterating over a dictionary
2.8.3 Removing a value
2.8.4 Merging two dictionaries
2.9 Numpy Arrays: Handy for scientific computation
2.9.1 Lists vs. Numpy arrays
2.9.2 Structure of a numpy array
2.9.3 Axis of a numpy array
2.9.4 Element-wise computations
2.9.5 Handy ways to generate multi-dimensional arrays
2.9.6 Use of external package: MXNet
2.9.7 In-place operations
2.9.8 Slicing from a multi-dimensional array
2.9.9 Broadcasting
2.9.10 Converting between MXNet NDArray and NumPy
2.9.11 Subsetting in Numpy
2.9.12 Numpy and universal functions (ufunc)
2.9.13 Numpy array and vector/matrix
2.10 Sets: No Duplication
2.10.1 Intersection of two sets
2.10.2 Difference of two sets
2.11 List Comprehensions
2.12 Conditions, “if” Statements, “for” and “while” Loops
2.12.1 Comparison operators
2.12.2 The “in” operator
2.12.3 The “is” operator
2.12.4 The “not” operator
2.12.5 The “if” statements
2.12.6 The “for” loops
2.12.7 The “while” loops
2.12.8 Ternary conditionals
2.13 Functions (Methods)
2.13.1 Block structure for function definition
2.13.2 Function with arguments
2.13.3 Lambda functions (Anonymous functions)
2.14 Classes and Objects
2.14.1 A simplest class
2.14.2 A class for scientific computation
2.14.3 Subclass (class inheritance)
2.15 Modules
2.16 Generation of Plots
2.17 Code Performance Assessment
2.18 Summary
Reference
3 Basic Mathematical Computations
3.1 Linear Algebra
3.1.1 Scalar numbers
3.1.2 Vectors
3.1.3 Matrices
3.1.4 Tensors
3.1.5 Sum and mean of a tensor
3.1.6 Dot-product of two vectors
3.1.7 Outer product of two vectors
3.1.8 Matrix-vector product
3.1.9 Matrix-matrix multiplication
3.1.10 Norms
3.1.11 Solving algebraic system equations
3.1.12 Matrix inversion
3.1.13 Eigenvalue decomposition of a matrix
3.1.14 Condition number of a matrix
3.1.15 Rank of a matrix
3.2 Rotation Matrix
3.3 Interpolation
3.3.1 1-D piecewise linear interpolation using numpy.interp
3.3.2 1-D least-square solution approximation
3.3.3 1-D interpolation using interp1d
3.3.4 2-D spline representation using bisplrep
3.3.5 Radial basis functions for smoothing and interpolation
3.4 Singular Value Decomposition
3.4.1 SVD formulation
3.4.2 Algorithms for SVD
3.4.3 Numerical examples
3.4.4 SVD for data compression
3.5 Principal Component Analysis
3.5.1 PCA formulation
3.5.2 Numerical examples
3.5.2.1 Example 1: PCA using a three-line code
3.5.2.2 Example 2: Truncated PCA
3.6 Numerical Root Finding
3.7 Numerical Integration
3.7.1 Trapezoid rule
3.7.2 Gauss integration
3.8 Initial data treatment
3.8.1 Min-max scaling
3.8.2 “One-hot” encoding
3.8.3 Standard scaling
References
4 Statistics and Probability-based Learning Model
4.1 Analysis of Probability of an Event
4.1.1 Random sampling, controlled random sampling
4.1.2 Probability
4.2 Random Distributions
4.2.1 Uniform distribution
4.2.2 Normal distribution (Gaussian distribution)
4.3 Entropy of Probability
4.3.1 Example 1: Probability and its entropy
4.3.2 Example 2: Variation of entropy
4.3.3 Example 3: Entropy for events with a variable that takes different numbers of values of uniform distribution
4.4 Cross-Entropy: Predicted and True Probability
4.4.1 Example 1: Cross-entropy of a quality prediction
4.4.2 Example 2: Cross-entropy of a poor prediction
4.5 KL-Divergence
4.5.1 Example 1: KL-divergence of a distribution of quality prediction
4.5.2 Example 2: KL-divergence of a poorly predicted distribution
4.6 Binary Cross-Entropy
4.6.1 Example 1: Binary cross-entropy for a distribution of quality prediction
4.6.2 Example 2: Binary cross-entropy for a poorly predicted distribution
4.6.3 Example 3: Binary cross-entropy for more uniform true distribution: A quality prediction
4.6.4 Example 4: Binary cross-entropy for more uniform true distribution: A poor prediction
4.7 Bayesian Statistics
4.8 Naive Bayes Classification: Statistics-based Learning
4.8.1 Formulation
4.8.2 Case study: Handwritten digits recognition
4.8.3 Algorithm for the Naive Bayes classification
4.8.4 Testing the Naive Bayes model
4.8.5 Discussion
5 Prediction Function and Universal Prediction Theory
5.1 Linear Prediction Function and Affine Transformation
5.1.1 Linear prediction function: A basic hypothesis
5.1.2 Predictability for constants, the role of the bias
5.1.3 Predictability for linear functions: The role of the weights
5.1.4 Prediction of linear functions: A machine learning procedure
5.1.5 Affine transformation
5.2 Affine Transformation Unit (ATU), A Simplest Network
5.3 Typical Data Structures
5.4 Demonstration Examples of Affine Transformation
5.4.1 An edge, a rectangle under affine transformation
5.4.2 A circle under affine transformation
5.4.3 A spiral under affine transformation
5.4.4 Fern leaf under affine transformation
5.4.5 On linear prediction function with affine transformation
5.4.6 Affine transformation wrapped with activation function
5.5 Parameter Encoding and the Essential Mechanism of Learning
5.5.1 The x to ŵ encoding, a data-parameter converter unit
5.5.2 Uniqueness of the encoding
5.5.3 Uniqueness of the encoding: Not affected by activation function
5.6 The Gradient of the Prediction Function
5.7 Affine Transformation Array (ATA)
5.8 Predictability of High-Order Functions of a Deepnet
5.8.1 A role of activation functions
5.8.2 Formation of a deepnet by chaining ATA
5.8.3 Example: A 1 → 1 → 1 network
5.9 Universal Prediction Theory
5.10 Nonlinear Affine Transformations
5.11 Feature Functions in Physics-Law-based Models
References
6 The Perceptron and SVM
6.1 Linearly Separable Classification Problems
6.2 A Python Code for the Perceptron
6.3 The Perceptron Convergence Theorem
6.4 Support Vector Machine
6.4.1 Problem statement
6.4.2 Formulation of objective function and constraints
6.4.3 Modified objective function with constraints: Multipliers method
6.4.4 Converting to a standard quadratic programming problem
6.4.5 Prediction in SVM
6.4.6 Example: A Python code for SVM
6.4.7 Confusion matrix
6.4.8 Example: A Scikit-learn class for SVM
6.4.9 SVM for datasets not separable with hyperplanes
6.4.10 Kernel trick
6.4.11 Example: SVM classification with curves
6.4.12 Multiclass classification via SVM
6.4.13 Example: Use of SVM classifiers for iris dataset
References
7 Activation Functions and Universal Approximation Theory
7.1 Sigmoid Function (σ(z))
7.2 Sigmoid Function of an Affine Transformation Function
7.3 Neural-Pulse-Unit (NPU)
7.4 Universal Approximation Theorem
7.4.1 Function approximation using NPUs
7.4.2 Function approximations using neuron basis functions
7.4.3 Remarks
7.5 Hyperbolic Tangent Function (tanh)
7.6 ReLU Functions
7.7 Softplus Function
7.8 Conditions for activation functions
7.9 Novel activation functions
7.9.1 Rational activation function
7.9.2 Power function
7.9.3 Power-linear function
7.9.4 Power-quadratic function
References
8 Automatic Differentiation and Autograd
8.1 General Issues on Optimization and Minimization
8.2 Analytic Differentiation
8.3 Numerical Differentiation
8.4 Automatic Differentiation
8.4.1 The concept of automatic or algorithmic differentiation
8.4.2 Differentiation of a function with respect to a vector and matrix
8.5 Autograd Implemented in Numpy
8.6 Autograd Implemented in the MXNet
8.6.1 Gradients of scalar functions with simple variable
8.6.2 Gradients of scalar functions in high dimensions
8.6.3 Gradients of scalar functions with quadratic variables in high dimensions
8.6.4 Gradient of scalar function with a matrix of variables in high dimensions
8.7 Gradients for Functions with Conditions
8.8 Example: Gradients of an L2 Loss Function for a Single Neuron
8.9 Examples: Differences Between Analytical, Autograd, and Numerical Differentiation
8.10 Discussion
References
9 Solution Existence Theory and Optimization Techniques
9.1 Introduction
9.2 Analytic Optimization Methods: Ideal Cases
9.2.1 Least square formulation
9.2.2 L2 loss function
9.2.3 Normal equation
9.2.4 Solution existence analysis
9.2.5 Solution existence theory
9.2.6 Effects of parallel data-points
9.2.7 Predictability of the solution against the label
9.3 Considerations in Optimization for Complex Problems
9.3.1 Local minima
9.3.2 Saddle points
9.3.3 Convex functions
9.4 Gradient Descent (GD) Method for Optimization
9.4.1 Gradient descent in one dimension
9.4.2 Remarks
9.4.3 Gradient descent in hyper-dimensions
9.4.4 Property of a convex function
9.4.5 The convergence theorem for the Gradient Descent algorithm
9.4.6 Setting of the learning rates
9.5 Stochastic Gradient Descent
9.5.1 Numerical experiment
9.6 Gradient Descent with Momentum
9.6.1 The most critical problem with GD methods
9.6.2 Formulation
9.6.3 Numerical experiment
9.7 Nesterov Accelerated Gradient
9.7.1 Formulation
9.8 AdaGrad Gradient Algorithm
9.8.1 Formulation
9.8.2 Numerical experiment
9.9 RMSProp Gradient Algorithm
9.9.1 Formulation
9.9.2 Numerical experiment
9.10 AdaDelta Gradient Algorithm
9.10.1 The idea
9.10.2 Numerical experiment
9.11 Adam Gradient Algorithm
9.11.1 Formulation
9.11.2 Numerical experiment
9.12 A Case Study: Compare Minimization Techniques Used in MLPClassifier
9.13 Other Algorithms
References
10 Loss Functions for Regression
10.1 Formulations for Linear Regression
10.1.1 Mathematical model
10.1.2 Neural network configuration
10.1.3 The xw formulation
10.2 Loss Functions for Linear Regression
10.2.1 Mean squared error loss or L2 loss function
10.2.2 Absolute error loss or L1 loss function
10.2.3 Huber loss function
10.2.4 Log-cosh loss function
10.2.5 Comparison between these loss functions
10.2.6 Python codes for these loss functions
10.3 Python Codes for Regression
10.3.1 Linear regression using high-order polynomial and other feature functions
10.3.2 Linear regression using Gaussian basis functions
10.4 Neural Network Model for Linear Regressions with Big Datasets
10.4.1 Setting up neural network models
10.4.2 Create data iterators
10.4.3 Training parameters
10.4.4 Define the neural network
10.4.5 Define the loss function
10.4.6 Use of optimizer
10.4.7 Execute the training
10.4.8 Examining training progress
10.5 Neural Network Model for Nonlinear Regression
10.5.1 Train models on the Boston housing price dataset
10.5.2 Plotting partial dependence for two features
10.5.3 Plot curves on top of each other
10.6 On Nonlinear Regressions
10.7 Conclusion
References
11 Loss Functions and Models for Classification
11.1 Prediction Functions
11.1.1 Linear function
11.1.2 Logistic prediction function
11.1.3 The tanh prediction function
11.2 Loss Functions for Classification Problems
11.2.1 The margin concept
11.2.2 0–1 loss
11.2.3 Hinge loss
11.2.4 Logistic loss
11.2.5 Exponential loss
11.2.6 Square loss
11.2.7 Binary cross-entropy loss
11.2.8 Remarks
11.3 A Simple Neural Network for Classification
11.4 Example of Binary Classification Using Neural Network with mxnet
11.4.1 Dataset for binary classification
11.4.2 Define loss functions
11.4.3 Plot the convergence curve of the loss function
11.4.4 Computing the accuracy of the trained model
11.5 Example of Binary Classification Using Sklearn
11.6 Regression with Decision Tree, AdaBoost, and Gradient Boosting
References
12 Multiclass Classification
12.1 Softmax Activation Neural Networks for k-Classifications
12.2 Cross-Entropy Loss Function for k-Classifications
12.3 Case Study 1: Handwritten Digit Classification with 1-Layer NN
12.3.1 Set contexts according to computer hardware
12.3.2 Loading the MNIST dataset
12.3.3 Set model parameters
12.3.4 Multiclass logistic regression
12.3.5 Defining a neural network model
12.3.6 Defining the cross-entropy loss function
12.3.7 Optimization method
12.3.8 Accuracy evaluation
12.3.9 Initiation of the model and training execution
12.3.10 Prediction with the trained model
12.4 Case Study 2: Handwritten Digit Classification with Sklearn Random Forest Multi-Classifier
12.5 Case Study 3: Comparison of Random Forest, Extra-Forest, and Gradient Boosting for Multi-Classifier
12.6 Multi-Classification via TensorFlow
12.7 Remarks
Reference
13 Multilayer Perceptron (MLP) for Regression and Classification
13.1 The General Architecture and Formulations of MLP
13.1.1 The general architecture
13.1.2 The xw+b formulation
13.1.3 The xw formulation, use of affine transformation weight matrix
13.1.4 MLP configuration with affine transformation weight matrix
13.1.5 Space evolution process in MLP
13.2 Neurons-Samples Theory
13.2.1 Affine spaces and the training parameters used in an MLP
13.2.2 Neurons-Samples Theory for MLPs
13.3 Nonlinear Activation Functions for the Hidden Layers
13.4 General Rule for Estimating Learning Parameters in an MLP
13.5 Key Techniques for MLP and Its Capability
13.6 A Case Study on Handwritten Digits Using MXNet
13.6.1 Import necessary libraries and load data
13.6.2 Set neural network model parameters
13.6.3 Softmax cross entropy loss function
13.6.4 Define a neural network model
13.6.5 Optimization method
13.6.6 Model accuracy evaluation
13.6.7 Training the neural network and timing the training
13.6.8 Prediction with the model trained
13.7 Visualization of MLP Weights Using Sklearn
13.7.1 Import necessary Sklearn module
13.7.2 Load MNIST dataset
13.7.3 Set an MLP model
13.7.4 Training the MLP model and time the training
13.7.5 Performance analysis
13.7.6 Viewing the weight matrix as images
13.8 MLP for Nonlinear Regression
13.8.1 California housing data and preprocessing
13.8.2 Configure, train, and test the MLP
13.8.3 Compute and plot the partial dependence
13.8.4 Comparison studies on different regressors
13.8.5 Gradient boosting regressor
13.8.6 Decision tree regressor
References
14 Overfitting and Regularization
14.1 Why Regularization
14.2 Tikhonov Regularization
14.2.1 Demonstration examples: One data-point
14.2.2 Demonstration examples: Two data-points
14.2.3 Demonstration examples: Three data-points
14.2.4 Summary of the case studies
14.3 A Case Study on Regularization Effects using MXNet
14.3.1 Load the MNIST dataset
14.3.2 Define a neural network model
14.3.3 Define loss function and optimizer
14.3.4 Define a function to evaluate the accuracy
14.3.5 Define a utility function plotting convergence curve
14.3.6 Train the neural network model
14.3.7 Evaluation of the trained model: A typical case of overfitting
14.3.8 Application of L2 regularization
14.3.9 Re-initializing the parameters
14.3.10 Training the L2-regularized neural network model
14.3.11 Effect of the L2 regularization
14.4 A Case Study on Regularization Parameters Using Sklearn
References
15 Convolutional Neural Network (CNN) for Classification and Object Detection
15.1 Filter and Convolution
15.2 Affine Transformation Unit in CNNs
15.3 Pooling
15.4 Upsampling
15.5 Configuration of a Typical CNN
15.6 Some Landmark CNNs
15.6.1 LeNet-5
15.6.2 AlexNet
15.6.3 VGG-16
15.6.4 ResNet
15.6.5 Inception
15.6.6 YOLO: A CONV net for object detection
15.7 An Example of Convolutional Neural Network
15.7.1 Import TensorFlow
15.7.2 Download and preparation of a CIFAR10 dataset
15.7.3 Verification of the data
15.7.4 Creation of Conv2D layers
15.7.5 Add Dense layers to the Conv2D layers
15.7.6 Compile and train the CNN model
15.7.7 Evaluation of the trained CNN model
15.8 Applications of YOLO for Object Detection
References
16 Recurrent Neural Network (RNN) and Sequence Feature Models
16.1 A Typical Structure of LSTMs
16.2 Formulation of LSTMs
16.2.1 General formulation
16.2.2 LSTM layer and standard neural layer
16.2.3 Reduced LSTM
16.3 Peephole LSTM
16.4 Gated Recurrent Units (GRUs)
16.5 Examples
16.5.1 A simple reduced LSTM with a standard NN layer for regression
16.5.2 LSTM class in tensorflow.keras
16.5.3 Using LSTM for handwritten digit recognition
16.5.4 Using LSTM for predicting dynamics of moving vectors
16.6 Examples of LSTM for Speech Recognition
References
17 Unsupervised Learning Techniques
17.1 Background
17.2 K-means for Clustering
17.2.1 Initialization of means
17.2.2 Assignment of data-points to clusters
17.2.3 Update of means
17.2.4 Example 1: Case studies on comparison of initiation methods for K-means clustering
17.2.4.1 Define a function for benchmarking study
17.2.4.2 Generation of synthetic data-points
17.2.4.3 Examination of different initiation methods
17.2.4.4 Visualize the clustering results
17.2.5 Example 2: K-means clustering on the handwritten digit dataset
17.2.5.1 Load handwritten digit dataset
17.2.5.2 Examination of different initiation methods
17.2.5.3 Visualize the results for handwritten digit clustering using PCA
17.3 Mean-Shift for Clustering Without Pre-Specifying k
17.4 Autoencoders
17.4.1 Basic structure of autoencoders
17.4.2 Example 1: Image compression and denoising
17.4.3 Example 2: Image segmentation
17.5 Autoencoder vs. PCA
17.6 Variational Autoencoder (VAE)
17.6.1.1 Key ideas in VAE
17.6.1.2 KL-divergence for two single-variable normal distributions
17.6.1.3 KL-divergence for two multi-variable normal distributions
References
18 Reinforcement Learning (RL)
18.1 Basic Underlying Concept
18.1.1 Problem statement
18.1.2 Applications in sciences, engineering, and business
18.1.3 Reinforcement learning approach
18.1.4 Actions in discrete time: Solution strategy
18.2 Markov Decision Process
18.3 Policy
18.4 Value Functions
18.5 Bellman Equation
18.6 Q-learning Algorithm
18.6.1 Example 1: A robot explores a room with unknown obstacles with Q-learning algorithm
18.6.2 OpenAI Gym
18.6.3 Define utility functions
18.6.4 A simple Q-learning algorithm
18.6.5 Hyper-parameters and convergence
18.7 Q-Network Learning
18.7.1 Example 2: A robot explores a room with unknown obstacles with Q-Network
18.7.2 Building TensorFlow graph
18.7.3 Results from the Q-Network
18.8 Policy gradient methods
18.8.1 PPO with NN policy
18.8.2 Strategy used in policy gradient methods and PPO
18.8.2.1 Build an NN model for policy
18.8.2.2 P and R formulation
18.8.3 Ratio policy
18.8.4 PPO: Controlling a pole staying upright
18.8.5 Save and reload the learned model
18.8.6 Evaluate and view the trained model
18.8.7 PPO: Self-driving car
18.8.8 View samples of the racing car before training
18.8.9 Train the racing car using the CNN policy
18.8.10 Evaluate and view the learned model
18.9 Remarks
References
Index


📜 SIMILAR VOLUMES


Machine Learning with Python: Theory and
✍ Amin Zollanvari 📂 Library 📅 2023 🏛 Springer 🌐 English

This book is meant as a textbook for undergraduate and graduate students who are willing to understand essential elements of machine learning from both a theoretical and a practical perspective. The choice of the topics in the book is made based on one criterion: whether the practical utility


Applied Machine Learning with Python
✍ Andrea Giussani 📂 Library 📅 2020 🏛 Bocconi University Press 🌐 English

If you are looking for an engaging book, rich in learning features, which will guide you through the field of Machine Learning, this is it. This book is a modern, concise guide to the topic. It focuses on current ensemble and boosting methods, highlighting contemporary techniques such as XGBoost (20

Machine Learning Theory and Applications
✍ Xavier Vasques 📂 Library 📅 2024 🏛 Wiley 🌐 English

Machine Learning Theory and Applications enables readers to understand mathematical concepts behind data engineering and machine learning algorithms and apply them using open-source Python libraries. Machine Learning Theory and Applications delves into the realm of machine learning and deep l


Python Machine Learning : Perform Python
✍ Vahid Mirjalili, Sebastian Raschka 📂 Library 📅 2017 🏛 Packt Publishing 🌐 English

Unlock modern machine learning and deep learning techniques with Python by using the latest cutting-edge open source Python libraries. About This Book Second edition of the bestselling book on Machine Learning A practical approach to key frameworks in data science, machine learning, and deep learnin