Machine Learning Methods
by Hang Li
- Publisher
- Springer
- Year
- 2023
- Language
- English
- Pages
- 530
- Edition
- 1st ed. 2024
- Category
- Library
Synopsis
This book provides a comprehensive and systematic introduction to the principal machine learning methods, covering both supervised and unsupervised learning. It discusses essential methods of classification and regression in supervised learning, such as decision trees, perceptrons, support vector machines, maximum entropy models, logistic regression models and multiclass classification, as well as methods applied in supervised learning, like the hidden Markov model and conditional random fields. In the context of unsupervised learning, it examines clustering and other problems as well as methods such as singular value decomposition, principal component analysis and latent semantic analysis. As a fundamental book on machine learning, it addresses the needs of researchers and students who apply machine learning as an important tool in their research, especially those in fields such as information retrieval, natural language processing and text data mining. In order to understand the concepts and methods discussed, readers are expected to have an elementary knowledge of advanced mathematics, linear algebra, and probability and statistics. The detailed explanations of basic principles, underlying concepts and algorithms enable readers to grasp basic techniques, while the rigorous mathematical derivations and specific examples included offer valuable insights into machine learning.
Table of Contents
Preface
Contents
1 Introduction to Machine Learning and Supervised Learning
1.1 Machine Learning
1.1.1 Characteristics of Machine Learning
1.1.2 The Object of Machine Learning
1.1.3 The Purpose of Machine Learning
1.1.4 Methods of Machine Learning
1.1.5 The Study of Machine Learning
1.1.6 The Importance of Machine Learning
1.2 Classification of Machine Learning
1.2.1 The Basic Classification
1.2.2 Classification by Model Types
1.2.3 Classification by Algorithm
1.2.4 Classification by Technique
1.3 Three Elements of Machine Learning Methods
1.3.1 Model
1.3.2 Strategy
1.3.3 Algorithm
1.4 Model Evaluation and Model Selection
1.4.1 Training Error and Test Error
1.4.2 Over-Fitting and Model Selection
1.5 Regularization and Cross-Validation
1.5.1 Regularization
1.5.2 Cross-Validation
1.6 Generalization Ability
1.6.1 Generalization Error
1.6.2 Generalization Error Bound
1.7 Generative Approach and Discriminative Approach
1.8 Supervised Learning Application
1.8.1 Classification
1.8.2 Tagging
1.8.3 Regression
References
2 Perceptron
2.1 The Perceptron Model
2.2 Perceptron Learning Strategy
2.2.1 Linear Separability of the Dataset
2.2.2 Perceptron Learning Strategy
2.3 Perceptron Learning Algorithm
2.3.1 The Primal Form of the Perceptron Learning Algorithm
2.3.2 Convergence of the Algorithm
2.3.3 The Dual Form of the Perceptron Learning Algorithm
References
3 K-Nearest Neighbor
3.1 The K-Nearest Neighbor Algorithm
3.2 The K-Nearest Neighbor Model
3.2.1 Model
3.2.2 Distance Metrics
3.2.3 The Selection of k Value
3.2.4 Classification Decision Rule
3.3 Implementation of K-Nearest Neighbor: The kd-Tree
3.3.1 Constructing the kd-Tree
3.3.2 Searching for kd-Tree
References
4 The Naïve Bayes Method
4.1 The Learning and Classification of Naïve Bayes
4.1.1 Basic Methods
4.1.2 Implications of Posterior Probability Maximization
4.2 Parameter Estimation of the Naïve Bayes Method
4.2.1 Maximum Likelihood Estimation
4.2.2 Learning and Classification Algorithms
4.2.3 Bayesian Estimation
References
5 Decision Tree
5.1 Decision Tree Model and Learning
5.1.1 Decision Tree Model
5.1.2 Decision Tree and If-Then Rules
5.1.3 Decision Tree and Conditional Probability Distributions
5.1.4 Decision Tree Learning
5.2 Feature Selection
5.2.1 The Feature Selection Problem
5.2.2 Information Gain
5.2.3 Information Gain Ratio
5.3 Generation of Decision Tree
5.3.1 ID3 Algorithm
5.3.2 C4.5 Generation Algorithm
5.4 Pruning of Decision Tree
5.5 CART Algorithm
5.5.1 CART Generation
5.5.2 CART Pruning
References
6 Logistic Regression and Maximum Entropy Model
6.1 Logistic Regression Model
6.1.1 Logistic Distribution
6.1.2 Binomial Logistic Regression Model
6.1.3 Model Parameter Estimation
6.1.4 Multinomial Logistic Regression
6.2 Maximum Entropy Model
6.2.1 Maximum Entropy Principle
6.2.2 Definition of Maximum Entropy Model
6.2.3 Learning of the Maximum Entropy Model
6.2.4 Maximum Likelihood Estimation
6.3 Optimization Algorithm of Model Learning
6.3.1 Improved Iterative Scaling
6.3.2 Quasi-Newton Method
References
7 Support Vector Machine
7.1 Linear Support Vector Machine in the Linearly Separable Case and Hard Margin Maximization
7.1.1 Linear Support Vector Machine in the Linearly Separable Case
7.1.2 Function Margin and Geometric Margin
7.1.3 Maximum Margin
7.1.4 Dual Algorithm of Learning
7.2 Linear Support Vector Machine and Soft Margin Maximization
7.2.1 Linear Support Vector Machine
7.2.2 Dual Learning Algorithm
7.2.3 Support Vector
7.2.4 Hinge Loss Function
7.3 Non-Linear Support Vector Machine and Kernel Functions
7.3.1 Kernel Trick
7.3.2 Positive Definite Kernel
7.3.3 Commonly Used Kernel Functions
7.3.4 Nonlinear Support Vector Classifier
7.4 Sequential Minimal Optimization Algorithm
7.4.1 The Method of Solving Two-Variable Quadratic Programming
7.4.2 Selection Methods of Variables
7.4.3 SMO Algorithm
References
8 Boosting
8.1 AdaBoost Algorithm
8.1.1 The Basic Idea of Boosting
8.1.2 AdaBoost Algorithm
8.1.3 AdaBoost Example
8.2 Training Error Analysis of AdaBoost Algorithm
8.3 Explanation of AdaBoost Algorithm
8.3.1 Forward Stepwise Algorithm
8.3.2 Forward Stepwise Algorithm and AdaBoost
8.4 Boosting Tree
8.4.1 Boosting Tree Model
8.4.2 Boosting Tree Algorithm
8.4.3 Gradient Boosting
References
9 EM Algorithm and Its Extensions
9.1 Introduction of the EM Algorithm
9.1.1 EM Algorithm
9.1.2 Derivation of the EM Algorithm
9.1.3 Application of the EM Algorithm in Unsupervised Learning
9.2 The Convergence of the EM Algorithm
9.3 Application of the EM Algorithm in the Learning of the Gaussian Mixture Model
9.3.1 Gaussian Mixture Model
9.3.2 The EM Algorithm for Parameter Estimation of the Gaussian Mixture Model
9.4 Extensions of the EM Algorithm
9.4.1 The Maximization-Maximization Algorithm of F-Function
9.4.2 GEM Algorithm
9.5 Summary
9.6 Further Reading
9.7 Exercises
References
10 Hidden Markov Model
10.1 The Basic Concept of Hidden Markov Model
10.1.1 Definition of Hidden Markov Model
10.1.2 The Generation Process of the Observation Sequence
10.1.3 Three Basic Problems of the Hidden Markov Model
10.2 Probability Calculation Algorithms
10.2.1 Direct Calculation Method
10.2.2 Forward Algorithm
10.2.3 Backward Algorithm
10.2.4 Calculation of Some Probabilities and Expected Values
10.3 Learning Algorithms
10.3.1 Supervised Learning Methods
10.3.2 Baum-Welch Algorithm
10.3.3 Baum-Welch Model Parameter Estimation Formula
10.4 Prediction Algorithm
10.4.1 Approximation Algorithm
10.4.2 Viterbi Algorithm
References
11 Conditional Random Field
11.1 Probabilistic Undirected Graphical Model
11.1.1 Model Definition
11.1.2 Factorization of Probabilistic Undirected Graphical Model
11.2 The Definition and Forms of Conditional Random Field
11.2.1 The Definition of Conditional Random Field
11.2.2 The Parameterized Form of the Conditional Random Field
11.2.3 The Simplified Form of Conditional Random Field
11.2.4 The Matrix Form of the Conditional Random Field
11.3 The Probability Computation Problem of Conditional Random Field
11.3.1 Forward-Backward Algorithm
11.3.2 Probability Computation
11.3.3 The Computation of Expected Value
11.4 Learning Algorithms of Conditional Random Field
11.4.1 Improved Iterative Scaling
11.4.2 Quasi-Newton Method
11.5 The Prediction Algorithm of Conditional Random Field
References
12 Summary of Supervised Learning Methods
12.1 Application
12.2 Models
12.3 Learning Strategies
12.4 Learning Algorithms
13 Introduction to Unsupervised Learning
13.1 The Fundamentals of Unsupervised Learning
13.2 Basic Issues
13.2.1 Clustering
13.2.2 Dimensionality Reduction
13.2.3 Probability Model Estimation
13.3 Three Elements of Machine Learning
13.4 Unsupervised Learning Methods
13.4.1 Clustering
13.4.2 Dimensionality Reduction
13.4.3 Topic Modeling
13.4.4 Graph Analytics
References
14 Clustering
14.1 Basic Concepts of Clustering
14.1.1 Similarity or Distance
14.1.2 Class or Cluster
14.1.3 Distance Between Classes
14.2 Hierarchical Clustering
14.3 k-means Clustering
14.3.1 Model
14.3.2 Strategy
14.3.3 Algorithm
14.3.4 Algorithm Characteristics
References
15 Singular Value Decomposition
15.1 Introduction
15.2 Definition and Properties of Singular Value Decomposition
15.2.1 Definition and Theorem
15.2.2 Compact Singular Value Decomposition and Truncated Singular Value Decomposition
15.2.3 Geometry Interpretation
15.2.4 Main Properties
15.3 Computation of Singular Value Decomposition
15.4 Singular Value Decomposition and Matrix Approximation
15.4.1 Frobenius Norm
15.4.2 Optimal Approximation of the Matrix
15.4.3 The Outer Product Expansion of Matrix
References
16 Principal Component Analysis
16.1 Overall Principal Component Analysis
16.1.1 Basic Ideas
16.1.2 Definition and Derivation
16.1.3 Main Properties
16.1.4 The Number of Principal Components
16.1.5 The Overall Principal Components of Normalized Variables
16.2 Sample Principal Component Analysis
16.2.1 The Definition and Properties of the Sample Principal Components
16.2.2 Eigenvalue Decomposition Algorithm of the Correlation Matrix
16.2.3 Singular Value Decomposition Algorithm for Data Matrix
References
17 Latent Semantic Analysis
17.1 Word Vector Space and Topic Vector Space
17.1.1 Word Vector Space
17.1.2 Topic Vector Space
17.2 Latent Semantic Analysis Algorithm
17.2.1 Matrix Singular Value Decomposition Algorithm
17.2.2 Examples
17.3 Non-negative Matrix Factorization Algorithm
17.3.1 Non-negative Matrix Factorization
17.3.2 Latent Semantic Analysis Model
17.3.3 Formalization of Non-negative Matrix Factorization
17.3.4 Algorithm
References
18 Probabilistic Latent Semantic Analysis
18.1 Probabilistic Latent Semantic Analysis Model
18.1.1 Basic Ideas
18.1.2 Generative Model
18.1.3 Co-occurrence Model
18.1.4 Model Properties
18.2 Algorithms for Probabilistic Latent Semantic Analysis
References
19 Markov Chain Monte Carlo Method
19.1 Monte Carlo Method
19.1.1 Random Sampling
19.1.2 Mathematical Expectation Estimate
19.1.3 Integral Computation
19.2 Markov Chain
19.2.1 Basic Definition
19.2.2 Discrete-Time Markov Chain
19.2.3 Continuous-Time Markov Chain
19.2.4 Properties of Markov Chain
19.3 Markov Chain Monte Carlo Method
19.3.1 Basic Ideas
19.3.2 Basic Steps
19.3.3 Markov Chain Monte Carlo Method and Machine Learning
19.4 Metropolis-Hastings Algorithm
19.4.1 Fundamental Concepts
19.4.2 Metropolis-Hastings Algorithm
19.4.3 The Single-Component Metropolis-Hastings Algorithm
19.5 Gibbs Sampling
19.5.1 Basic Principles
19.5.2 Gibbs Sampling Algorithm
19.5.3 Sampling Computation
References
20 Latent Dirichlet Allocation
20.1 Dirichlet Distribution
20.1.1 Definition of Distribution
20.1.2 Conjugate Prior
20.2 Latent Dirichlet Allocation Model
20.2.1 Basic Ideas
20.2.2 Model Definition
20.2.3 Probability Graphical Model
20.2.4 The Changeability of Random Variable Sequences
20.2.5 Probability Formula
20.3 Gibbs Sampling Algorithm for LDA
20.3.1 Basic Ideas
20.3.2 Major Parts of Algorithm
20.3.3 Algorithm Post-processing
20.3.4 Algorithm
20.4 Variational EM Algorithm for LDA
20.4.1 Variational Reasoning
20.4.2 Variational EM Algorithm
20.4.3 Algorithm Derivation
20.4.4 Algorithm Summary
References
21 The PageRank Algorithm
21.1 The Definition of PageRank
21.1.1 Basic Ideas
21.1.2 The Directed Graph and Random Walk Model
21.1.3 The Basic Definition of PageRank
21.1.4 General Definition of PageRank
21.2 Computation of PageRank
21.2.1 Iterative Algorithm
21.2.2 Power Method
21.2.3 Algebraic Algorithms
References
22 A Summary of Unsupervised Learning Methods
22.1 The Relationships and Characteristics of Unsupervised Learning Methods
22.1.1 The Relationships Between Various Methods
22.1.2 Unsupervised Learning Methods
22.1.3 Basic Machine Learning Methods
22.2 The Relationships and Characteristics of Topic Models
References
Appendix A Gradient Descent
Appendix B Newton Method and Quasi-Newton Method
Appendix C Lagrange Duality
Appendix D Basic Subspaces of Matrix
Appendix E The Definition of KL Divergence and the Properties of Dirichlet Distribution
Color Diagrams
Index
Similar Volumes
Ensemble machine learning combines the power of multiple machine learning approaches, working together to deliver models that are highly performant and highly accurate. Inside Ensemble Methods for Machine Learning you will find: • Methods for classification, regression, and recommendations • So
In the big data era, increasing information can be extracted from the same source object or scene. For instance, a person can be verified based on their fingerprint, palm print, or iris information, and a given image can be represented by various types of features, including its texture, co