First-Order and Stochastic Optimization Methods for Machine Learning
by Guanghui Lan
- Publisher
- Springer Nature
- Year
- 2020
- Language
- English
- Pages
- 591
- Series
- Springer Series in the Data Sciences
- Edition
- 1
- Category
- Library
Synopsis
This book covers not only foundational material but also the most recent progress made over the past few years in machine learning algorithms. Despite intensive research and development in this area, no systematic treatment exists that introduces the fundamental concepts and recent advances in machine learning algorithms, especially those based on stochastic optimization methods, randomized algorithms, nonconvex optimization, distributed and online learning, and projection-free methods. This book will benefit a broad audience in the machine learning, artificial intelligence, and mathematical programming communities by presenting these recent developments in a tutorial style, starting from the basic building blocks and building up to the most carefully designed and complicated algorithms for machine learning.
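As a taste of the book's subject matter, here is a minimal sketch of the simplest stochastic first-order method, stochastic gradient descent applied to the linear regression model of Chapter 1. It is not taken from the book; the function name, step size, and data are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the book): stochastic gradient descent
# on the least-squares objective (1/2n) * ||X w - y||^2, one sample at a time.
import numpy as np

def sgd_least_squares(X, y, stepsize=0.01, epochs=50, seed=0):
    """Run single-sample SGD and return the estimated weight vector."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):          # sample order reshuffled each epoch
            grad = (X[i] @ w - y[i]) * X[i]   # stochastic gradient of one term
            w -= stepsize * grad
    return w

# Usage: recover a planted weight vector from noisy linear measurements.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=200)
w_hat = sgd_least_squares(X, y)
print(np.linalg.norm(w_hat - w_true))  # small for this well-conditioned problem
```

The constant step size here is chosen for simplicity; the book's analysis covers the more careful step-size policies (e.g., diminishing or averaged schemes) needed for guaranteed convergence rates.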
Table of Contents
Preface
Contents
1 Machine Learning Models
1.1 Linear Regression
1.2 Logistic Regression
1.3 Generalized Linear Models
1.3.1 Exponential Family
1.3.2 Model Construction
1.4 Support Vector Machines
1.5 Regularization, Lasso, and Ridge Regression
1.6 Population Risk Minimization
1.7 Neural Networks
1.8 Exercises and Notes
2 Convex Optimization Theory
2.1 Convex Sets
2.1.1 Definition and Examples
2.1.2 Projection onto Convex Sets
2.1.3 Separation Theorem
2.2 Convex Functions
2.2.1 Definition and Examples
2.2.2 Differentiable Convex Functions
2.2.3 Non-differentiable Convex Functions
2.2.4 Lipschitz Continuity of Convex Functions
2.2.5 Optimality Conditions for Convex Optimization
2.2.6 Representer Theorem and Kernel
2.3 Lagrange Duality
2.3.1 Lagrange Function and Duality
2.3.2 Proof of Strong Duality
2.3.3 Saddle Points
2.3.4 Karush-Kuhn-Tucker Conditions
2.3.5 Dual Support Vector Machine
2.4 Legendre-Fenchel Conjugate Duality
2.4.1 Closure of Convex Functions
2.4.2 Conjugate Functions
2.5 Exercises and Notes
3 Deterministic Convex Optimization
3.1 Subgradient Descent
3.1.1 General Nonsmooth Convex Problems
3.1.2 Nonsmooth Strongly Convex Problems
3.1.3 Smooth Convex Problems
3.1.4 Smooth and Strongly Convex Problems
3.2 Mirror Descent
3.3 Accelerated Gradient Descent
3.4 Game Interpretation for Accelerated Gradient Descent
3.5 Smoothing Scheme for Nonsmooth Problems
3.6 Primal-Dual Method for Saddle-Point Optimization
3.6.1 General Bilinear Saddle Point Problems
3.6.2 Smooth Bilinear Saddle Point Problems
3.6.3 Smooth and Strongly Convex Bilinear Saddle Point Problems
3.6.4 Linearly Constrained Problems
3.7 Alternating Direction Method of Multipliers
3.8 Mirror-Prox Method for Variational Inequalities
3.8.1 Monotone Variational Inequalities
3.8.2 Generalized Monotone Variational Inequalities
3.9 Accelerated Level Method
3.9.1 Nonsmooth, Smooth, and Weakly Smooth Problems
3.9.2 Saddle Point Problems
3.10 Exercises and Notes
4 Stochastic Convex Optimization
4.1 Stochastic Mirror Descent
4.1.1 General Nonsmooth Convex Functions
4.1.2 Smooth Convex Problems
4.1.3 Accuracy Certificates
4.2 Stochastic Accelerated Gradient Descent
4.2.1 Problems Without Strong Convexity
4.2.2 Nonsmooth Strongly Convex Problems
4.2.3 Smooth and Strongly Convex Problems
4.2.4 Accuracy Certificates
4.3 Stochastic Convex-Concave Saddle Point Problems
4.3.1 General Algorithmic Framework
4.3.2 Minimax Stochastic Problems
4.3.3 Bilinear Matrix Games
4.4 Stochastic Accelerated Primal-Dual Method
4.4.1 Accelerated Primal-Dual Method
4.4.2 Stochastic Bilinear Saddle Point Problems
4.5 Stochastic Accelerated Mirror-Prox Method
4.5.1 Algorithmic Framework
4.5.2 Convergence Analysis
4.6 Stochastic Block Mirror Descent Method
4.6.1 Nonsmooth Convex Optimization
4.6.1.1 The SBMD Algorithm for Nonsmooth Problems
4.6.1.2 Convergence Properties of SBMD for Nonsmooth Problems
4.6.1.3 Nonsmooth Strongly Convex Problems
4.6.1.4 Large-Deviation Properties for Nonsmooth Problems
4.6.2 Convex Composite Optimization
4.7 Exercises and Notes
5 Convex Finite-Sum and Distributed Optimization
5.1 Random Primal-Dual Gradient Method
5.1.1 Multi-Dual-Player Game Reformulation
5.1.2 Randomization on Gradient Computation
5.1.3 Convergence for Strongly Convex Problems
5.1.3.1 Some Basic Tools
5.1.3.2 General Results for RPDG
5.1.3.3 Main Convergence Results
5.1.4 Lower Complexity Bound for Randomized Methods
5.1.5 Generalization to Problems Without Strong Convexity
5.1.5.1 Smooth Problems with Bounded Feasible Sets
5.1.5.2 Structured Nonsmooth Problems
5.1.5.3 Unconstrained Smooth Problems
5.2 Random Gradient Extrapolation Method
5.2.1 Gradient Extrapolation Method
5.2.1.1 The Algorithm
5.2.1.2 Convergence of GEM
5.2.2 Deterministic Finite-Sum Problems
5.2.2.1 Algorithmic Framework
5.2.2.2 Convergence Analysis
5.2.3 Stochastic Finite-Sum Problems
5.2.4 Distributed Implementation
5.3 Variance-Reduced Mirror Descent
5.3.1 Smooth Problems Without Strong Convexity
5.3.2 Smooth and Strongly Convex Problems
5.4 Variance-Reduced Accelerated Gradient Descent
5.4.1 Smooth Problems Without Strong Convexity
5.4.2 Smooth and Strongly Convex Problems
5.4.3 Problems Satisfying an Error-Bound Condition
5.5 Exercises and Notes
6 Nonconvex Optimization
6.1 Unconstrained Nonconvex Stochastic Optimization
6.1.1 Stochastic First-Order Methods
6.1.1.1 The Randomized Stochastic Gradient Method
6.1.1.2 A Two-Phase Randomized Stochastic Gradient Method
6.1.2 Stochastic Zeroth-Order Methods
6.1.2.1 The Randomized Stochastic Gradient Free Method
6.1.2.2 A Two-Phase Randomized Stochastic Gradient Free Method
6.2 Nonconvex Stochastic Composite Optimization
6.2.1 Some Properties of Prox-Mapping
6.2.2 Nonconvex Mirror Descent Methods
6.2.3 Nonconvex Stochastic Mirror Descent Methods
6.2.3.1 A Randomized Stochastic Mirror Descent Method
6.2.3.2 A Two-Phase Randomized Stochastic Mirror Descent Method
6.2.4 Stochastic Zeroth-Order Methods for Composite Problems
6.3 Nonconvex Stochastic Block Mirror Descent
6.4 Nonconvex Stochastic Accelerated Gradient Descent
6.4.1 Nonconvex Accelerated Gradient Descent
6.4.1.1 Minimization of Smooth Functions
6.4.1.2 Minimization of Nonconvex Composite Functions
6.4.2 Stochastic Accelerated Gradient Descent Method
6.4.2.1 Minimization of Stochastic Smooth Functions
6.4.2.2 Minimization of Nonconvex Stochastic Composite Functions
6.5 Nonconvex Variance-Reduced Mirror Descent
6.5.1 Basic Scheme for Deterministic Problems
6.5.2 Generalization for Stochastic Optimization Problems
6.6 Randomized Accelerated Proximal-Point Methods
6.6.1 Nonconvex Finite-Sum Problems
6.6.1.1 The RapGrad Algorithm
6.6.1.2 Convergence Analysis for RapGrad
6.6.2 Nonconvex Multi-Block Problems
6.6.2.1 The RapDual Algorithm
6.6.2.2 Convergence Analysis for RapDual
6.7 Exercises and Notes
7 Projection-Free Methods
7.1 Conditional Gradient Method
7.1.1 Classic Conditional Gradient
7.1.1.1 Smooth Convex Problems Under an LO Oracle
7.1.1.2 Bilinear Saddle Point Problems Under an LO Oracle
7.1.1.3 General Nonsmooth Problems Under an LO Oracle
7.1.1.4 Strongly Convex Problems Under an Enhanced LO Oracle
7.1.2 New Variants of Conditional Gradient
7.1.2.1 Primal Averaging CndG Method
7.1.2.2 Primal-Dual Averaging CndG Methods
7.1.3 Lower Complexity Bound
7.1.3.1 A Generic LCP Algorithm
7.1.3.2 Lower Complexity Bounds for Smooth Minimization
7.1.3.3 Lower Complexity Bounds for Nonsmooth Minimization
7.2 Conditional Gradient Sliding Method
7.2.1 Deterministic Conditional Gradient Sliding
7.2.1.1 Smooth Convex Optimization
7.2.1.2 Strongly Convex Optimization
7.2.2 Stochastic Conditional Gradient Sliding Method
7.2.2.1 The Algorithm and the Main Convergence Results
7.2.2.2 The Large Deviation Results
7.2.3 Generalization to Saddle Point Problems
7.3 Nonconvex Conditional Gradient Method
7.4 Stochastic Nonconvex Conditional Gradient
7.4.1 Basic Scheme for Finite-Sum Problems
7.4.2 Generalization for Stochastic Optimization Problems
7.5 Stochastic Nonconvex Conditional Gradient Sliding
7.5.1 Wolfe Gap vs Projected Gradient
7.5.2 Projection-Free Method to Drive Projected Gradient Small
7.6 Exercises and Notes
8 Operator Sliding and Decentralized Optimization
8.1 Gradient Sliding for Composite Optimization
8.1.1 Deterministic Gradient Sliding
8.1.2 Stochastic Gradient Sliding
8.1.3 Strongly Convex and Structured Nonsmooth Problems
8.1.3.1 Strongly Convex Optimization
8.1.3.2 Structured Nonsmooth Problems
8.2 Accelerated Gradient Sliding
8.2.1 Composite Smooth Optimization
8.2.2 Composite Bilinear Saddle Point Problems
8.2.2.1 Saddle Point Problems
8.2.2.2 Strongly Convex Composite Saddle Point Problems
8.3 Communication Sliding and Decentralized Optimization
8.3.1 Problem Formulation
8.3.1.1 Problem Formulation
8.3.1.2 Gap Functions: Termination Criteria
8.3.1.3 Prox-Function
8.3.2 Decentralized Communication Sliding
8.3.2.1 The DCS Algorithm
8.3.2.2 Convergence of DCS on General Convex Functions
8.3.2.3 Boundedness of ‖y*‖
8.3.2.4 Convergence of DCS on Strongly Convex Functions
8.3.3 Stochastic Decentralized Communication Sliding
8.3.3.1 The SDCS Algorithm
8.3.3.2 Convergence of SDCS on General Convex Functions
8.3.3.3 Convergence of SDCS on Strongly Convex Functions
8.3.4 High Probability Results
8.3.5 Convergence Analysis
8.4 Exercises and Notes
References
Index
SIMILAR VOLUMES
This book on optimization includes forewords by Michael I. Jordan, Zongben Xu, and Zhi-Quan Luo. Machine learning relies heavily on optimization to solve problems with its learning models, and first-order optimization algorithms are the mainstream approaches. The acceleration of first-order optimization…
Machine learning and optimization techniques are revolutionizing our world. Other types of information technology have not progressed as rapidly in recent years, in terms of real impact. The aim of this book is to present some of the innovative techniques in the field of optimization and machine learning…
Advancements in the technology and availability of data sources have led to the 'Big Data' era. Working with large data offers the potential to uncover more fine-grained patterns and take timely and accurate decisions, but it also creates a lot of challenges such as slow training and scalability…
This book aims to provide a collection of state-of-the-art scientific and technical research papers related to machine learning-based algorithms in the field of optimization and engineering design. The theoretical and practical development for numerous engineering applications such as smart…