This book covers not only foundational material but also the most recent progress made during the past few years in the area of machine learning algorithms. In spite of the intensive research and development in this area, there does not exist a systematic treatment to introduce the fundamental …
First-order and stochastic optimization methods for machine learning
- Author: Guanghui Lan
- Publisher: Springer
- Year: 2020
- Language: English
- Pages: 591
- Category: Library
Table of Contents
Preface......Page 7
Contents......Page 9
1.1 Linear Regression......Page 14
1.2 Logistic Regression......Page 17
1.3.1 Exponential Family......Page 20
1.3.2 Model Construction......Page 21
1.4 Support Vector Machines......Page 24
1.5 Regularization, Lasso, and Ridge Regression......Page 28
1.6 Population Risk Minimization......Page 29
1.7 Neural Networks......Page 30
1.8 Exercises and Notes......Page 33
2.1.1 Definition and Examples......Page 34
2.1.2 Projection onto Convex Sets......Page 36
2.1.3 Separation Theorem......Page 38
2.2.1 Definition and Examples......Page 42
2.2.2 Differentiable Convex Functions......Page 43
2.2.3 Non-differentiable Convex Functions......Page 44
2.2.4 Lipschitz Continuity of Convex Functions......Page 46
2.2.5 Optimality Conditions for Convex Optimization......Page 48
2.2.6 Representer Theorem and Kernel......Page 49
2.3 Lagrange Duality......Page 50
2.3.1 Lagrange Function and Duality......Page 51
2.3.2 Proof of Strong Duality......Page 52
2.3.3 Saddle Points......Page 54
2.3.4 Karush–Kuhn–Tucker Conditions......Page 55
2.3.5 Dual Support Vector Machine......Page 57
2.4.1 Closure of Convex Functions......Page 58
2.4.2 Conjugate Functions......Page 61
2.5 Exercises and Notes......Page 63
3.1 Subgradient Descent......Page 65
3.1.1 General Nonsmooth Convex Problems......Page 66
3.1.2 Nonsmooth Strongly Convex Problems......Page 69
3.1.3 Smooth Convex Problems......Page 70
3.2 Mirror Descent......Page 72
3.3 Accelerated Gradient Descent......Page 76
3.4 Game Interpretation for Accelerated Gradient Descent......Page 81
3.5 Smoothing Scheme for Nonsmooth Problems......Page 84
3.6 Primal–Dual Method for Saddle-Point Optimization......Page 86
3.6.1 General Bilinear Saddle Point Problems......Page 90
3.6.2 Smooth Bilinear Saddle Point Problems......Page 91
3.6.3 Smooth and Strongly Convex Bilinear Saddle Point Problems......Page 92
3.6.4 Linearly Constrained Problems......Page 93
3.7 Alternating Direction Method of Multipliers......Page 96
3.8 Mirror-Prox Method for Variational Inequalities......Page 98
3.8.1 Monotone Variational Inequalities......Page 99
3.8.2 Generalized Monotone Variational Inequalities......Page 101
3.9 Accelerated Level Method......Page 104
3.9.1 Nonsmooth, Smooth, and Weakly Smooth Problems......Page 105
3.9.2 Saddle Point Problems......Page 114
3.10 Exercises and Notes......Page 120
4.1 Stochastic Mirror Descent......Page 124
4.1.1 General Nonsmooth Convex Functions......Page 125
4.1.2 Smooth Convex Problems......Page 129
4.1.3 Accuracy Certificates......Page 133
4.2 Stochastic Accelerated Gradient Descent......Page 139
4.2.1 Problems Without Strong Convexity......Page 145
4.2.2 Nonsmooth Strongly Convex Problems......Page 148
4.2.3 Smooth and Strongly Convex Problems......Page 150
4.2.4 Accuracy Certificates......Page 155
4.3 Stochastic Convex–Concave Saddle Point Problems......Page 159
4.3.1 General Algorithmic Framework......Page 160
4.3.2 Minimax Stochastic Problems......Page 164
4.3.3 Bilinear Matrix Games......Page 166
4.4 Stochastic Accelerated Primal–Dual Method......Page 169
4.4.1 Accelerated Primal–Dual Method......Page 171
4.4.2 Stochastic Bilinear Saddle Point Problems......Page 181
4.5 Stochastic Accelerated Mirror-Prox Method......Page 192
4.5.1 Algorithmic Framework......Page 193
4.5.2 Convergence Analysis......Page 195
4.6 Stochastic Block Mirror Descent Method......Page 210
4.6.1.1 The SBMD Algorithm for Nonsmooth Problems......Page 212
4.6.1.2 Convergence Properties of SBMD for Nonsmooth Problems......Page 214
4.6.1.3 Nonsmooth Strongly Convex Problems......Page 217
4.6.1.4 Large-Deviation Properties for Nonsmooth Problems......Page 219
4.6.2 Convex Composite Optimization......Page 222
4.7 Exercises and Notes......Page 229
5.1 Random Primal–Dual Gradient Method......Page 232
5.1.1 Multi-Dual-Player Game Reformulation......Page 236
5.1.2 Randomization on Gradient Computation......Page 238
5.1.3.1 Some Basic Tools......Page 241
5.1.3.2 General Results for RPDG......Page 242
5.1.3.3 Main Convergence Results......Page 247
5.1.4 Lower Complexity Bound for Randomized Methods......Page 252
5.1.5.1 Smooth Problems with Bounded Feasible Sets......Page 257
5.1.5.2 Structured Nonsmooth Problems......Page 259
5.1.5.3 Unconstrained Smooth Problems......Page 260
5.2 Random Gradient Extrapolation Method......Page 262
5.2.1.1 The Algorithm......Page 264
5.2.1.2 Convergence of GEM......Page 266
5.2.2 Deterministic Finite-Sum Problems......Page 271
5.2.2.1 Algorithmic Framework......Page 272
5.2.2.2 Convergence Analysis......Page 273
5.2.3 Stochastic Finite-Sum Problems......Page 282
5.2.4 Distributed Implementation......Page 287
5.3 Variance-Reduced Mirror Descent......Page 289
5.3.1 Smooth Problems Without Strong Convexity......Page 293
5.3.2 Smooth and Strongly Convex Problems......Page 295
5.4 Variance-Reduced Accelerated Gradient Descent......Page 298
5.4.1 Smooth Problems Without Strong Convexity......Page 301
5.4.2 Smooth and Strongly Convex Problems......Page 305
5.4.3 Problems Satisfying an Error-Bound Condition......Page 311
5.5 Exercises and Notes......Page 313
6.1 Unconstrained Nonconvex Stochastic Optimization......Page 315
6.1.1.1 The Randomized Stochastic Gradient Method......Page 318
6.1.1.2 A Two-Phase Randomized Stochastic Gradient Method......Page 324
6.1.2 Stochastic Zeroth-Order Methods......Page 328
6.1.2.1 The Randomized Stochastic Gradient Free Method......Page 329
6.1.2.2 A Two-Phase Randomized Stochastic Gradient Free Method......Page 335
6.2 Nonconvex Stochastic Composite Optimization......Page 338
6.2.1 Some Properties of Prox-Mapping......Page 340
6.2.2 Nonconvex Mirror Descent Methods......Page 342
6.2.3.1 A Randomized Stochastic Mirror Descent Method......Page 344
6.2.3.2 A Two-Phase Randomized Stochastic Mirror Descent Method......Page 352
6.2.4 Stochastic Zeroth-Order Methods for Composite Problems......Page 357
6.3 Nonconvex Stochastic Block Mirror Descent......Page 362
6.4 Nonconvex Stochastic Accelerated Gradient Descent......Page 369
6.4.1 Nonconvex Accelerated Gradient Descent......Page 371
6.4.1.1 Minimization of Smooth Functions......Page 372
6.4.1.2 Minimization of Nonconvex Composite Functions......Page 378
6.4.2.1 Minimization of Stochastic Smooth Functions......Page 384
6.4.2.2 Minimization of Nonconvex Stochastic Composite Functions......Page 392
6.5 Nonconvex Variance-Reduced Mirror Descent......Page 398
6.5.1 Basic Scheme for Deterministic Problems......Page 399
6.5.2 Generalization for Stochastic Optimization Problems......Page 402
6.6 Randomized Accelerated Proximal-Point Methods......Page 405
6.6.1.1 The RapGrad Algorithm......Page 406
6.6.1.2 Convergence Analysis for RapGrad......Page 410
6.6.2.1 The RapDual Algorithm......Page 418
6.6.2.2 Convergence Analysis for RapDual......Page 423
6.7 Exercises and Notes......Page 429
7.1 Conditional Gradient Method......Page 431
7.1.1.1 Smooth Convex Problems Under an LO Oracle......Page 433
7.1.1.2 Bilinear Saddle Point Problems Under an LO Oracle......Page 436
7.1.1.3 General Nonsmooth Problems Under an LO Oracle......Page 438
7.1.1.4 Strongly Convex Problems Under an Enhanced LO Oracle......Page 441
7.1.2.1 Primal Averaging CndG Method......Page 443
7.1.2.2 Primal–Dual Averaging CndG Methods......Page 446
7.1.3.1 A Generic LCP Algorithm......Page 449
7.1.3.2 Lower Complexity Bounds for Smooth Minimization......Page 450
7.1.3.3 Lower Complexity Bounds for Nonsmooth Minimization......Page 452
7.2 Conditional Gradient Sliding Method......Page 454
7.2.1.1 Smooth Convex Optimization......Page 456
7.2.1.2 Strongly Convex Optimization......Page 463
7.2.2.1 The Algorithm and the Main Convergence Results......Page 466
7.2.2.2 The Large Deviation Results......Page 471
7.2.3 Generalization to Saddle Point Problems......Page 474
7.3 Nonconvex Conditional Gradient Method......Page 478
7.4 Stochastic Nonconvex Conditional Gradient......Page 479
7.4.1 Basic Scheme for Finite-Sum Problems......Page 480
7.4.2 Generalization for Stochastic Optimization Problems......Page 484
7.5.1 Wolfe Gap vs Projected Gradient......Page 487
7.5.2 Projection-Free Method to Drive Projected Gradient Small......Page 488
7.6 Exercises and Notes......Page 492
8.1 Gradient Sliding for Composite Optimization......Page 493
8.1.1 Deterministic Gradient Sliding......Page 496
8.1.2 Stochastic Gradient Sliding......Page 507
8.1.3 Strongly Convex and Structured Nonsmooth Problems......Page 514
8.1.3.1 Strongly Convex Optimization......Page 515
8.1.3.2 Structured Nonsmooth Problems......Page 516
8.2 Accelerated Gradient Sliding......Page 519
8.2.1 Composite Smooth Optimization......Page 522
8.2.2 Composite Bilinear Saddle Point Problems......Page 537
8.2.2.1 Saddle Point Problems......Page 538
8.2.2.2 Strongly Convex Composite Saddle Point Problems......Page 539
8.3 Communication Sliding and Decentralized Optimization......Page 542
8.3.1.1 Problem Formulation......Page 545
8.3.1.2 Gap Functions: Termination Criteria......Page 546
8.3.2 Decentralized Communication Sliding......Page 548
8.3.2.1 The DCS Algorithm......Page 549
8.3.2.2 Convergence of DCS on General Convex Functions......Page 551
8.3.2.3 Boundedness of ‖y*‖......Page 554
8.3.2.4 Convergence of DCS on Strongly Convex Functions......Page 556
8.3.3.1 The SDCS Algorithm......Page 559
8.3.3.2 Convergence of SDCS on General Convex Functions......Page 560
8.3.3.3 Convergence of SDCS on Strongly Convex Functions......Page 562
8.3.4 High Probability Results......Page 565
8.3.5 Convergence Analysis......Page 567
8.4 Exercises and Notes......Page 575
References......Page 577
Index......Page 586
SIMILAR VOLUMES
This book on optimization includes forewords by Michael I. Jordan, Zongben Xu, and Zhi-Quan Luo. Machine learning relies heavily on optimization to solve problems with its learning models, and first-order optimization algorithms are the mainstream approaches. The acceleration of first-order optimization …
Machine learning and optimization techniques are revolutionizing our world. Other types of information technology have not progressed as rapidly in recent years in terms of real impact. The aim of this book is to present some of the innovative techniques in the field of optimization and machine learning …
Advancements in the technology and availability of data sources have led to the 'Big Data' era. Working with large data offers the potential to uncover more fine-grained patterns and to make timely and accurate decisions, but it also creates challenges such as slow training and scalability …
This book aims to provide a collection of state-of-the-art scientific and technical research papers related to machine learning-based algorithms in the field of optimization and engineering design. The theoretical and practical development for numerous engineering applications such as smart …