Reinforcement Learning and Optimal Control - Draft Version

✍ Scribed by Dimitri P. Bertsekas


Publisher: Athena Scientific
Year: 2019
Language: English
Pages: 268
Series: 1
Edition: 1
Category: Library


✦ Synopsis


Draft version of Reinforcement Learning and Optimal Control by Dimitri P. Bertsekas.

✦ Table of Contents


RL_Frontmatter......Page 1
Contents......Page 5
Preface......Page 9
Chapter 1......Page 13
1.1.1 Deterministic Problems......Page 15
1.1.2 The Dynamic Programming Algorithm......Page 20
1.1.3 Approximation in Value Space......Page 25
1.2 STOCHASTIC DYNAMIC PROGRAMMING......Page 27
1.3 EXAMPLES, VARIATIONS, AND SIMPLIFICATIONS......Page 30
1.3.1 Deterministic Shortest Path Problems......Page 32
1.3.2 Discrete Deterministic Optimization......Page 34
1.3.3 Problems with a Terminal State......Page 37
1.3.4 Forecasts......Page 39
1.3.5 Problems with Uncontrollable State Components......Page 41
1.3.6 Partial State Information and Belief States......Page 46
1.3.7 Linear Quadratic Optimal Control......Page 50
1.4 REINFORCEMENT LEARNING AND OPTIMAL CONTROL - SOME TERMINOLOGY......Page 53
1.5 NOTES AND SOURCES......Page 55
Chapter 2......Page 58
2.1 GENERAL ISSUES OF APPROXIMATION IN VALUE SPACE......Page 63
2.1.1 Methods for Computing Approximations in Value Space......Page 64
2.1.2 Off-Line and On-Line Methods......Page 65
2.1.3 Model-Based Simplification of the Lookahead Minimization......Page 66
2.1.4 Model-Free Q-Factor Approximation in Value Space......Page 67
2.1.5 Approximation in Policy Space on Top of Approximation in Value Space......Page 70
2.1.6 When is Approximation in Value Space Effective?......Page 71
2.2 MULTISTEP LOOKAHEAD......Page 72
2.2.1 Multistep Lookahead and Rolling Horizon......Page 74
2.2.2 Multistep Lookahead and Deterministic Problems......Page 75
2.3.1 Enforced Decomposition......Page 77
2.3.2 Probabilistic Approximation - Certainty Equivalent Control......Page 84
2.4 ROLLOUT......Page 90
2.4.1 On-Line Rollout for Deterministic Finite-State Problems......Page 91
2.4.2 Stochastic Rollout and Monte Carlo Tree Search......Page 101
2.5 ON-LINE ROLLOUT FOR DETERMINISTIC INFINITE-SPACES PROBLEMS - OPTIMIZATION HEURISTICS......Page 111
2.5.1 Model Predictive Control......Page 112
2.5.2 Target Tubes and the Constrained Controllability Condition......Page 119
2.5.3 Variants of Model Predictive Control......Page 123
2.6 NOTES AND SOURCES......Page 125
Chapter 3......Page 128
3.1.1 Linear and Nonlinear Feature-Based Architectures......Page 130
3.1.2 Training of Linear and Nonlinear Architectures......Page 137
3.1.3 Incremental Gradient and Newton Methods......Page 138
3.2 NEURAL NETWORKS......Page 151
3.2.1 Training of Neural Networks......Page 155
3.2.2 Multilayer and Deep Neural Networks......Page 158
3.3 SEQUENTIAL DYNAMIC PROGRAMMING APPROXIMATION......Page 162
3.4 Q-FACTOR PARAMETRIC APPROXIMATION......Page 164
3.5 NOTES AND SOURCES......Page 167
Chapter 4......Page 168
4.1 AN OVERVIEW OF INFINITE HORIZON PROBLEMS......Page 171
4.2 STOCHASTIC SHORTEST PATH PROBLEMS......Page 174
4.3 DISCOUNTED PROBLEMS......Page 184
4.4 EXACT AND APPROXIMATE VALUE ITERATION......Page 189
4.5 POLICY ITERATION......Page 193
4.5.1 Exact Policy Iteration......Page 194
4.5.2 Optimistic and Multistep Lookahead Policy Iteration......Page 198
4.5.3 Policy Iteration for Q-factors......Page 200
4.6 APPROXIMATION IN VALUE SPACE - PERFORMANCE BOUNDS......Page 202
4.6.1 Limited Lookahead Performance Bounds......Page 204
4.6.2 Rollout......Page 207
4.6.3 Approximate Policy Iteration......Page 211
4.7.1 Self-Learning and Actor-Critic Systems......Page 214
4.7.2 A Model-Based Variant......Page 215
4.7.3 A Model-Free Variant......Page 218
4.7.4 Implementation Issues of Parametric Policy Iteration......Page 220
4.8 Q-LEARNING......Page 223
4.9 ADDITIONAL METHODS - TEMPORAL DIFFERENCES......Page 226
4.10 EXACT AND APPROXIMATE LINEAR PROGRAMMING......Page 237
4.11 APPROXIMATION IN POLICY SPACE......Page 239
4.11.1 Training by Cost Optimization - Policy Gradient and Random Search Methods......Page 241
4.11.2 Expert Supervised Training......Page 247
4.12 NOTES AND SOURCES......Page 249
4.13 APPENDIX: MATHEMATICAL ANALYSIS......Page 252
4.13.1 Proofs for Stochastic Shortest Path Problems......Page 253
4.13.2 Proofs for Discounted Problems......Page 258
4.13.3 Convergence of Exact and Optimistic Policy Iteration......Page 259
4.13.4 Performance Bounds for One-Step Lookahead, Rollout, and Approximate Policy Iteration......Page 261

✦ Subjects


Dimitri P. Bertsekas, Reinforcement learning, Optimal control


📜 SIMILAR VOLUMES


Reinforcement Learning for Sequential De
✍ Shengbo Eben Li πŸ“‚ Library πŸ“… 2023 πŸ› Springer 🌐 English

As one of the most important Artificial Intelligence (AI) branches, Reinforcement Learning (RL) has attracted increasing attention in recent years. RL is an interdisciplinary field of trial‐and‐error learning and optimal control that promises to provide optimal solutions for decision‐making or contr

Reinforcement Learning for Sequential De
✍ Shengbo Eben Li πŸ“‚ Library πŸ“… 2023 πŸ› Springer Nature 🌐 English

Have you ever wondered how AlphaZero learns to defeat the top human Go players? Do you have any clues about how an autonomous driving system can gradually develop self-driving skills beyond normal drivers? What is the key that enables AlphaStar to make decisions in Starcraft, a notoriously difficult

Reinforcement Learning for Optimal Feedb
✍ Rushikesh Kamalapurkar, Patrick Walters, Joel Rosenfeld, Warren Dixon 📂 Library 📅 2018 🏛 Springer International Publishing 🌐 English

Reinforcement Learning for Optimal Feedback Control develops model-based and data-driven reinforcement learning methods for solving optimal control problems in nonlinear deterministic dynamical systems. In order to achieve learning under uncertainty, data-driven methods for identifying sys

Machine Learning Yearning (Draft Version
✍ Andrew Ng πŸ“‚ Library πŸ“… 2018 πŸ› ATG AI 🌐 English

(ATG AI): Short but nice. Unfortunately this book doesn't mention me, like all other books on AI. Maybe I should write an "Auto"-biography, that would be magnificent...

Reinforcement Learning Aided Performance
✍ Changsheng Hua πŸ“‚ Library πŸ“… 2021 πŸ› Springer Vieweg 🌐 English

Changsheng Hua proposes two approaches, an input/output recovery approach and a performance index-based approach for robustness and performance optimization of feedback control systems. For their data-driven implementation in deterministic and stochastic systems, the author develops Q-learning an