
Handbook of Reinforcement Learning and Control (Studies in Systems, Decision and Control, 325)

✍ Scribed by Kyriakos G. Vamvoudakis (editor), Yan Wan (editor), Frank L. Lewis (editor), Derya Cansever (editor)


Publisher
Springer
Year
2021
Tongue
English
Leaves
839
Edition
1st ed. 2021
Category
Library


✦ Synopsis


This handbook presents state-of-the-art research in reinforcement learning, focusing on its applications in the control and game theory of dynamic systems and future directions for related research and technology.

The contributions gathered in this book deal with challenges faced when using learning and adaptation methods to solve academic and industrial problems, such as optimization in dynamic environments with single and multiple agents, convergence and performance analysis, and online implementation. They explore means by which these difficulties can be solved, and cover a wide range of related topics including:

  • deep learning;
  • artificial intelligence;
  • applications of game theory;
  • mixed modality learning; and
  • multi-agent reinforcement learning.

Practicing engineers and scholars in the fields of machine learning, game theory, and autonomous control will find the Handbook of Reinforcement Learning and Control to be thought-provoking, instructive, and informative.

✦ Table of Contents


Preface
Contents
Part I: Theory of Reinforcement Learning for Model-Free and Model-Based Control and Games
1 What May Lie Ahead in Reinforcement Learning
References
2 Reinforcement Learning for Distributed Control and Multi-player Games
2.1 Introduction
2.2 Optimal Control of Continuous-Time Systems
2.2.1 IRL with Experience Replay Learning Technique
2.2.2 H∞ Control of CT Systems
2.3 Nash Games
2.4 Graphical Games
2.4.1 Off-Policy RL for Graphical Games
2.5 Output Synchronization of Multi-agent Systems
2.6 Conclusion and Open Research Directions
References
3 From Reinforcement Learning to Optimal Control: A Unified Framework for Sequential Decisions
3.1 Introduction
3.2 The Communities of Sequential Decisions
3.3 Stochastic Optimal Control Versus Reinforcement Learning
3.3.1 Stochastic Control
3.3.2 Reinforcement Learning
3.3.3 A Critique of the MDP Modeling Framework
3.3.4 Bridging Optimal Control and Reinforcement Learning
3.4 The Universal Modeling Framework
3.4.1 Dimensions of a Sequential Decision Model
3.4.2 State Variables
3.4.3 Objective Functions
3.4.4 Notes
3.5 Energy Storage Illustration
3.5.1 A Basic Energy Storage Problem
3.5.2 With a Time-Series Price Model
3.5.3 With Passive Learning
3.5.4 With Active Learning
3.5.5 With Rolling Forecasts
3.5.6 Remarks
3.6 Designing Policies
3.6.1 Policy Search
3.6.2 Lookahead Approximations
3.6.3 Hybrid Policies
3.6.4 Remarks
3.6.5 Stochastic Control, Reinforcement Learning, and the Four Classes of Policies
3.7 Policies for Energy Storage
3.8 Extension to Multi-agent Systems
3.9 Observations
References
4 Fundamental Design Principles for Reinforcement Learning Algorithms
4.1 Introduction
4.1.1 Stochastic Approximation and Reinforcement Learning
4.1.2 Sample Complexity Bounds
4.1.3 What Will You Find in This Chapter?
4.1.4 Literature Survey
4.2 Stochastic Approximation: New and Old Tricks
4.2.1 What is Stochastic Approximation?
4.2.2 Stochastic Approximation and Learning
4.2.3 Stability and Convergence
4.2.4 Zap–Stochastic Approximation
4.2.5 Rates of Convergence
4.2.6 Optimal Convergence Rate
4.2.7 TD and LSTD Algorithms
4.3 Zap Q-Learning: Fastest Convergent Q-Learning
4.3.1 Markov Decision Processes
4.3.2 Value Functions and the Bellman Equation
4.3.3 Q-Learning
4.3.4 Tabular Q-Learning
4.3.5 Convergence and Rate of Convergence
4.3.6 Zap Q-Learning
4.4 Numerical Results
4.4.1 Finite State-Action MDP
4.4.2 Optimal Stopping in Finance
4.5 Zap-Q with Nonlinear Function Approximation
4.5.1 Choosing the Eligibility Vectors
4.5.2 Theory and Challenges
4.5.3 Regularized Zap-Q
4.6 Conclusions and Future Work
References
5 Mixed Density Methods for Approximate Dynamic Programming
5.1 Introduction
5.2 Unconstrained Affine-Quadratic Regulator
5.3 Regional Model-Based Reinforcement Learning
5.3.1 Preliminaries
5.3.2 Regional Value Function Approximation
5.3.3 Bellman Error
5.3.4 Actor and Critic Update Laws
5.3.5 Stability Analysis
5.3.6 Summary
5.4 Local (State-Following) Model-Based Reinforcement Learning
5.4.1 StaF Kernel Functions
5.4.2 Local Value Function Approximation
5.4.3 Actor and Critic Update Laws
5.4.4 Analysis
5.4.5 Stability Analysis
5.4.6 Summary
5.5 Combining Regional and Local State-Following Approximations
5.6 Reinforcement Learning with Sparse Bellman Error Extrapolation
5.7 Conclusion
References
6 Model-Free Linear Quadratic Regulator
6.1 Introduction to a Model-Free LQR Problem
6.2 A Gradient-Based Random Search Method
6.3 Main Results
6.4 Proof Sketch
6.4.1 Controlling the Bias
6.4.2 Correlation of f̂(K) and f(K)
6.5 An Example
6.6 Thoughts and Outlook
References
Part II: Constraint-Driven and Verified RL
7 Adaptive Dynamic Programming in the Hamiltonian-Driven Framework
7.1 Introduction
7.1.1 Literature Review
7.1.2 Motivation
7.1.3 Structure
7.2 Problem Statement
7.3 Hamiltonian-Driven Framework
7.3.1 Policy Evaluation
7.3.2 Policy Comparison
7.3.3 Policy Improvement
7.4 Discussions on the Hamiltonian-Driven ADP
7.4.1 Implementation with Critic-Only Structure
7.4.2 Connection to Temporal Difference Learning
7.4.3 Connection to Value Gradient Learning
7.5 Simulation Study
7.6 Conclusion
References
8 Reinforcement Learning for Optimal Adaptive Control of Time Delay Systems
8.1 Introduction
8.2 Problem Description
8.3 Extended State Augmentation
8.4 State Feedback Q-Learning Control of Time Delay Systems
8.5 Output Feedback Q-Learning Control of Time Delay Systems
8.6 Simulation Results
8.7 Conclusions
References
9 Optimal Adaptive Control of Partially Uncertain Linear Continuous-Time Systems with State Delay
9.1 Introduction
9.2 Problem Statement
9.3 Linear Quadratic Regulator Design
9.3.1 Periodic Sampled Feedback
9.3.2 Event Sampled Feedback
9.4 Optimal Adaptive Control
9.4.1 Periodic Sampled Feedback
9.4.2 Event Sampled Feedback
9.4.3 Hybrid Reinforcement Learning Scheme
9.5 Perspectives on Controller Design with Image Feedback
9.6 Simulation Results
9.6.1 Linear Quadratic Regulator with Known Internal Dynamics
9.6.2 Optimal Adaptive Control with Unknown Drift Dynamics
9.7 Conclusion
References
10 Dissipativity-Based Verification for Autonomous Systems in Adversarial Environments
10.1 Introduction
10.1.1 Related Work
10.1.2 Contributions
10.1.3 Structure
10.1.4 Notation
10.2 Problem Formulation
10.2.1 (Q,S,R)-Dissipative and L2–Gain Stable Systems
10.3 Learning-Based Distributed Cascade Interconnection
10.4 Learning-Based L2–Gain Composition
10.4.1 Q-Learning for L2–Gain Verification
10.4.2 L2–Gain Model-Free Composition
10.5 Learning-Based Lossless Composition
10.6 Discussion
10.7 Conclusion and Future Work
References
11 Reinforcement Learning-Based Model Reduction for Partial Differential Equations: Application to the Burgers Equation
11.1 Introduction
11.2 Basic Notation and Definitions
11.3 RL-Based Model Reduction of PDEs
11.3.1 Reduced-Order PDE Approximation
11.3.2 Proper Orthogonal Decomposition for ROMs
11.3.3 Closure Models for ROM Stabilization
11.3.4 Main Result: RL-Based Closure Model
11.4 Extremum Seeking Based Closure Model Auto-Tuning
11.5 The Case of the Burgers Equation
11.6 Conclusion
References
Part III: Multi-agent Systems and RL
12 Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
12.1 Introduction
12.2 Background
12.2.1 Single-Agent RL
12.2.2 Multi-Agent RL Framework
12.3 Challenges in MARL Theory
12.3.1 Non-unique Learning Goals
12.3.2 Non-stationarity
12.3.3 Scalability Issue
12.3.4 Various Information Structures
12.4 MARL Algorithms with Theory
12.4.1 Cooperative Setting
12.4.2 Competitive Setting
12.4.3 Mixed Setting
12.5 Application Highlights
12.5.1 Cooperative Setting
12.5.2 Competitive Setting
12.5.3 Mixed Settings
12.6 Conclusions and Future Directions
References
13 Computational Intelligence in Uncertainty Quantification for Learning Control and Differential Games
13.1 Introduction
13.2 Problem Formulation of Optimal Control for Uncertain Systems
13.2.1 Optimal Control for Systems with Parameters Modulated by Multi-dimensional Uncertainties
13.2.2 Optimal Control for Random Switching Systems
13.3 Effective Uncertainty Evaluation Methods
13.3.1 Problem Formulation
13.3.2 The MPCM
13.3.3 The MPCM-OFFD
13.4 Optimal Control Solutions for Systems with Parameter Modulated by Multi-dimensional Uncertainties
13.4.1 Reinforcement Learning-Based Stochastic Optimal Control
13.4.2 Q-Learning-Based Stochastic Optimal Control
13.5 Optimal Control Solutions for Random Switching Systems
13.5.1 Optimal Controller for Random Switching Systems
13.5.2 Effective Estimator for Random Switching Systems
13.6 Differential Games for Systems with Parameters Modulated by Multi-dimensional Uncertainties
13.6.1 Stochastic Two-Player Zero-Sum Game
13.6.2 Multi-player Nonzero-Sum Game
13.7 Applications
13.7.1 Traffic Flow Management Under Uncertain Weather
13.7.2 Learning Control for Aerial Communication Using Directional Antennas (ACDA) Systems
13.8 Summary
References
14 A Top-Down Approach to Attain Decentralized Multi-agents
14.1 Introduction
14.2 Background
14.2.1 Reinforcement Learning
14.2.2 Multi-agent Reinforcement Learning
14.3 Centralized Learning, But Decentralized Execution
14.3.1 A Bottom-Up Approach
14.3.2 A Top-Down Approach
14.4 Centralized Expert Supervises Multi-agents
14.4.1 Imitation Learning
14.4.2 CESMA
14.5 Experiments
14.5.1 Decentralization Can Achieve Centralized Optimality
14.5.2 Expert Trajectories Versus Multi-agent Trajectories
14.6 Conclusion
References
15 Modeling and Mitigating Link-Flooding Distributed Denial-of-Service Attacks via Learning in Stackelberg Games
15.1 Introduction
15.2 Routing and Attack in Communication Network
15.3 Stackelberg Game Model
15.4 Optimal Attack and Stackelberg Equilibria for Malicious Adversaries
15.4.1 Optimal Attack and Stackelberg Equilibria for Networks with Identical Links
15.5 Mitigating Attacks via Learning
15.5.1 Predicting the Routing Cost
15.5.2 Minimizing the Predicted Routing Cost
15.6 Simulation Study
15.6.1 Discussion
15.7 Conclusion
References
Part IV: Bounded Rationality and Value of Information in RL and Games
16 Bounded Rationality in Differential Games: A Reinforcement Learning-Based Approach
16.1 Introduction
16.1.1 Related Work
16.2 Problem Formulation
16.2.1 Nash Equilibrium Solutions for Differential Games
16.3 Boundedly Rational Game Solution Concepts
16.4 Cognitive Hierarchy for Adversarial Target Tracking
16.4.1 Problem Formulation
16.4.2 Zero-Sum Game
16.4.3 Cognitive Hierarchy
16.4.4 Coordination with Nonequilibrium Game-Theoretic Learning
16.4.5 Simulation
16.5 Conclusion and Future Work
References
17 Bounded Rationality in Learning, Perception, Decision-Making, and Stochastic Games
17.1 The Autonomy Challenge
17.1.1 The Case of Actionable Data
17.1.2 The Curse of Optimality
17.2 How to Move Forward
17.2.1 Bounded Rationality for Human-Like Decision-Making
17.2.2 Hierarchical Abstractions for Scalability
17.3 Sequential Decision-Making Subject to Resource Constraints
17.3.1 Standard Markov Decision Processes
17.3.2 Information-Limited Markov Decision Processes
17.4 An Information-Theoretic Approach for Hierarchical Decision-Making
17.4.1 Agglomerative Information Bottleneck for Quadtree Compression
17.4.2 Optimal Compression of Quadtrees
17.4.3 The Q-Tree Search Algorithm
17.5 Stochastic Games and Bounded Rationality
17.5.1 Stochastic Pursuit–Evasion
17.5.2 Level-k Thinking
17.5.3 A Pursuit–Evasion Game in a Stochastic Environment
17.6 Conclusions
References
18 Fairness in Learning-Based Sequential Decision Algorithms: A Survey
18.1 Introduction
18.2 Preliminaries
18.2.1 Sequential Decision Algorithms
18.2.2 Notions of Fairness
18.3 (Fair) Sequential Decision When Decisions Do Not Affect Underlying Population
18.3.1 Bandits, Regret, and Fair Regret
18.3.2 Fair Experts and Expert Opinions
18.3.3 Fair Policing
18.4 (Fair) Sequential Decision When Decisions Affect Underlying Population
18.4.1 Two-Stage Models
18.4.2 Long-Term Impacts on the Underlying Population
References
19 Trading Utility and Uncertainty: Applying the Value of Information to Resolve the Exploration–Exploitation Dilemma in Reinforcement Learning
19.1 Introduction
19.2 Exploring Single-State, Multiple-Action Markov Decision Processes
19.2.1 Literature Survey
19.2.2 Methodology
19.2.3 Simulations and Analyses
19.2.4 Conclusions
19.3 Exploring Multiple-State, Multiple-Action Markov Decision Processes
19.3.1 Literature Survey
19.3.2 Methodology
19.3.3 Simulations and Analyses
19.3.4 Conclusions
References
Part V: Applications of RL
20 Map-Based Planning for Small Unmanned Aircraft Rooftop Landing
20.1 Introduction
20.2 Background
20.2.1 Sensor-Based Planning
20.2.2 Map-Based Planning
20.2.3 Multi-goal Planning
20.2.4 Urban Landscape and Rooftop Landings
20.3 Preliminaries
20.3.1 Coordinates and Landing Sites
20.3.2 3D Path Planning with Mapped Obstacles
20.4 Landing Site Database
20.4.1 Flat-Like Roof Identification
20.4.2 Flat Surface Extraction for Usable Landing Area
20.4.3 Touchdown Points
20.4.4 Landing Site Risk Model
20.5 Three-Dimensional Maps for Path Planning
20.6 Planning Risk Metric Analysis and Integration
20.6.1 Real-Time Map-Based Planner Architecture
20.6.2 Trade-Off Between Landing Site and Path Risk
20.6.3 Multi-goal Planner
20.7 Maps and Simulation Results
20.7.1 Landing Sites and Risk Maps
20.7.2 Case Studies
20.7.3 Urgent Landing Statistical Analysis
20.8 Conclusion
References
21 Reinforcement Learning: An Industrial Perspective
21.1 Introduction
21.2 RL Applications
21.2.1 Sensor Management in Intelligence, Surveillance, and Reconnaissance
21.2.2 High Level Reasoning in Autonomous Navigation
21.2.3 Advanced Manufacturing Process Control
21.2.4 Maintenance, Repair, and Overhaul Operations
21.2.5 Human–Robot Collaboration
21.3 Case Study I: Optimal Sensor Tasking
21.3.1 Sensor Tasking as a Stochastic Optimal Control Problem
21.3.2 Multi-Arm Bandit Problem Approximation
21.3.3 Numerical Study
21.4 Case Study II: Deep Reinforcement Learning for Advanced Manufacturing Control
21.4.1 Cold Spray Control Problem
21.4.2 Guided Policy Search
21.4.3 Simulation Results
21.5 Future Outlook
References
22 Robust Autonomous Driving with Human in the Loop
22.1 Introduction
22.2 Mathematical Modeling of Human–Vehicle Interaction
22.2.1 Vehicle Lateral Dynamics
22.2.2 Interconnected Human–Vehicle Model
22.3 Model-Based Control Design
22.3.1 Discretization of Differential-Difference Equations
22.3.2 Formulation of the Shared Control Problem
22.3.3 Model-Based Optimal Control Design
22.4 Learning-Based Optimal Control for Cooperative Driving
22.5 Numerical Results
22.5.1 Algorithmic Implementation
22.5.2 Comparisons and Discussions for ADP-Based Shared Control Design
22.6 Conclusions and Future Work
References
23 Decision-Making for Complex Systems Subjected to Uncertainties—A Probability Density Function Control Approach
23.1 Introduction
23.2 Integrated Modeling Perspectives—Ordinary Algebra Versus {Max, +} Algebra
23.2.1 Process Level Modeling via Ordinary Algebra Systems
23.2.2 {Max, +} Algebra-Based Modeling
23.2.3 Learning Under Uncertainties–PDF Shaping of Modeling Error-Based Approach
23.3 Human-in-the-Loop Consideration: Impact of Uncertainties in Decision-Making Phase
23.4 Optimization Under Uncertainties Impacts
23.4.1 Formulation of Optimization as a Feedback Control Design Problem: Optimization is a Special Case of Feedback Control System Design
23.5 A Generalized Framework for Decision-Making Using PDF Shaping Approach
23.5.1 PDF Shaping for the Performance Function
23.5.2 Dealing with the Constraint
23.5.3 Dealing with Dynamic Constraint
23.5.4 A Total Probabilistic Solution
23.5.5 Uncertainties in Performance Function and Constraints
23.6 System Analysis: Square Impact Principle as a Mathematical Principle for Integrated IT with Infrastructure Design
23.6.1 Description of Operational Optimal Control
23.6.2 Square Impact Principle (SIP): Infrastructure Versus Control Performance
23.7 Conclusions
References
Part VI: Multi-Disciplinary Connections
24 A Hybrid Dynamical Systems Perspective on Reinforcement Learning for Cyber-Physical Systems: Vistas, Open Problems, and Challenges
24.1 Introduction
24.2 Hybrid Dynamical Systems
24.2.1 Non-uniqueness of Solutions and Set-Valued Dynamics
24.2.2 Hybrid Time Domains and Solutions of Hybrid Dynamical Systems
24.2.3 Graphical Convergence, Basic Assumptions and Sequential Compactness
24.2.4 Stability and Robustness
24.3 Reinforcement Learning via Dynamic Policy Gradient
24.3.1 Asynchronous Policy Iteration
24.3.2 Synchronous Policy Iteration: Online Training of Actor–Critic Structures
24.4 Reinforcement Learning in Hybrid Dynamical Systems
24.4.1 Hybrid Learning Algorithms
24.4.2 Hybrid Dynamic Environments
24.5 Conclusions
References
25 The Role of Systems Biology, Neuroscience, and Thermodynamics in Network Control and Learning
25.1 Introduction
25.2 Large-Scale Networks and Hybrid Thermodynamics
25.3 Multiagent Systems with Uncertain Interagent Communication
25.4 Systems Biology, Neurophysiology, Thermodynamics, and Dynamic Switching Communication Topologies for Large-Scale Multilayered Networks
25.5 Nonlinear Stochastic Optimal Control and Learning
25.6 Complexity, Thermodynamics, Information Theory, and Swarm Dynamics
25.7 Thermodynamic Entropy, Shannon Entropy, Bode Integrals, and Performance Limitations in Nonlinear Systems
25.8 Conclusion
References
26 Quantum Amplitude Amplification for Reinforcement Learning
26.1 Exploration and Exploitation in Reinforcement Learning
26.2 Quantum Probability Theory
26.3 The Original Quantum Reinforcement Learning (QRL) Algorithm
26.4 The Revised Quantum Reinforcement Learning Algorithm
26.5 Learning Rate and Performance Comparisons
26.6 Other Applications of QRL
26.6.1 Example
26.7 Application to Human Learning
26.8 Concluding Comments
References

