Reinforcement learning algorithms: analysis and applications.
✍ Edited by Boris Belousov; Hany Abdulsamad; Pascal Klink; Simone Parisi; Jan Peters
- Year
- 2021
- Language
- English
- Pages
- 197
- Series
- Studies in Computational Intelligence
- Category
- Library
✦ Table of Contents
Preface
Contents
Biology, Reward, Exploration
Prediction Error and Actor-Critic Hypotheses in the Brain
1 Introduction
2 Computational View
2.1 Temporal Difference Learning
2.2 Actor Critic
3 Behavioral View
3.1 Classical/Pavlovian Conditioning
3.2 Instrumental/Operant Conditioning
3.3 Credit Assignment Problem
4 Neural View
4.1 Reward Prediction Error Hypothesis
4.2 Actor-Critic Hypothesis
4.3 Multiple Critics Hypothesis
4.4 Limitations
5 Conclusion
References
Reviewing On-Policy/Off-Policy Critic Learning in the Context of Temporal Differences and Residual Learning
1 Introduction
1.1 Reinforcement Learning
1.2 Critic Learning
2 Objective Functions and Temporal Differences
2.1 Bellman Equation and Temporal Differences
3 Error Sources of Policy Evaluation Methods
4 Problems Occurring in Off-Policy Critic Learning
5 Temporal Differences and Bellman Residuals
5.1 Temporal-Difference Learning
5.2 Residual-Gradient Algorithm
5.3 Further Comparison
6 Recent Methods and Approaches
7 Conclusion
References
Reward Function Design in Reinforcement Learning
1 Introduction
2 Natural Reward Signals
2.1 Evolutionary Reward Signals: Survival and Fitness
2.2 Monetary Reward in Economics
3 Sparse Rewards
4 Reward Shaping
4.1 Shaping in Behavioral Science
4.2 Reward Shaping in Reinforcement Learning
4.3 Limitations of Reward Shaping and Its Relation to A*
5 Intrinsic Motivation
6 Conclusion
References
Exploration Methods in Sparse Reward Environments
1 Introduction
2 Exploration Methods
2.1 The Problem of Naive Exploration
2.2 Optimism in the Face of Uncertainty
2.3 Intrinsic Rewards
2.4 Bayesian RL Methods
2.5 Other Approaches
3 Conclusion
References
Information Geometry in Reinforcement Learning
A Survey on Constraining Policy Updates Using the KL Divergence
1 Introduction
1.1 Fisher Information Matrix (FIM) in Policy Gradients
1.2 FIM, KL Divergence, and Information Loss
2 Background
3 Methods
3.1 Relative Entropy Policy Search
3.2 Trust Region Policy Optimization
3.3 Proximal Policy Optimization
4 Discussion
5 Conclusion
References
Fisher Information Approximations in Policy Gradient Methods
1 Introduction
2 Background
2.1 Fisher Information Matrix
2.2 Natural Gradient
2.3 Kronecker Product
3 Structural FIM Approximations
3.1 Kronecker Factorization for Neural Networks
3.2 Recursive Approximation Schemes for the FIM
3.3 Tikhonov Damping for Stabilization
4 Monte Carlo FIM Approximations
4.1 Offline Empirical Fisher Estimation
4.2 Online Estimation Based on Exponential Averaging
4.3 Bayesian FIM Estimation
5 Discussion and Conclusion
References
Benchmarking the Natural Gradient in Policy Gradient Methods and Evolution Strategies
1 Introduction
2 'Vanilla' Gradient
3 Natural Gradient
4 Policy Gradient Methods
4.1 'Vanilla' Policy Gradient
4.2 Natural Policy Gradient
4.3 Natural Actor-Critic Algorithms
4.4 Trust Region Policy Optimization
5 Natural Evolution Strategies
5.1 Search Gradients
5.2 Natural Gradients in Evolution Strategies
5.3 Exponential Natural Evolution Strategies
5.4 Separable Natural Evolution Strategies
6 Experiments
6.1 Platforms
6.2 Results
7 Discussion and Conclusion
References
Information-Loss-Bounded Policy Optimization
1 Introduction
2 Notation and Background
3 Method
3.1 Constrained Policy Optimization
3.2 Information Loss Bound
3.3 The Algorithm
4 Experiments
4.1 Simulated MuJoCo Tasks
4.2 Furuta Pendulum Swing-Up and Stabilization
5 Conclusion
References
Persistent Homology for Dimensionality Reduction
1 Introduction
2 Background and Terminology
3 Persistent Homology
3.1 Simplicial Complexes
3.2 Homology
3.3 Computation
4 Successful Applications of Persistent Homology
4.1 Robotics and Deep Learning
4.2 Data Visualization, Neuroscience, and Physics
5 Conclusion
References
Model-Free Reinforcement Learning and Actor-Critic Methods
Model-Free Deep Reinforcement Learning—Algorithms and Applications
1 Introduction
2 Background
3 Off-Policy—Discrete Action Space
4 Off-Policy—Continuous Action Space
5 On-Policy
6 Applications—Discrete Space
7 Applications—Continuous Space
8 Conclusion and Discussion
References
Actor vs Critic: Learning the Policy or Learning the Value
1 Introduction
2 Notation and Background
2.1 Markov Decision Process
2.2 Value-Based (Critic-Only)
2.3 Policy Gradient (Actor-Only)
2.4 Actor-Critic
3 Actor Versus Critic
3.1 Actor-Only and Critic-Only: Differences
3.2 Combining Actor-Only and Critic-Only: The Actor-Critic Approach
3.3 Example Algorithms with Comparison
4 Conclusion
References
Bring Color to Deep Q-Networks: Limitations and Improvements of DQN Leading to Rainbow DQN
1 Introduction
2 Background and the Deep Q-Networks Algorithm
3 Limitations of the Deep Q-Networks Algorithm
4 Extensions to the Deep Q-Networks Algorithm
5 Combinations of Improvements
References
Distributed Methods for Reinforcement Learning Survey
1 Introduction
2 Notation of Multi-agent Reinforcement Learning
3 Taxonomy of Distributed Reinforcement Learning
3.1 Multi-agents
3.2 Parallel Methods
3.3 Population-Based
4 Applications
5 Discussion
6 Conclusion
References
Model-Based Learning and Control
Model-Based Reinforcement Learning from PILCO to PETS
1 Introduction
2 Reinforcement Learning and Policy Search
3 Model-Based Policy Search: PILCO
4 From Gaussian Processes to Neural Networks
5 From Policy Search to MPC
6 From PILCO to PETS
7 Conclusion
References
Challenges of Model Predictive Control in a Black Box Environment
1 Introduction
2 Terminology
2.1 Reinforcement Learning
2.2 Model-Based Reinforcement Learning
3 Model Predictive Control
3.1 Learning a Model
3.2 Optimizing the Trajectory
4 Challenges of MPC
4.1 Computation
4.2 Horizon Problem
5 Conclusion
References
Control as Inference?
1 Introduction
2 Discrete-Time Optimal Control
2.1 The Linear Quadratic Regulator
2.2 Differential Dynamic Programming
3 Discrete-Time Optimal Control as Message Passing
4 Continuous-Time Stochastic Optimal Control
5 Path Integral Control
6 Discussion
7 Conclusion
References