๐”– Scriptorium
โœฆ   LIBER   โœฆ


Deep Reinforcement Learning

โœ Scribed by Aske Plaat


Publisher: Springer Nature
Year: 2022
Tongue: English
Leaves: 414
Category: Library

⬇ Acquire This Volume

No coin nor oath required. For personal study only.

✦ Synopsis


Deep reinforcement learning has attracted considerable attention recently. Impressive results have been achieved in such diverse fields as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to understand problems that were previously considered to be very difficult. In the game of Go, the program AlphaGo has even learned to outmatch three of the world's leading players.

Deep reinforcement learning takes its inspiration from the fields of biology and psychology. Biology has inspired the creation of artificial neural networks and deep learning, while psychology studies how animals and humans learn, and how subjects' desired behavior can be reinforced with positive and negative stimuli. When we see how reinforcement learning teaches a simulated robot to walk, we are reminded of how children learn, through playful exploration. Techniques that are inspired by biology and psychology work amazingly well in computers: animal behavior and the structure of the brain serve as new blueprints for science and engineering. In fact, computers truly seem to possess aspects of human behavior; as such, this field goes to the heart of the dream of artificial intelligence.

These research advances have not gone unnoticed by educators. Many universities have begun offering courses on the subject of deep reinforcement learning. The aim of this book is to provide an overview of the field, at the proper level of detail for a graduate course in artificial intelligence. It covers the complete field, from the basic algorithms of Deep Q-learning to advanced topics such as multi-agent reinforcement learning and meta-learning.

✦ Table of Contents


Preface
Acknowledgments
Contents
List of Tables
1 Introduction
1.1 What Is Deep Reinforcement Learning?
1.1.1 Deep Learning
1.1.2 Reinforcement Learning
1.1.3 Deep Reinforcement Learning
1.1.4 Applications
1.1.4.1 Sequential Decision Problems
1.1.4.2 Robotics
1.1.4.3 Games
1.1.5 Four Related Fields
1.1.5.1 Psychology
1.1.5.2 Mathematics
1.1.5.3 Engineering
1.1.5.4 Biology
1.2 Three Machine Learning Paradigms
1.2.1 Supervised Learning
1.2.2 Unsupervised Learning
1.2.3 Reinforcement Learning
1.3 Overview of the Book
1.3.1 Prerequisite Knowledge
1.3.1.1 Course
1.3.1.2 Blogs and GitHub
1.3.2 Structure of the Book
1.3.2.1 Chapters
References
2 Tabular Value-Based Reinforcement Learning
Core Concepts
Core Problem
Finding a Supermarket
2.1 Sequential Decision Problems
2.1.1 Grid Worlds
2.1.2 Mazes and Box Puzzles
2.2 Tabular Value-Based Agents
2.2.1 Agent and Environment
2.2.2 Markov Decision Process
2.2.2.1 State S
2.2.2.2 Action A
2.2.2.3 Transition Tₐ
2.2.2.4 Reward Rₐ
2.2.2.5 Discount Factor γ
2.2.2.6 Policy π
2.2.3 MDP Objective
2.2.3.1 Trace τ
2.2.3.2 Return R
2.2.3.3 State Value V
2.2.3.4 State–Action Value Q
2.2.3.5 Reinforcement Learning Objective
2.2.3.6 Bellman Equation
2.2.4 MDP Solution Methods
2.2.4.1 Hands On: Value Iteration in Gym
2.2.4.2 OpenAI Gym
2.2.4.3 Taxi Example with Value Iteration
2.2.4.4 Model-Free Learning
2.2.4.5 Temporal Difference Learning
2.2.4.6 Find Policy by Value-Based Learning
2.2.4.7 Exploration
2.2.4.8 Bandit Theory
2.2.4.9 ε-Greedy Exploration
2.2.4.10 Off-Policy Learning
2.2.4.11 On-Policy SARSA
2.2.4.12 Off-Policy Q-Learning
2.2.4.13 Sparse Rewards and Reward Shaping
2.2.4.14 Hands On: Q-Learning on Taxi
2.2.4.15 Tuning Your Learning Rate
2.3 Classic Gym Environments
2.3.1 Mountain Car and Cartpole
2.3.2 Path Planning and Board Games
2.3.2.1 Path Planning
2.3.2.2 Board Games
Summary and Further Reading
Summary
Further Reading
Exercises
Questions
Exercises
References
3 Deep Value-Based Reinforcement Learning
Core Concepts
Core Problem
Core Algorithm
End-to-End Learning
3.1 Large, High-Dimensional Problems
3.1.1 Atari Arcade Games
3.1.2 Real-Time Strategy and Video Games
3.2 Deep Value-Based Agents
3.2.1 Generalization of Large Problems with Deep Learning
3.2.1.1 Minimizing Supervised Target Loss
3.2.1.2 Bootstrapping Q-Values
3.2.1.3 Deep Reinforcement Learning Target-Error
3.2.2 Three Challenges
3.2.2.1 Coverage
3.2.2.2 Correlation
3.2.2.3 Convergence
3.2.2.4 Deadly Triad
3.2.3 Stable Deep Value-Based Learning
3.2.3.1 Decorrelating States
3.2.3.2 Experience Replay
3.2.3.3 Infrequent Updates of Target Weights
3.2.3.4 Hands On: DQN and Breakout Gym Example
3.2.3.5 Install Stable Baselines
3.2.3.6 The DQN Code
3.2.4 Improving Exploration
3.2.4.1 Overestimation
3.2.4.2 Prioritized Experience Replay
3.2.4.3 Advantage Function
3.2.4.4 Distributional Methods
3.2.4.5 Noisy DQN
3.3 Atari 2600 Environments
3.3.1 Network Architecture
3.3.2 Benchmarking Atari
Summary and Further Reading
Summary
Further Reading
Exercises
Questions
Exercises
References
4 Policy-Based Reinforcement Learning
Core Concepts
Core Problem
Core Algorithms
Jumping Robots
4.1 Continuous Problems
4.1.1 Continuous Policies
4.1.2 Stochastic Policies
4.1.3 Environments: Gym and MuJoCo
4.1.3.1 Robotics
4.1.3.2 Physics Models
4.1.3.3 Games
4.2 Policy-Based Agents
4.2.1 Policy-Based Algorithm: REINFORCE
4.2.2 Bias–Variance Trade-Off in Policy-Based Methods
4.2.3 Actor Critic Bootstrapping
4.2.4 Baseline Subtraction with Advantage Function
4.2.5 Trust Region Optimization
4.2.6 Entropy and Exploration
4.2.7 Deterministic Policy Gradient
4.2.8 Hands On: PPO and DDPG MuJoCo Examples
4.3 Locomotion and Visuo-Motor Environments
4.3.1 Locomotion
4.3.2 Visuo-Motor Interaction
4.3.3 Benchmarking
Summary and Further Reading
Summary
Further Reading
Exercises
Questions
Exercises
References
5 Model-Based Reinforcement Learning
Core Concepts
Core Problem
Core Algorithms
Building a Navigation Map
5.1 Dynamics Models of High-Dimensional Problems
5.2 Learning and Planning Agents
5.2.1 Learning the Model
5.2.1.1 Modeling Uncertainty
5.2.1.2 Latent Models
5.2.2 Planning with the Model
5.2.2.1 Trajectory Rollouts and Model-Predictive Control
5.2.2.2 End-to-End Learning and Planning-by-Network
5.3 High-Dimensional Environments
5.3.1 Overview of Model-Based Experiments
5.3.2 Small Navigation Tasks
5.3.3 Robotic Applications
5.3.4 Atari Game Applications
5.3.5 Hands On: PlaNet Example
Summary and Further Reading
Summary
Further Reading
Exercises
Questions
Exercises
References
6 Two-Agent Self-Play
Core Concepts
Core Problem
Core Algorithms
Self-Play in Games
6.1 Two-Agent Zero-Sum Problems
6.1.1 The Difficulty of Playing Go
6.1.2 AlphaGo Achievements
6.2 Tabula Rasa Self-Play Agents
6.2.1 Move-Level Self-Play
6.2.1.1 Minimax
6.2.1.2 Monte Carlo Tree Search
6.2.2 Example-Level Self-Play
6.2.2.1 Policy and Value Network
6.2.2.2 Stability and Exploration
6.2.3 Tournament-Level Self-Play
6.2.3.1 Self-Play Curriculum Learning
6.2.3.2 Supervised and Reinforcement Curriculum Learning
6.3 Self-Play Environments
6.3.1 How to Design a World Class Go Program?
6.3.2 AlphaGo Zero Performance
6.3.3 AlphaZero
6.3.4 Open Self-Play Frameworks
6.3.5 Hands On: Hex in PolyGames Example
Summary and Further Reading
Summary
Further Reading
Exercises
Questions
Implementation: New or Make/Undo
Exercises
References
7 Multi-Agent Reinforcement Learning
Core Concepts
Core Problem
Core Algorithms
Self-driving Car
7.1 Multi-Agent Problems
Game Theory
Stochastic Games and Extensive-Form Games
Competitive, Cooperative, and Mixed Strategies
7.1.1 Competitive Behavior
7.1.2 Cooperative Behavior
7.1.2.1 Multi-Objective Reinforcement Learning
7.1.3 Mixed Behavior
7.1.3.1 Iterated Prisoner's Dilemma
7.1.4 Challenges
7.1.4.1 Partial Observability
7.1.4.2 Nonstationary Environments
7.1.4.3 Large State Space
7.2 Multi-Agent Reinforcement Learning Agents
7.2.1 Competitive Behavior
7.2.1.1 Counterfactual Regret Minimization
7.2.1.2 Deep Counterfactual Regret Minimization
7.2.2 Cooperative Behavior
7.2.2.1 Centralized Training/Decentralized Execution
7.2.2.2 Opponent Modeling
7.2.2.3 Communication
7.2.2.4 Psychology
7.2.3 Mixed Behavior
7.2.3.1 Evolutionary Algorithms
7.2.3.2 Swarm Computing
7.2.3.3 Population-Based Training
7.2.3.4 Self-play Leagues
7.3 Multi-Agent Environments
7.3.1 Competitive Behavior: Poker
7.3.2 Cooperative Behavior: Hide and Seek
7.3.3 Mixed Behavior: Capture the Flag and StarCraft
7.3.3.1 Capture the Flag
7.3.3.2 StarCraft
7.3.4 Hands On: Hide and Seek in the Gym Example
7.3.4.1 Multiplayer Environments
Summary and Further Reading
Summary
Further Reading
Exercises
Questions
Exercises
References
8 Hierarchical Reinforcement Learning
Core Concepts
Core Problem
Core Algorithms
Planning a Trip
8.1 Granularity of the Structure of Problems
8.1.1 Advantages
8.1.2 Disadvantages
8.1.2.1 Conclusion
8.2 Divide and Conquer for Agents
8.2.1 The Options Framework
8.2.1.1 Universal Value Function
8.2.2 Finding Subgoals
8.2.3 Overview of Hierarchical Algorithms
8.2.3.1 Tabular Methods
8.2.3.2 Deep Learning
8.3 Hierarchical Environments
8.3.1 Four Rooms and Robot Tasks
8.3.2 Montezuma's Revenge
8.3.3 Multi-Agent Environments
8.3.4 Hands On: Hierarchical Actor Critic Example
Summary and Further Reading
Summary
Further Reading
Exercises
Questions
Exercises
References
9 Meta-Learning
Core Concepts
Core Problem
Core Algorithms
Foundation Models
9.1 Learning to Learn Related Problems
9.2 Transfer Learning and Meta-Learning Agents
9.2.1 Transfer Learning
9.2.1.1 Task Similarity
9.2.1.2 Pretraining and Finetuning
9.2.1.3 Hands On: Pretraining Example
9.2.1.4 Multi-Task Learning
9.2.1.5 Domain Adaptation
9.2.2 Meta-Learning
9.2.2.1 Evaluating Few-Shot Learning Problems
9.2.2.2 Deep Meta-Learning Algorithms
9.2.2.3 Inner and Outer Loop Optimization
9.2.2.4 Recurrent Meta-Learning
9.2.2.5 Model-Agnostic Meta-Learning
9.2.2.6 Hyperparameter Optimization
9.2.2.7 Meta-Learning and Curriculum Learning
9.2.2.8 From Few-Shot to Zero-Shot Learning
9.3 Meta-Learning Environments
9.3.1 Image Processing
9.3.2 Natural Language Processing
9.3.3 Meta-Dataset
9.3.4 Meta-World
9.3.5 Alchemy
9.3.6 Hands On: Meta-World Example
Summary and Further Reading
Summary
Further Reading
Exercises
Questions
Exercises
References
10 Further Developments
10.1 Development of Deep Reinforcement Learning
10.1.1 Tabular Methods
10.1.2 Model-Free Deep Learning
10.1.3 Multi-Agent Methods
10.1.4 Evolution of Reinforcement Learning
10.2 Main Challenges
10.2.1 Latent Models
10.2.2 Self-Play
10.2.3 Hierarchical Reinforcement Learning
10.2.4 Transfer Learning and Meta-Learning
10.2.5 Population-Based Methods
10.2.6 Exploration and Intrinsic Motivation
10.2.7 Explainable AI
10.2.8 Generalization
10.3 The Future of Artificial Intelligence
References
A Mathematical Background
A.1 Sets and Functions
A.1.1 Sets
A.1.1.1 Discrete Set
A.1.1.2 Continuous Set
A.1.1.3 Conditioning a Set
A.1.1.4 Cardinality and Dimensionality
A.1.1.5 Cartesian Product
A.1.2 Functions
A.2 Probability Distributions
A.2.1 Discrete Probability Distributions
A.2.1.1 Parameters
A.2.1.2 Representing Discrete Random Variables
A.2.2 Continuous Probability Distributions
A.2.2.1 Parameters
A.2.3 Conditional Distributions
A.2.4 Expectation
A.2.4.1 Expectation of a Random Variable
A.2.4.2 Expectation of a Function of a Random Variable
A.2.5 Information Theory
A.2.5.1 Information
A.2.5.2 Entropy
A.2.5.3 Cross-Entropy
A.2.5.4 Kullback–Leibler Divergence
A.3 Derivative of an Expectation
A.4 Bellman Equations
References
B Deep Supervised Learning
B.1 Machine Learning
B.1.1 Training Set and Test Set
B.1.2 Curse of Dimensionality
B.1.3 Overfitting and the Bias–Variance Trade-Off
B.1.3.1 Regularization: the World Is Smooth
B.2 Deep Learning
B.2.1 Weights, Neurons
B.2.2 Backpropagation
B.2.2.1 Loss Function
B.2.3 End-to-End Feature Learning
B.2.3.1 Function Approximation
B.2.4 Convolutional Networks
B.2.4.1 Shared Weights
B.2.4.2 CNN Architecture
B.2.4.3 Max Pooling
B.2.5 Recurrent Networks
B.2.5.1 Long Short-Term Memory
B.2.6 More Network Architectures
B.2.6.1 Residual Networks
B.2.6.2 Generative Adversarial Networks
B.2.6.3 Autoencoders
B.2.6.4 Attention Mechanism
B.2.6.5 Transformers
B.2.7 Overfitting
B.3 Datasets and Software
B.3.1 MNIST and ImageNet
B.3.1.1 ImageNet
B.3.2 GPU Implementations
B.3.3 Hands On: Classification Example
Exercise
B.3.3.1 Installing TensorFlow and Keras
B.3.3.2 Keras MNIST Example
Exercises
Questions
Exercises
References
C Deep Reinforcement Learning Suites
C.1 Environments
C.2 Agent Algorithms
C.3 Deep Learning Suites
References
Glossary
Index


📜 SIMILAR VOLUMES


Deep Reinforcement Learning
✍ Aske Plaat 📂 Library 📅 2022 🏛 Springer Nature 🌐 English

Deep reinforcement learning has attracted considerable attention recently. Impressive results have been achieved in such diverse fields as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to understand problems that were previously considered to be very difficult…

Grokking Deep Reinforcement Learning
✍ Miguel Morales 📂 Library 📅 2020 🏛 Manning Publications 🌐 English

We all learn through trial and error. We avoid the things that cause us to experience pain and failure. We embrace and build on the things that give us reward and success. This common pattern is the foundation of deep reinforcement learning: building machine learning systems that explore and learn…