Deep reinforcement learning has attracted considerable attention recently. Impressive results have been achieved in such diverse fields as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to understand problems…
Deep Reinforcement Learning
✍ Author: Sewak M
- Publisher: Springer
- Year: 2019
- Language: English
- Pages: 215
- Category: Library
✦ Table of Contents
Preface......Page 5
Who This Book Is For?......Page 7
What This Book Covers?......Page 8
Contents......Page 10
About the Author......Page 16
1.1 What Is Artificial Intelligence and How Does Reinforcement Learning Relate to It?......Page 17
1.2 Understanding the Basic Design of Reinforcement Learning......Page 18
1.3.1 Future Rewards......Page 19
1.3.3 Attribution of Rewards to Different Actions Taken in the Past......Page 20
1.3.5 Dealing with Different Types of Reward......Page 22
1.4 The State in Reinforcement Learning......Page 23
1.4.1 Let Us Score a Triplet in Tic-Tac-Toe......Page 24
1.4.2 Let Us Balance a Pole on a Cart (The CartPole Problem)......Page 26
1.4.3.1 A Quick Intro to the Vision-Related Reinforcement Learning Problems......Page 27
1.4.3.3 The Vision Challenge While Playing Graphical Games......Page 28
1.4.3.4 Example of State Formulation for Graphical Games......Page 29
1.5 The Agent in Reinforcement Learning......Page 30
1.5.2 The Action–Value/Q-Function......Page 31
1.5.4 The Policy and the On-Policy and Off-Policy Approaches......Page 32
1.6 Summary......Page 33
2.1 The Markov Decision Process (MDP)......Page 35
2.1.1 MDP Notations in Tuple Format......Page 36
2.2 The Bellman Equation......Page 37
2.2.1 Bellman Equation for Estimating the Value Function......Page 38
2.2.2 Bellman Equation for Estimating the Action–Value/Q-Function......Page 39
2.3.1 About Dynamic Programming......Page 40
2.4.1 Bellman Equation for Optimal Value Function and Optimal Policy......Page 41
2.4.2 Value Iteration and Synchronous and Asynchronous Update Modes......Page 42
2.5 Summary......Page 43
3.1.1 Understanding the Grid-World......Page 44
3.1.2 Permissible State Transitions in Grid-World......Page 45
3.2.1 Inheriting an Environment Class or Building a Custom Environment Class......Page 46
3.2.2 Recipes to Build Our Own Custom Environment Class......Page 47
3.3 Platform Requirements and Project Structure for the Code......Page 49
3.4 Code for Creating the Grid-World Environment......Page 51
3.5 Code for the Value Iteration Approach of Solving the Grid-World......Page 56
3.6 Code for the Policy Iteration Approach of Solving the Grid-World......Page 59
3.7 Summary......Page 64
4.1 Challenges with Classical DP......Page 65
4.2 Model-Based and Model-Free Approaches......Page 66
4.3 Temporal Difference (TD) Learning......Page 67
4.3.1 Estimation and Control Problems of Reinforcement Learning......Page 68
4.3.2 TD(0)......Page 69
4.3.3 TD(λ) and Eligibility Trace......Page 70
4.4 SARSA......Page 71
4.5 Q-Learning......Page 72
4.6.2 Time Adaptive epsilon-greedy Algorithms (e.g., Annealing ε)......Page 74
4.6.5 Which Bandit Algorithm Should We Use?......Page 76
4.7 Summary......Page 77
5.1 Project Structure and Dependencies......Page 78
5.2.1 Imports and Logging (File: Q_Lerning.py)......Page 80
5.2.2 Code for the Behavior Policy Class......Page 81
5.2.3 Code for the Q-Learning Agent’s Class......Page 83
5.2.5 Code for Custom Exceptions (File: rl_exceptions.py)......Page 86
5.3 Training Statistics Plot......Page 87
6.1 Artificial Neurons—The Building Blocks of Deep Learning......Page 88
6.2 Feed-Forward Deep Neural Networks (DNN)......Page 90
6.2.1 Feed-Forward Mechanism in Deep Neural Networks......Page 92
6.3.1 Activation Functions in Deep Learning......Page 93
6.3.2 Loss Functions in Deep Learning......Page 95
6.3.3 Optimizers in Deep Learning......Page 96
6.4 Convolutional Neural Networks—Deep Learning for Vision......Page 97
6.4.1 Convolutional Layer......Page 98
6.4.3 Flattened and Fully Connected Layers......Page 99
6.5 Summary......Page 100
7.1 You Are not Alone!......Page 102
7.2.2 OpenAI Gym......Page 104
7.2.6 Garage......Page 105
7.3.3 Keras-RL......Page 106
7.3.5 RLlib......Page 107
8.1 General Artificial Intelligence......Page 108
8.2 An Introduction to “Google DeepMind” and “AlphaGo”......Page 109
8.3 The DQN Algorithm......Page 111
8.3.1 Experience Replay......Page 113
8.3.1.1 Prioritized Experience Replay......Page 114
8.3.1.2 Skipping Frames......Page 115
8.3.3 Clipping Rewards and Penalties......Page 116
8.4 Double DQN (DDQN)......Page 117
8.5 Dueling DQN......Page 118
8.6 Summary......Page 121
9.1 Project Structure and Dependencies......Page 122
9.2 Code for the Double DQN Agent (File: DoubleDQN.py)......Page 124
9.2.1 Code for the Behavior Policy Class (File: behavior_policy.py)......Page 132
9.2.2 Code for the Experience Replay Memory Class (File: experience_replay.py)......Page 136
9.3 Training Statistics Plots......Page 138
10.1 Introduction to Policy-Based Approaches and Policy Approximation......Page 140
10.2 Broad Difference Between Value-Based and Policy-Based Approaches......Page 142
10.3 Problems with Calculating the Policy Gradient......Page 145
10.4 The REINFORCE Algorithm......Page 146
10.4.2 Pseudocode for the REINFORCE Algorithm......Page 148
10.5.1 Cumulative Future Reward-Based Attribution......Page 149
10.5.2 Discounted Cumulative Future Rewards......Page 150
10.5.3 REINFORCE with Baseline......Page 151
10.7 Summary......Page 152
11.1 Introduction to Actor-Critic Methods......Page 154
11.2 Conceptual Design of the Actor-Critic Method......Page 156
11.3 Architecture for the Actor-Critic Implementation......Page 157
11.3.1 Actor-Critic Method and the (Dueling) DQN......Page 159
11.3.2 Advantage Actor-Critic Model Architecture......Page 161
11.4 Asynchronous Advantage Actor-Critic Implementation (A3C)......Page 162
11.5 (Synchronous) Advantage Actor-Critic Implementation (A2C)......Page 163
11.6 Summary......Page 165
12.1 Project Structure and Dependencies......Page 166
12.2 Code (A3C_Master—File: a3c_master.py)......Page 169
12.2.1 A3C_Worker (File: a3c_worker.py)......Page 173
12.2.2 Actor-Critic (TensorFlow) Model (File: actorcritic_model.py)......Page 179
12.2.3 SimpleListBasedMemory (File: experience_replay.py)......Page 181
12.3 Training Statistics Plots......Page 184
13.1 Deterministic Policy Gradient (DPG)......Page 186
13.1.1 Advantages of Deterministic Policy Gradient Over Stochastic Policy Gradient......Page 188
13.1.2 Deterministic Policy Gradient Theorem......Page 189
13.2 Deep Deterministic Policy Gradient (DDPG)......Page 191
13.2.1 Deep Learning Implementation-Related Modifications in DDPG......Page 192
13.2.2 DDPG Algorithm Pseudo-Code......Page 195
13.3 Summary......Page 196
14.1 High-Level Wrapper Libraries for Reinforcement Learning......Page 198
14.3 Project Structure and Dependencies......Page 199
14.4 Code (File: ddpg_continout_action.py)......Page 201
14.5 Agent Playing the “MountainCarContinuous-v0” Environment......Page 204
Bibliography......Page 205
Index......Page 209
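As the contents above show, Chapters 4 and 5 build from temporal-difference learning to a full Q-learning agent driven by an epsilon-greedy behavior policy. As a rough, book-independent illustration of those two ideas (this sketch is not code from the book; the corridor environment, function name, and hyperparameters are invented for the example), tabular Q-learning can be condensed to a few lines:

```python
import random

def train_q_learning(n_states=6, episodes=300, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor: action 0 moves left, action 1
    moves right; reaching the rightmost state gives reward +1 and ends
    the episode. A hypothetical environment, used only for illustration."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length
            # Epsilon-greedy behavior policy (cf. Sec. 4.6), with
            # random tie-breaking among equally valued actions.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(Q[s])
                a = rng.choice([i for i in (0, 1) if Q[s][i] == best])
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Off-policy TD update (cf. Sec. 4.5, Q-learning):
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
            if s == n_states - 1:
                break
    return Q
```

After training, the greedy policy read off the table prefers moving right (toward the reward) in states near the goal; the same update rule, with the table replaced by a neural network, is the starting point for the DQN material in Chapter 8.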
📜 SIMILAR VOLUMES
Deep reinforcement learning has attracted considerable attention recently. Impressive results have been achieved in such diverse fields as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to understand problems…
We all learn through trial and error. We avoid the things that cause us to experience pain and failure. We embrace and build on the things that give us reward and success. This common pattern is the foundation of deep reinforcement learning: building machine learning systems that explore and learn…