New algorithms of the Q-learning type
Shalabh Bhatnagar; K. Mohan Babu
Article
2008
Elsevier Science
English
284 KB
We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first of these updates Q-values of all feasible state-action pairs at each instant while the second updates Q-values of states with actions chosen according to the 'current' randomized policy…
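The abstract above is truncated and does not give the update rules, so the following is only a minimal sketch of the general two-timescale idea it describes, not the authors' algorithms. Assumptions, all ours: a hypothetical random toy MDP (`make_toy_mdp`), reward maximization, the standard Q-learning max target, fast step sizes a = k^-0.6 per state-action pair, a slow step size b_n = 1/n (so b_n / a_n tends to 0), and a Boltzmann (softmax) randomized policy whose preferences `theta` are moved toward the current Q-values on the slow timescale. `update_all=True` mimics the first scheme (every state-action pair updated each instant from one simulated transition); `update_all=False` mimics the second (only the visited state, with the action drawn from the current randomized policy).

```python
import numpy as np


def make_toy_mdp(n_states=5, n_actions=3, seed=0):
    """Hypothetical toy MDP: random kernel P[s, a, s'] and rewards R[s, a] in [0, 1)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)
    R = rng.random((n_states, n_actions))
    return P, R


def q_star(P, R, gamma, n_sweeps=1000):
    """Reference Q* by value iteration (model-based, used only for checking)."""
    Q = np.zeros_like(R)
    for _ in range(n_sweeps):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q


def softmax(prefs, temp):
    """Boltzmann distribution over actions (our assumed randomized policy)."""
    z = np.exp((prefs - prefs.max()) / temp)
    return z / z.sum()


def two_timescale_q(P, R, gamma=0.9, n_iters=5000, update_all=True,
                    temp=1.0, seed=1):
    """Sketch: Q-values on the fast step size, policy preferences on the slow one."""
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    cum_P = np.cumsum(P, axis=2)               # CDFs for inverse-CDF sampling of s'
    Q = np.zeros((n_states, n_actions))        # fast iterate
    theta = np.zeros((n_states, n_actions))    # slow iterate (assumed: tracks Q)
    visits = np.zeros((n_states, n_actions))   # per-pair counters for fast steps
    s = 0
    for n in range(1, n_iters + 1):
        b_n = 1.0 / n                          # slow step size
        if update_all:
            # Scheme 1: one simulated transition for EVERY (state, action) pair.
            u = rng.random((n_states, n_actions, 1))
            nxt = np.minimum((u >= cum_P).sum(axis=2), n_states - 1)
            visits += 1
            a_n = visits ** -0.6               # fast step size, per pair
            target = R + gamma * Q.max(axis=1)[nxt]
            Q += a_n * (target - Q)
        else:
            # Scheme 2: only the current state; action drawn from the randomized policy.
            pi = softmax(theta[s], temp)
            act = min(int(np.searchsorted(np.cumsum(pi), rng.random(), side="right")),
                      n_actions - 1)
            nxt = min(int(np.searchsorted(cum_P[s, act], rng.random(), side="right")),
                      n_states - 1)
            visits[s, act] += 1
            a_n = visits[s, act] ** -0.6
            target = R[s, act] + gamma * Q[nxt].max()
            Q[s, act] += a_n * (target - Q[s, act])
            s = nxt
        theta += b_n * (Q - theta)             # slow timescale: preferences follow Q
    policy = np.array([softmax(theta[x], temp) for x in range(n_states)])
    return Q, policy


P, R = make_toy_mdp()
Q_ref = q_star(P, R, 0.9)
Q_all, pol_all = two_timescale_q(P, R, n_iters=5000, update_all=True)
Q_pol, pol_pol = two_timescale_q(P, R, n_iters=60000, update_all=False)
print(np.max(np.abs(Q_all - Q_ref)), np.max(np.abs(Q_pol - Q_ref)))
```

Because the slow step size vanishes faster than the fast one, the Q-iterate sees the policy as nearly frozen, while the policy sees a nearly converged Q-iterate; that separation is the core of the two-timescale argument. In this sketch the first scheme does not use the policy to drive its Q-updates (the policy is only read out at the end), whereas in the second it determines which actions get explored.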