𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning

โœ Scribed by Peter L. Bartlett; Jonathan Baxter


Publisher
Elsevier Science
Year
2002
Tongue
English
Weight
156 KB
Volume
64
Category
Article
ISSN
0022-0000


✦ Synopsis


We model reinforcement learning as the problem of learning to control a partially observable Markov decision process (POMDP) and focus on gradient ascent approaches to this problem. In an earlier work (2001, J. Artificial Intelligence Res. 14) we introduced GPOMDP, an algorithm for estimating the performance gradient of a POMDP from a single sample path, and we proved that this algorithm almost surely converges to an approximation to the gradient. In this paper, we provide a convergence rate for the estimates produced by GPOMDP and give an improved bound on the approximation error of these estimates. Both of these bounds are in terms of mixing times of the POMDP.
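The GPOMDP estimator referenced above accumulates a discounted eligibility trace of policy log-likelihood gradients along a single sample path and averages the reward-weighted trace. The following sketch illustrates that recursion on a hypothetical two-state chain with a Bernoulli-logistic policy; the toy dynamics, rewards, and parameter values are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def gpomdp_estimate(T, beta, theta, rng):
    """Single-sample-path GPOMDP gradient estimate (illustrative sketch).

    Toy setup (an assumption, not from the paper): two states {0, 1};
    action 1 reaches the rewarding state 1 with probability 0.9, action 0
    with probability 0.1. Policy: mu(a=1 | theta) = sigmoid(theta).
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    z = 0.0      # eligibility trace z_t
    delta = 0.0  # running average Delta_T of r_t * z_t
    for t in range(1, T + 1):
        p1 = sigmoid(theta)
        a = 1 if rng.random() < p1 else 0
        # gradient of log mu(a | theta) for a Bernoulli-logistic policy
        grad_log = (1.0 - p1) if a == 1 else -p1
        # transition and reward: reward 1 in state 1, 0 in state 0
        x = 1 if rng.random() < (0.9 if a == 1 else 0.1) else 0
        r = float(x)
        z = beta * z + grad_log       # discounted eligibility trace
        delta += (r * z - delta) / t  # incremental running average
    return delta

rng = np.random.default_rng(0)
est = gpomdp_estimate(T=200_000, beta=0.9, theta=0.0, rng=rng)
print(est)  # positive: raising theta favours action 1 and hence reward
```

The discount factor `beta` plays the role tied to the mixing-time bounds in the paper: larger `beta` shrinks the approximation error of the estimated gradient but slows the convergence of the sample average, which is the trade-off the bounds in the abstract quantify.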


📜 SIMILAR VOLUMES


Robust multiscale algorithms for gradien
โœ Qing-Hua Lu; Xian-Min Zhang ๐Ÿ“‚ Article ๐Ÿ“… 2007 ๐Ÿ› John Wiley and Sons ๐ŸŒ English โš– 367 KB

Gradient-based techniques are a popular class of approaches to motion estimation. This article proposes a robust multiscale algorithm for hierarchical, gradient-based motion estimation, combining robust statistical methods with a multiscale technique.