Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning
Authors: Peter L. Bartlett; Jonathan Baxter
- Publisher
- Elsevier Science
- Year
- 2002
- Language
- English
- File size
- 156 KB
- Volume
- 64
- Category
- Article
- ISSN
- 0022-0000
Synopsis
We model reinforcement learning as the problem of learning to control a partially observable Markov decision process (POMDP) and focus on gradient ascent approaches to this problem. In an earlier work (2001, J. Artificial Intelligence Res. 14) we introduced GPOMDP, an algorithm for estimating the performance gradient of a POMDP from a single sample path, and we proved that this algorithm almost surely converges to an approximation to the gradient. In this paper, we provide a convergence rate for the estimates produced by GPOMDP and give an improved bound on the approximation error of these estimates. Both of these bounds are in terms of mixing times of the POMDP.
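The synopsis describes GPOMDP, which estimates the performance gradient of a parameterized policy from a single sample path using a discounted eligibility trace. The sketch below illustrates the estimator's structure on a toy fully observed two-state MDP with a softmax policy; the environment, parameter values, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # softmax policy parameters

# P[a, s] is the next-state distribution after taking action a in state s;
# r[s] is the reward received on entering state s (illustrative values).
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # action 0: tend to stay put
              [[0.5, 0.5], [0.5, 0.5]]])  # action 1: randomize
r = np.array([0.0, 1.0])

def policy(s):
    """Softmax action distribution in state s."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def grad_log_policy(s, a):
    """Gradient of log pi(a | s, theta) for the softmax policy."""
    g = np.zeros_like(theta)
    g[s] = -policy(s)
    g[s, a] += 1.0
    return g

def gpomdp(T=50_000, beta=0.9):
    """Single-sample-path gradient estimate in the style of GPOMDP.

    beta < 1 is the discount factor controlling the bias/variance
    trade-off discussed in the paper: larger beta reduces the
    approximation error but slows the estimate's convergence.
    """
    s = 0
    z = np.zeros_like(theta)      # eligibility trace z_t
    delta = np.zeros_like(theta)  # running gradient estimate Delta_t
    for t in range(T):
        a = rng.choice(n_actions, p=policy(s))
        z = beta * z + grad_log_policy(s, a)
        s = rng.choice(n_states, p=P[a, s])
        # incremental average of r(s_{t+1}) * z_t
        delta += (r[s] * z - delta) / (t + 1)
    return delta

estimate = gpomdp()
```

The estimate converges (almost surely, per the cited 2001 paper) to an approximation of the true performance gradient whose error shrinks as `beta` approaches 1; the bounds in this article quantify both the convergence rate in `T` and that approximation error in terms of the POMDP's mixing time.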