𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Bounded-parameter Markov decision processes

โœ Scribed by Robert Givan; Sonia Leach; Thomas Dean


Publisher
Elsevier Science
Year
2000
Tongue
English
Weight
375 KB
Volume
122
Category
Article
ISSN
0004-3702


✦ Synopsis


In this paper, we introduce the notion of a bounded-parameter Markov decision process (BMDP) as a generalization of the familiar exact MDP. A bounded-parameter MDP is a set of exact MDPs specified by giving upper and lower bounds on transition probabilities and rewards (all the MDPs in the set share the same state and action space). BMDPs form an efficiently solvable special case of the already known class of MDPs with imprecise parameters (MDPIPs). Bounded-parameter MDPs can be used to represent variation or uncertainty concerning the parameters of sequential decision problems in cases where no prior probabilities on the parameter values are available. Bounded-parameter MDPs can also be used in aggregation schemes to represent the variation in the transition probabilities for different base states aggregated together in the same aggregate state.
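The interval representation described above can be sketched as a small data structure. In this Python sketch (class and field names are illustrative, not from the paper), transition probabilities and rewards are closed intervals, and a well-formedness check confirms that at least one exact distribution fits inside the bounds for every state–action pair:

```python
from dataclasses import dataclass

@dataclass
class BMDP:
    # All MDPs in the family share these state and action spaces.
    states: list
    actions: list
    p_lo: dict   # (s, a, s2) -> lower bound on Pr(s2 | s, a)
    p_hi: dict   # (s, a, s2) -> upper bound on Pr(s2 | s, a)
    r_lo: dict   # (s, a) -> lower bound on immediate reward
    r_hi: dict   # (s, a) -> upper bound on immediate reward

    def is_well_formed(self):
        # Each interval must nest, and for each (s, a) some probability
        # distribution must fit the bounds: sum of lower bounds <= 1
        # and sum of upper bounds >= 1.
        if any(self.p_lo[k] > self.p_hi[k] for k in self.p_lo):
            return False
        for s in self.states:
            for a in self.actions:
                lo_sum = sum(self.p_lo[(s, a, s2)] for s2 in self.states)
                hi_sum = sum(self.p_hi[(s, a, s2)] for s2 in self.states)
                if not (lo_sum <= 1.0 <= hi_sum):
                    return False
        return True

# A tiny two-state example (numbers are made up for illustration).
m = BMDP(
    states=["s0", "s1"],
    actions=["a"],
    p_lo={("s0", "a", "s0"): 0.2, ("s0", "a", "s1"): 0.3,
          ("s1", "a", "s0"): 0.0, ("s1", "a", "s1"): 1.0},
    p_hi={("s0", "a", "s0"): 0.7, ("s0", "a", "s1"): 0.8,
          ("s1", "a", "s0"): 0.0, ("s1", "a", "s1"): 1.0},
    r_lo={("s0", "a"): 0.0, ("s1", "a"): 0.9},
    r_hi={("s0", "a"): 0.1, ("s1", "a"): 1.0},
)
```

Any choice of exact transition probabilities and rewards inside these intervals yields one member of the set of exact MDPs the BMDP denotes.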

We introduce interval value functions as a natural extension of traditional value functions. An interval value function assigns a closed real interval to each state, representing the assertion that the value of that state falls within that interval. An interval value function can be used to bound the performance of a policy over the set of exact MDPs associated with a given bounded-parameter MDP. We describe an iterative dynamic programming algorithm called interval policy evaluation that computes an interval value function for a given BMDP and specified policy. Interval policy evaluation on a policy π computes the most restrictive interval value function that is sound, i.e., that bounds the value function for π in every exact MDP in the set defined by the bounded-parameter MDP. We define optimistic and pessimistic criteria for optimality, and provide a variant of value iteration (Bellman, 1957) that we call interval value iteration that computes policies for a BMDP that are optimal with respect to these criteria. We show that each algorithm we present converges to the desired values in a polynomial number of iterations given a fixed discount factor.
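The interval backup at the heart of interval policy evaluation can be sketched in Python. This is a minimal sketch, not the paper's implementation: function names and the two-state example are illustrative. The inner step greedily picks, within the interval bounds, the transition distribution that minimizes (for the lower value) or maximizes (for the upper value) the expected next-state value, by starting every successor at its lower bound and pouring the leftover probability mass onto the worst (or best) successors first:

```python
def extreme_expectation(p_lo, p_hi, values, maximize):
    # Greedy selection of an extreme distribution within the bounds:
    # visit successors from best to worst value (or worst to best),
    # raising each from its lower bound toward its upper bound until
    # the remaining probability mass is exhausted.
    order = sorted(p_lo, key=lambda s2: values[s2], reverse=maximize)
    probs = dict(p_lo)
    slack = 1.0 - sum(p_lo.values())
    for s2 in order:
        add = min(p_hi[s2] - p_lo[s2], slack)
        probs[s2] += add
        slack -= add
    return sum(probs[s2] * values[s2] for s2 in probs)

def interval_policy_evaluation(states, policy, p_lo, p_hi, r_lo, r_hi,
                               gamma=0.9, sweeps=200):
    # Iterate lower and upper Bellman backups for the fixed policy;
    # returns a per-state interval [v_lo(s), v_hi(s)].
    v_lo = {s: 0.0 for s in states}
    v_hi = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            a = policy[s]
            lo_b = {s2: p_lo[(s, a, s2)] for s2 in states}
            hi_b = {s2: p_hi[(s, a, s2)] for s2 in states}
            v_lo[s] = r_lo[(s, a)] + gamma * extreme_expectation(
                lo_b, hi_b, v_lo, maximize=False)
            v_hi[s] = r_hi[(s, a)] + gamma * extreme_expectation(
                lo_b, hi_b, v_hi, maximize=True)
    return v_lo, v_hi

# A made-up two-state example with a single action.
states = ["s0", "s1"]
policy = {"s0": "a", "s1": "a"}
p_lo = {("s0", "a", "s0"): 0.2, ("s0", "a", "s1"): 0.3,
        ("s1", "a", "s0"): 0.0, ("s1", "a", "s1"): 1.0}
p_hi = {("s0", "a", "s0"): 0.7, ("s0", "a", "s1"): 0.8,
        ("s1", "a", "s0"): 0.0, ("s1", "a", "s1"): 1.0}
r_lo = {("s0", "a"): 0.0, ("s1", "a"): 0.9}
r_hi = {("s0", "a"): 0.1, ("s1", "a"): 1.0}
v_lo, v_hi = interval_policy_evaluation(states, policy, p_lo, p_hi, r_lo, r_hi)
```

Interval value iteration adds an outer maximization over actions under the optimistic or pessimistic criterion, but the per-state interval backup is the same greedy step shown here.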


📜 SIMILAR VOLUMES


Bounding reward measures of Markov model
โœ Peter Buchholz ๐Ÿ“‚ Article ๐Ÿ“… 2011 ๐Ÿ› John Wiley and Sons ๐ŸŒ English โš– 388 KB

✦ Synopsis: For a Markov reward process, where upper and lower bounds for the transition rates and rewards are known, a new approach to bound the expected reward is presented. Based on a previous paper where sharp bounds have been defined for the problem, but only an inefficient and unstable algorit…

Markov ratio decision processes
โœ V. Aggarwal; R. Chandrasekaran; K. P. K. Nair ๐Ÿ“‚ Article ๐Ÿ“… 1977 ๐Ÿ› Springer ๐ŸŒ English โš– 466 KB
Semi-infinite Markov decision processes
โœ Ming Chen; Jerzy A. Filar; Ke Liu ๐Ÿ“‚ Article ๐Ÿ“… 2000 ๐Ÿ› Springer ๐ŸŒ English โš– 184 KB