✦ LIBER ✦

A reinforcement learning method using a dynamic reinforcement function based on action selection probability

✍ Scribed by Yugo Hasegawa; Satoko Takada; Hidehiro Nakano; Shuichi Arai; Arata Miyauchi

Publisher: John Wiley and Sons
Year: 2007
Tongue: English
Weight: 795 KB
Volume: 38
Category: Article
ISSN: 0882-1666
DOI: 10.1002/scj.20738


✦ Synopsis

In this paper, the authors propose Dynamic Profit Sharing, a reinforcement learning method in which the reinforcement function of Profit Sharing (PS) is changed dynamically based on action selection probabilities. While the rationality theorem of Profit Sharing gives a necessary and sufficient condition for obtaining rational solutions [1], the proposed method gives a condition for improving learning efficiency while stochastically maintaining sufficient rationality. By dynamically determining a reinforcement function that satisfies this condition, the reward distribution efficiency can be increased, and learning can be accomplished quickly even in environments where a great many actions are required before the goal state is reached. The authors verify the effectiveness of the proposed method through experiments on maze and pursuit problems. © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(7): 1–11, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20738
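
For readers unfamiliar with Profit Sharing, the following is a minimal sketch of the conventional, fixed-function variant that the abstract takes as its starting point. It is not the authors' Dynamic Profit Sharing: the abstract does not give the dynamic condition, so none is reproduced here. The sketch assumes a geometrically decreasing reinforcement function, a common choice in the Profit Sharing literature. The function names, the decay ratio of 0.25, and the tiny three-step episode are all hypothetical.

```python
from collections import defaultdict


def geometric_reinforcement(reward, steps_before_goal, ratio):
    """Credit for a rule fired `steps_before_goal` steps before the reward.

    Assumption: a fixed geometric decay; `ratio` is an illustrative parameter.
    """
    return reward * ratio ** steps_before_goal


def profit_sharing_update(weights, episode, reward, ratio):
    """Distribute one episode's reward backward over its (state, action) rules."""
    for steps_before_goal, rule in enumerate(reversed(episode)):
        weights[rule] += geometric_reinforcement(reward, steps_before_goal, ratio)
    return weights


def selection_probabilities(weights, state, actions):
    """Weight-proportional (roulette) action selection probabilities."""
    total = sum(weights[(state, a)] for a in actions)
    if total == 0:
        # No credit assigned yet: fall back to a uniform choice.
        return {a: 1.0 / len(actions) for a in actions}
    return {a: weights[(state, a)] / total for a in actions}


# Hypothetical three-step episode that ends at the goal.
weights = defaultdict(float)
episode = [("s0", "right"), ("s1", "right"), ("s2", "up")]
profit_sharing_update(weights, episode, reward=1.0, ratio=0.25)
print(selection_probabilities(weights, "s0", ["right", "up"]))
```

With a fixed decay like this, rules far from the goal receive very little credit in long episodes, which is the reward distribution inefficiency the abstract says the dynamic reinforcement function is meant to address.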

📜 SIMILAR VOLUMES

Reinforcement learning using a stochastic gradient method with memory-based learning

✍ Takafumi Yamada; Satoshi Yamaguchi 📂 Article 📅 2010 🏛 John Wiley and Sons 🌐 English ⚖ 423 KB

Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions

✍ Minija Tamosiunaite; Tamim Asfour; Florentin Wörgötter 📂 Article 📅 2009 🏛 Springer-Verlag 🌐 English ⚖ 832 KB

A reinforcement learning method based on an immune network adapted to a semi-Markov decision process

✍ Nagahisa Kogawa; Masanao Obayashi; Kunikazu Kobayashi; Takashi Kuremoto 📂 Article 📅 2009 🏛 Springer Japan 🌐 English ⚖ 448 KB