
A reinforcement learning method using a dynamic reinforcement function based on action selection probability

✍ Scribed by Yugo Hasegawa; Satoko Takada; Hidehiro Nakano; Shuichi Arai; Arata Miyauchi


Publisher: John Wiley and Sons
Year: 2007
Tongue: English
Weight: 795 KB
Volume: 38
Category: Article
ISSN: 0882-1666


✦ Synopsis


Abstract

In this paper, the authors propose Dynamic Profit Sharing as a reinforcement learning method in which a reinforcement function in Profit Sharing (PS) is dynamically changed based on action selection probabilities. While the rationality theorem in Profit Sharing gives a necessary and sufficient condition for obtaining rational solutions [1], the proposed method gives a condition for improving the learning efficiency while stochastically maintaining sufficient rationality. By dynamically determining the reinforcement function that satisfies this condition, the reward distribution efficiency can be increased and learning can be accomplished quickly even for an environment in which a great many actions are required until the goal state is reached. The authors perform experiments using maze and pursuit problems as examples to verify the effectiveness of the proposed method. © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(7): 1–11, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20738

