Institutional Repository
Technical University of Crete

On the locality of action domination in sequential decision making

Emmanuel Rachelson, Michail G. Lagoudakis


URI: http://purl.tuc.gr/dl/dias/E0292307-A486-42F6-A1D4-8BF6498753E2
Year: 2010
Type of Item: Conference Full Paper
Bibliographic Citation: E. Rachelson and Michail G. Lagoudakis. (2010, Jan.). On the locality of action domination in sequential decision making. Presented at the 11th International Symposium on Artificial Intelligence and Mathematics (ISAIM). [Online]. Available: http://www.researchgate.net/profile/Emmanuel_Rachelson/publication/221186156_On_the_locality_of_action_domination_in_sequential_decision_making/links/0fcfd5051c4eaad94f000000.pdf

Summary

In the field of sequential decision making and reinforcement learning, it has been observed that good policies for most problems exhibit a significant amount of structure. In practice, this implies that when a learning agent discovers that an action is better than any other in a given state, this action also dominates in a certain neighbourhood around that state. This paper presents new results proving that this notion of locality in action domination can be linked to the smoothness of the environment’s underlying stochastic model. Namely, we link the Lipschitz continuity of a Markov Decision Process to the Lipschitz continuity of its policies’ value functions and introduce the key concept of influence radius to describe the neighbourhood of states where the dominating action is guaranteed to be constant. These ideas are directly exploited in the proposed Localized Policy Iteration (LPI) algorithm, which is an active learning version of Rollout-based Policy Iteration. Preliminary results on the Inverted Pendulum domain demonstrate the viability and the potential of the proposed approach.
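
As a rough illustration of the influence-radius idea in the summary (a sketch under stated assumptions, not the paper's own definition or code): suppose each action's value Q(s, a) is Lipschitz in the state with a known constant L. If the best action at a state s beats the runner-up by a margin Δ, then moving a distance d changes each value by at most L·d, so the gap shrinks by at most 2·L·d and the best action stays best whenever d < Δ / (2L). The function name, the dictionary input and the numbers below are hypothetical.

```python
def influence_radius(q_values, lipschitz_constant):
    """Radius around a state within which its best action provably stays best.

    q_values: dict mapping action -> Q(s, a) at one state s (hypothetical input).
    lipschitz_constant: assumed bound L such that
        |Q(s, a) - Q(s', a)| <= L * d(s, s') for every action a.
    """
    if lipschitz_constant <= 0:
        raise ValueError("lipschitz_constant must be positive")
    if len(q_values) < 2:
        raise ValueError("need at least two actions to compare")
    # Domination margin: best value minus second-best value at this state.
    ranked = sorted(q_values.values(), reverse=True)
    margin = ranked[0] - ranked[1]
    # Each value can drift by at most L * d, so the gap closes by at most 2 * L * d.
    return margin / (2.0 * lipschitz_constant)


# Illustrative numbers only; they do not come from the paper.
q_at_s = {"left": 2.0, "none": 1.0, "right": 0.5}
radius = influence_radius(q_at_s, lipschitz_constant=4.0)
print(radius)
```

A larger margin or a smoother value function (smaller L) gives a wider neighbourhood, which is the intuition behind sampling fewer states where one action clearly dominates.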