URI | http://purl.tuc.gr/dl/dias/2F95F669-B215-44BD-90AF-6176BD490AA9 | - |
Identifier | http://arxiv.org/ftp/arxiv/papers/1301/1301.0580.pdf | - |
Language | en | - |
Size | 10 pages | en |
Title | Value function approximation in zero-sum Markov games | en |
Creator | Lagoudakis Michael | en |
Creator | Λαγουδακης Μιχαηλ | el |
Creator | Parr, R. | en |
Abstract | This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player simultaneous-move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem. | en |
Type | Conference Full Paper | el |
Type | Conference Full Paper | en |
License | http://creativecommons.org/licenses/by/4.0/ | en |
Date | 2015-11-13 | - |
Publication Date | 2002 | - |
Subject Category | Artificial Intelligence | en |
Bibliographic Citation | M.G. Lagoudakis and R. Parr. (2002, Aug.). Value function approximation in zero-sum Markov games. [Online]. Available: http://arxiv.org/ftp/arxiv/papers/1301/1301.0580.pdf | en |
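
Note: the abstract above refers to value iteration and policy improvement in zero-sum Markov games, where the backup at each state reduces to solving a one-shot matrix game by linear programming. The sketch below is not the paper's code; it only illustrates that standard minimax computation under stated assumptions. The function name minimax_value and the example payoff matrix are hypothetical, and in an LSPI- or minimax-Q-style update the matrix Q(s, ·, ·) would come from the learned value function approximation at state s.

import numpy as np
from scipy.optimize import linprog

def minimax_value(Q):
    """Return (game value, maximizer's mixed strategy) for payoff matrix Q.

    Q[a, o] is the payoff to the maximizing player when it plays action a
    and the minimizing opponent plays action o.
    """
    n_a, n_o = Q.shape
    # Decision variables x = [pi_1, ..., pi_{n_a}, v]; maximize v  <=>  minimize -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For each opponent action o:  v - sum_a pi_a * Q[a, o] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # The mixed strategy must be a probability distribution over the maximizer's actions.
    A_eq = np.append(np.ones(n_a), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:-1]

# Example: matching pennies has game value 0 and the uniform mixed strategy.
value, pi = minimax_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))

Setting V(s) to the returned game value at every state is the minimax analogue of the max over actions used in MDP value iteration, which is the sense in which the paper treats Markov games as the two-agent generalization of MDPs.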