URI | http://purl.tuc.gr/dl/dias/E2365192-8D33-48EE-8069-2097118F293B | - |
Identifier | https://doi.org/10.1007/3-540-46014-4_23 | - |
Identifier | https://users.cs.duke.edu/~parr/setn02.pdf | - |
Language | en | - |
Size | 12 pages | en |
Title | Least-squares methods in reinforcement learning for control | en |
Creator | Lagoudakis Michael | en |
Creator | Λαγουδακης Μιχαηλ | el |
Creator | Parr Ronald | en |
Creator | Littman Michael L. | en |
Abstract | Least-squares methods have been successfully used for prediction problems in the context of reinforcement learning, but little has been done in extending these methods to control problems. This paper presents an overview of our research efforts in using least-squares techniques for control. In our early attempts, we considered a direct extension of the Least-Squares Temporal Difference (LSTD) algorithm in the spirit of Q-learning. Later, an effort to remedy some limitations of this algorithm (approximation bias, poor sample utilization) led to the Least-Squares Policy Iteration (LSPI) algorithm, which is a form of model-free approximate policy iteration and makes efficient use of training samples collected in any arbitrary manner. The algorithms are demonstrated on a variety of learning domains, including algorithm selection, inverted pendulum balancing, bicycle balancing and riding, multiagent learning in factored domains, and, recently, on two-player zero-sum Markov games and the game of Tetris. | en |
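The abstract describes LSPI as model-free approximate policy iteration built around an LSTD-style evaluation step. As a rough illustration only, not the paper's reference implementation, a minimal Python sketch of that structure is given below; the names `phi` (a feature map over state-action pairs), `samples` (a list of (s, a, r, s') tuples), and `actions` (a finite action set) are hypothetical placeholders, not identifiers from the paper.

```python
import numpy as np

def lstdq(samples, phi, policy, k, gamma=0.95):
    # One LSTDQ-style solve: fit weights w so that Q(s, a) ~= phi(s, a) . w
    # for the given policy, from samples collected in any arbitrary manner.
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)                         # features of the visited pair
        f_next = phi(s_next, policy(s_next))  # successor features under the policy
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    # lstsq rather than solve, in case A is singular for small sample sets
    return np.linalg.lstsq(A, b, rcond=None)[0]

def lspi(samples, phi, actions, k, gamma=0.95, max_iter=20, tol=1e-6):
    # Model-free approximate policy iteration: alternate the LSTDQ-style
    # evaluation with greedy policy improvement until the weights converge.
    w = np.zeros(k)
    greedy = lambda s: max(actions, key=lambda a: float(phi(s, a) @ w))
    for _ in range(max_iter):
        w_new = lstdq(samples, phi, greedy, k, gamma)
        done = np.linalg.norm(w_new - w) < tol
        w = w_new
        if done:
            break
    return w, greedy
```

Note that the same batch of samples is reused in every iteration of the loop, which reflects the sample-efficiency property the abstract highlights.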
Type | Πλήρης Δημοσίευση σε Συνέδριο | el |
Type | Conference Full Paper | en |
License | http://creativecommons.org/licenses/by-nc-nd/4.0/ | en |
Date | 2015-11-17 | - |
Publication Date | 2002 | - |
Bibliographic Citation | M. Lagoudakis, R. Parr, and M. Littman, "Least-Squares Methods in Reinforcement Learning for Control," in Proceedings of the 2nd Hellenic Conference on Artificial Intelligence (SETN 2002), pp. 249-260, doi: 10.1007/3-540-46014-4_23 | en |