Institutional Repository
Technical University of Crete
Model-free least-squares policy iteration

Lagoudakis, Michael; Parr, R.

URI: http://purl.tuc.gr/dl/dias/CDADBEEF-15F4-44B5-89B2-295FEC71FDAE
Year: 2001
Type: Conference Full Paper
Bibliographic Citation: M. G. Lagoudakis and R. Parr. (2001, Dec.). Model-free least-squares policy iteration. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.4345&rep=rep1&type=pdf
Appears in Collections

Abstract

We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off-policy. We are motivated by the least squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems; however, it heretofore has not had a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states. Our new algorithm, Least-Squares Policy Iteration (LSPI), addresses these issues. The result is an off-policy method which can use (or reuse) data collected from any source. We test LSPI on several problems, including a bicycle simulator in which it learns to guide the bicycle to a goal efficiently by merely observing a relatively small number of completely random trials.
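Since the abstract only sketches the method, the following is a minimal Python sketch of the loop it describes: a least-squares fixed-point evaluation of the current policy's state-action value function from a fixed batch of samples, alternated with greedy policy improvement. The names (phi, lstdq, lspi), the linear feature map, the ridge term, and the stopping rule are illustrative assumptions, not details taken from the paper.

import numpy as np

def lstdq(samples, phi, policy, gamma, k, ridge=1e-6):
    # Hypothetical helper: least-squares evaluation of Q under `policy`.
    # Accumulate A and b from the batch, then solve A w = b.
    A = ridge * np.eye(k)            # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in samples:  # batch may come from any behavior policy
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))  # action the evaluated policy would take
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)     # weights w with Q(s, a) ~ w . phi(s, a)

def lspi(samples, phi, actions, gamma, k, max_iter=20, tol=1e-4):
    # Alternate evaluation and greedy improvement, reusing the same batch.
    w = np.zeros(k)
    for _ in range(max_iter):
        policy = lambda s, w=w: max(actions, key=lambda a: float(phi(s, a) @ w))
        w_new = lstdq(samples, phi, policy, gamma, k)
        if np.linalg.norm(w_new - w) < tol:  # weights stabilized: policy converged
            return w_new
        w = w_new
    return w

Because the evaluation step needs only (s, a, r, s') tuples plus the action the evaluated policy would choose in s', the same fixed batch, such as the random bicycle trials mentioned above, can be reused at every iteration; this is the off-policy data reuse the abstract emphasizes.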
