Institutional Repository
Technical University of Crete
Model-free least-squares policy iteration

Lagoudakis, Michael; Parr, R.

URI: http://purl.tuc.gr/dl/dias/CDADBEEF-15F4-44B5-89B2-295FEC71FDAE
Year: 2001
Type: Conference Full Paper
Bibliographic Citation: M. G. Lagoudakis and R. Parr. (2001, Dec.). Model-free least-squares policy iteration. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.4345&rep=rep1&type=pdf
Appears in Collections

Abstract

We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off-policy. We are motivated by the least squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems; however, it heretofore has not had a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states. Our new algorithm, Least-Squares Policy Iteration (LSPI), addresses these issues. The result is an off-policy method which can use (or reuse) data collected from any source. We test LSPI on several problems, including a bicycle simulator in which it learns to guide the bicycle to a goal efficiently by merely observing a relatively small number of completely random trials.
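Since the abstract only sketches the method, the following is a minimal Python sketch of the loop it describes: a least-squares fixed-point evaluation of the current policy's state-action value function from a fixed batch of samples, alternated with greedy policy improvement. The names (phi, lstdq, lspi), the linear feature map, the ridge term, and the stopping rule are illustrative assumptions, not details taken from the paper.

import numpy as np

def lstdq(samples, phi, policy, gamma, k, ridge=1e-6):
    # Hypothetical helper: least-squares evaluation of Q under `policy`.
    # Accumulate A and b from the batch, then solve A w = b.
    A = ridge * np.eye(k)            # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in samples:  # batch may come from any behavior policy
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))  # action the evaluated policy would take
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)     # weights w with Q(s, a) ~ w . phi(s, a)

def lspi(samples, phi, actions, gamma, k, max_iter=20, tol=1e-4):
    # Alternate evaluation and greedy improvement, reusing the same batch.
    w = np.zeros(k)
    for _ in range(max_iter):
        policy = lambda s, w=w: max(actions, key=lambda a: float(phi(s, a) @ w))
        w_new = lstdq(samples, phi, policy, gamma, k)
        if np.linalg.norm(w_new - w) < tol:  # weights stabilized: policy converged
            return w_new
        w = w_new
    return w

Because the evaluation step needs only (s, a, r, s') tuples plus the action the evaluated policy would choose in s', the same fixed batch, such as the random bicycle trials mentioned above, can be reused at every iteration; this is the off-policy data reuse the abstract emphasizes.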
