Institutional Repository
Technical University of Crete

Deep reinforcement learning exploiting a mentor's guidance

Chrysomallis Iason

URI: http://purl.tuc.gr/dl/dias/DB5990C1-E4C8-422D-892D-786C001A6813
Year: 2021
Type of Item: Diploma Work
Bibliographic Citation: Iason Chrysomallis, "Deep reinforcement learning exploiting a mentor's guidance", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2021. https://doi.org/10.26233/heallink.tuc.90213
Summary

Imitation is a popular form of behavioral learning widely practiced in nature. The most familiar examples involve young animals imitating their parents, with imitation serving as a stepping stone to their first survival skills. Imitation also occurs across species, the best-known instances being vocal imitation by parrots and behavioral imitation by crows.

The imitation learning paradigm has naturally been taken up in machine learning, implemented in both supervised learning and reinforcement learning, mostly in the form of explicit imitation, where a mentor agent attempts to explicitly teach learners. Implicit imitation, on the other hand, assumes that learning agents observe the state transitions of an agent they use as a mentor, and try to recreate them based on their own abilities and knowledge of their environment. Though it has been employed with some success in the past, implicit imitation has only recently been combined with deep reinforcement learning, the current leading reinforcement learning paradigm.

In this thesis, we enhance implicit imitation by adding four state-of-the-art deep reinforcement learning algorithms, treated as "imitation optimization modules": Double Deep Q-network [van Hasselt, Guez, and Silver, 2016], Prioritized Experience Replay [Schaul et al., 2016], Dueling Network Architecture [Wang et al., 2016], and Parameter Space Noise for Exploration [Plappert et al., 2018]. We modify these appropriately to better fit the implicit imitation learning paradigm. By enabling and disabling these modules we create diverse combinations of them, systematically test and compare the viability of each combination, and end up with a clear "winner": the combination of Double Deep Q-network, Prioritized Experience Replay, and Dueling Network Architecture.
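To make the winning combination concrete, the following is a minimal PyTorch sketch, not drawn from the thesis code; all class and function names, tensor shapes, and hyperparameters are illustrative assumptions. A dueling Q-network recombines separate value and advantage streams into Q-values; the double-DQN target uses the online network to select the next action and the target network to evaluate it; and the prioritized-replay importance-sampling weights scale the per-sample loss, while the absolute TD errors feed back into the replay priorities.

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    # Dueling architecture [Wang et al., 2016]: separate value and
    # advantage streams recombined into Q-values.
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')  (identifiability trick)
        return v + a - a.mean(dim=1, keepdim=True)

def double_dqn_loss(online, target, batch, gamma=0.99):
    # Double DQN [van Hasselt, Guez, and Silver, 2016]: the online net
    # selects the next action, the target net evaluates it. `weights` are
    # the prioritized-replay importance-sampling weights [Schaul et al.,
    # 2016]; the batch layout here is an assumption.
    obs, actions, rewards, next_obs, dones, weights = batch
    q = online(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_actions = online(next_obs).argmax(dim=1, keepdim=True)
        next_q = target(next_obs).gather(1, next_actions).squeeze(1)
        td_target = rewards + gamma * (1.0 - dones) * next_q
    td_error = td_target - q
    loss = (weights * td_error.pow(2)).mean()
    # The absolute TD errors would be written back as new priorities.
    return loss, td_error.detach().abs()

In an implicit imitation setting, the same loss could in principle also be applied to transitions observed from the mentor, with the learner's own networks selecting and evaluating actions; the fourth module, Parameter Space Noise [Plappert et al., 2018], would replace epsilon-greedy exploration by perturbing the network weights directly.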
