Iason Chrysomallis, "Deep reinforcement learning exploiting a mentor's guidance", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2021
https://doi.org/10.26233/heallink.tuc.90213
Imitation is a form of behavioral learning that is widely practiced in nature. The best-known examples involve young animals imitating their parents, with imitation serving as the stepping stone for their first steps toward survival. Imitation also occurs across species, the best-known examples being the vocal imitation of parrots and the behavioral imitation of crows.

The imitation learning paradigm has naturally been taken up in machine learning, in both supervised learning and reinforcement learning, mostly in the form of explicit imitation, where a mentor agent attempts to explicitly teach learners. Implicit imitation, on the other hand, assumes that learning agents observe the state transitions of an agent they use as a mentor, and try to recreate them based on their own abilities and knowledge of their environment. Although it has been employed with some success in the past, implicit imitation has only recently been used in conjunction with deep reinforcement learning, the current leading reinforcement learning paradigm. In this thesis, we enhance the operation of implicit imitation by adding four state-of-the-art deep reinforcement learning algorithms, treated as "imitation optimization modules". These are Double Deep Q-network [Hasselt, Guez, and Silver, 2016], Prioritized Experience Replay [Schaul et al., 2016], Dueling Network Architecture [Wang et al., 2016] and Parameter Space Noise for Exploration [Plappert et al., 2018]. We modify these appropriately to better fit the implicit imitation learning paradigm.

By enabling and disabling these modules we create diverse combinations of them, systematically test and compare the viability of each combination, and end up with a clear "winner": the combination of Double Deep Q-network, Prioritized Experience Replay and Dueling Network Architecture.
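
The enable/disable scheme described in the abstract can be pictured as a set of on/off switches, one per module. The following is only a minimal sketch of that idea and is not taken from the thesis: the short module keys, the helper names `enumerate_configurations` and `label`, and the "baseline" label for the all-disabled case are hypothetical, and the abstract does not state how many of the possible combinations were actually evaluated.

```python
from itertools import product

# The four modules are named in the abstract; these short keys are
# hypothetical labels chosen for this illustration.
MODULES = ("double_dqn", "prioritized_replay", "dueling", "param_noise")


def enumerate_configurations(modules=MODULES):
    """Return every on/off assignment of the given modules as a dict."""
    return [
        dict(zip(modules, flags))
        for flags in product((False, True), repeat=len(modules))
    ]


def label(config):
    """Join the enabled module names with '+', or return 'baseline' if none."""
    enabled = [name for name, on in config.items() if on]
    return "+".join(enabled) if enabled else "baseline"


configs = enumerate_configurations()

# The combination the abstract reports as the "winner":
# Double DQN + Prioritized Experience Replay + Dueling, without parameter noise.
winner = {
    "double_dqn": True,
    "prioritized_replay": True,
    "dueling": True,
    "param_noise": False,
}

if __name__ == "__main__":
    for config in configs:
        marker = "  <- reported winner" if config == winner else ""
        print(label(config) + marker)
```

With four independent switches there are at most 2^4 = 16 configurations, so an exhaustive comparison is feasible in principle. Whether the thesis covers all of them is not stated in the abstract.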